yalelibrary / YUL-DC

Preliminary issue tracking for Yale University Libraries Digital Collections project

Migrate Garvin City Planning Image Collection #1807

Closed: motropuk closed this issue 2 years ago

motropuk commented 2 years ago

Story

Migrate the Garvin City Planning Image Collection (PID 65) from Findit into DCS. Only migrate objects that have images attached.

Acceptance

MaggieZhaoYale commented 2 years ago

Over 250,000 images; downloaded 9,999 per run.
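As a rough illustration of what a 9,999-per-run cap implies: 250,000 images need at least 26 runs. This is a minimal Python sketch of the offset arithmetic only, assuming a simple offset/limit paging scheme; the function name and the paging scheme are hypothetical, not the migration's actual code.

```python
import math

BATCH_SIZE = 9999  # per-run cap mentioned in the comment above


def batch_ranges(total: int, batch_size: int = BATCH_SIZE) -> list[tuple[int, int]]:
    """Return half-open (start, end) offset pairs that cover `total` records.

    Hypothetical helper: assumes records can be fetched by offset and limit.
    """
    return [(start, min(start + batch_size, total))
            for start in range(0, total, batch_size)]


runs = batch_ranges(250_000)
print(len(runs), math.ceil(250_000 / BATCH_SIZE))  # 26 26
print(runs[-1])                                    # (249975, 250000)
```

Since the collection has *over* 250,000 images, 26 is a lower bound on the number of runs.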

MaggieZhaoYale commented 2 years ago

Some images are not in Fedora or Ladybird. See the attached CSV: https://app.zenhub.com/files/242185115/247d7100-97e1-4a20-8ede-6ab06625f5c3/download

motropuk commented 2 years ago

@MaggieZhaoYale I spotted some issues while auditing on UAT. Parent 16195742 does not have any child objects attached. Additionally, while there are 16,557 objects in the DCS UAT management app, only 15,364 are displaying in Blacklight: https://collections-uat.library.yale.edu/catalog?f%5bproject_identifier_tesi%5d%5b%5d=65&q=&search_field=all_fields. One parent (16195742) is Yale Community Only, which would account for one parent not displaying, but that still leaves 1,192 parents that are not displaying in Blacklight.

MaggieZhaoYale commented 2 years ago

Reprocessed the failed parents. 887 parents failed; 166 of them are parents without images. Some failed parents have children that only have JPG images, e.g. digcoll:4228020. Shall we use the JPG images and convert them to TIFF? @motropuk Here is the list of failed parents without images: https://app.zenhub.com/files/242185115/01c945e3-3b9f-4dde-934e-c79057111ba7/download

motropuk commented 2 years ago

@MaggieZhaoYale Yes, let's convert the ones with only JPGs to TIFF and then see what we have left after that.

motropuk commented 2 years ago

Still only 15,660 of the 16,559 objects in UAT management are displaying in UAT Blacklight: https://collections-uat.library.yale.edu/catalog?f%5bproject_identifier_tesi%5d%5b%5d=65&q=&search_field=all_fields. @MaggieZhaoYale, is it possible to get a full list of the parents that are not displaying on the front end?

MaggieZhaoYale commented 2 years ago

Failed to convert the JPGs to TIFF using ImageMagick. Still 15,660/16,559 in Blacklight. Of the 899 problematic parents, 166 are without images.
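The full list of non-displaying parents requested earlier amounts to a set difference between the parent OIDs in the management app and those returned by Blacklight. A minimal sketch, assuming both OID lists have already been exported; the function name and the sample OIDs are hypothetical placeholders, not real collection data.

```python
def missing_from_blacklight(management_oids, blacklight_oids):
    """Return parent OIDs present in the management app but absent from Blacklight.

    Hypothetical helper: both arguments are iterables of parent OIDs
    exported separately from each system. Result is sorted for review.
    """
    return sorted(set(management_oids) - set(blacklight_oids))


# Placeholder OIDs for illustration only
management = [1001, 1002, 1003, 1004]
blacklight = [1002, 1004]
print(missing_from_blacklight(management, blacklight))  # [1001, 1003]
```

If every Blacklight OID also appears in the management export, the list length equals the count gap (16,559 − 15,660 = 899).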

MaggieZhaoYale commented 2 years ago

@sshetenhelm This is the list of parent OIDs without children. The list includes deleted parents: the query was based on child status, so it includes parents with status 'delete', e.g. 16191341. I marked these as 'delete'. The majority are parents without children, e.g. https://findit.library.yale.edu/catalog/digcoll:4215161 (parent OID 16191626). (What I said during the meeting, 166 + 179 parents without images, was incorrect; there were some duplicated entries.) In total, 231 parents are without images. That number includes parents deleted in Fedora and 3 parents whose child information cannot be retrieved from Ladybird via metadata cloud: 16195719, 16195742, 16195324. These 3 are in Findit and their child images are in the pairtree. These parents are the lowest-level parents of the images. https://app.zenhub.com/files/242185115/4a8c5c26-5f34-4866-8d38-8a398bb21b11/download

57 images belonging to 5 parents were deleted/inactive: https://app.zenhub.com/files/242185115/8b70b303-90d5-4119-b42c-7437ecc84df6/download

motropuk commented 2 years ago

@MaggieZhaoYale We are going to take the same approach with this collection as we did with Day Missions. Let's move everything to Production, and then we will audit and fix issues there. I checked in with Arts and they are happy with this approach.

MaggieZhaoYale commented 2 years ago

@motropuk @sshetenhelm 851 parents failed in Production. Of these 851, 234 are inactive parents, deleted parents, or parents without images. Here is the list of the 234 parents: https://app.zenhub.com/files/242185115/3e928933-2f39-4424-a643-508362440e13/download. Why these 234 parents are deleted, inactive, or without images: the Fedora query selects active children with viewopt_ssi archivalDigitized and then gets their parents, so a parent itself could be deleted or inactive in Fedora. Some parents without children came from Ladybird information.
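The cause described above can be illustrated with a small sketch: when parents are derived from active children rather than queried directly, the resulting parent set can still contain parents that are themselves deleted or inactive. Only `viewopt_ssi` and `archivalDigitized` come from the comment; the record layout, the other field names, the state values, and the OIDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Child:
    oid: int
    parent_oid: int
    state: str        # hypothetical values: "active" or "deleted"
    viewopt_ssi: str


def parents_from_children(children):
    """Collect parent OIDs of active children whose viewopt_ssi is archivalDigitized."""
    return {c.parent_oid for c in children
            if c.state == "active" and c.viewopt_ssi == "archivalDigitized"}


def flag_unusable_parents(parent_oids, parent_state):
    """Return parent OIDs whose own state is not 'active' (or is unknown)."""
    return sorted(p for p in parent_oids if parent_state.get(p) != "active")


children = [
    Child(1, 100, "active", "archivalDigitized"),
    Child(2, 200, "active", "archivalDigitized"),
    Child(3, 300, "deleted", "archivalDigitized"),
]
parent_state = {100: "active", 200: "deleted"}  # placeholder data

parents = parents_from_children(children)
print(sorted(parents))                               # [100, 200]
print(flag_unusable_parents(parents, parent_state))  # [200]
```

Parent 200 is selected only because its child is active, even though the parent itself is deleted, which matches the failure pattern described.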

The others are parents with missing images. Apart from a couple of parents that each have one image that could not be converted to PTIFF (an image issue), most are caused by Findit having fewer children than Ladybird, e.g. 16190193 has 19 images in Findit but 22 in Ladybird. Here is the list of the parents with missing/problematic images: https://app.zenhub.com/files/242185115/04a0b97c-7d66-46e9-adc6-37dc0eaa863e/download
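Spotting parents where Findit has fewer children than Ladybird comes down to a per-parent count comparison. A minimal sketch, assuming child counts per parent OID have been exported from each system into dictionaries; the function name is hypothetical, and apart from the 16190193 figures quoted above, the sample values are placeholders.

```python
def findit_shortfalls(findit_counts, ladybird_counts):
    """Map parent OID -> (findit, ladybird) where Findit has fewer children.

    Hypothetical helper: a parent absent from Findit is treated as 0 children.
    """
    return {oid: (findit_counts.get(oid, 0), lb)
            for oid, lb in ladybird_counts.items()
            if findit_counts.get(oid, 0) < lb}


findit = {16190193: 19, 1001: 5}    # 1001 is a placeholder OID
ladybird = {16190193: 22, 1001: 5}
print(findit_shortfalls(findit, ladybird))  # {16190193: (19, 22)}
```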

motropuk commented 2 years ago

Thanks @MaggieZhaoYale. I will link to your note above in our review ticket for Garvin (#1929). All remaining work for Garvin is tracked in #1929, so this ticket can be closed. Confirmed that 16,559 objects are in the Production management app.