yalelibrary / YUL-DC

Preliminary issue tracking for Yale University Libraries Digital Collections project

Migrate 1944 OwP Objects from MS 2004 into UAT (PID 39) #2787

Open sshetenhelm opened 3 months ago

sshetenhelm commented 3 months ago

Story

Migrate 1944 parent objects from the Kissinger MS 2004 collection (PID 39) from FindIt to DCS. Parent OIDs are in the attached file: MS2004_ForMigration.csv

Acceptance

MaggieZhaoYale commented 1 month ago

Issues when creating the parent objects: failed to convert PTIFF due to "Expected file originals xxxx not found."

MaggieZhaoYale commented 1 month ago

@sshetenhelm It looks like an issue in UAT. I tried to create one in prod, and it went through fine.

MaggieZhaoYale commented 4 weeks ago

162/1944 parents still have some expected original PTIFF files not found on S3, even though these images are in the pairtree. https://collections-uat.library.yale.edu/management/batch_processes/2004

4 of these parents are complex parents, downloading images of their children's children ...

12481487
12481219
12479863
12479629

Three child images:

Child Object 14775661 failed to convert PTIFF due to Conversion script exited with error code 1.
---
Script exited with status 1 at line 54
Script exited with status 1 at line 58
---
(vips:14261): VIPS-WARNING **: 20:38:50.063: error in tile 0 x 2862
TIFFFillStrip: Read error at scanline 2860; got 8553 bytes, expected 21288
tiff2vips: read error

14775661
14740525
14759134
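A minimal sketch for triaging errors like the one above, assuming the libtiff message always follows the wording in that single log line (other libtiff or vips versions may phrase it differently). It pulls out the scanline and byte counts so short reads can be compared across children. The function name `parse_tiff_read_error` is hypothetical and not part of the project's conversion script.

```python
import re

# Pattern modelled on the one log line quoted in this thread; treat the exact
# wording as an assumption rather than a stable libtiff format.
READ_ERROR = re.compile(
    r"Read error at scanline (?P<scanline>\d+); "
    r"got (?P<got>\d+) bytes, expected (?P<expected>\d+)"
)


def parse_tiff_read_error(log_text):
    """Return scanline, got, expected and missing byte counts, or None if no match."""
    match = READ_ERROR.search(log_text)
    if match is None:
        return None
    info = {name: int(value) for name, value in match.groupdict().items()}
    info["missing"] = info["expected"] - info["got"]
    return info


log = (
    "TIFFFillStrip: Read error at scanline 2860; "
    "got 8553 bytes, expected 21288 tiff2vips: read error"
)
print(parse_tiff_read_error(log))
```

A read that returns fewer bytes than expected often points to a truncated or incompletely copied source TIFF, though that would need to be confirmed against the original file.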

sshetenhelm commented 2 weeks ago

Looking at 14863855

In MGMT UAT, the Title in the JSON is

"title": [
    "Zo-Zu, Image 1"

This object is a child of parent 12479629, which was created in MGMT UAT but has 0 child images generated due to missing images. The child OID for this is 14863855 in MGMT UAT; however, in Ladybird, it is 17488697.

I will review more of these failed parents to see if other similar things are happening. However, I don’t know why the child OID is different from Ladybird.

sshetenhelm commented 2 weeks ago

Deleted 381 parents that were actually LB children - https://collections-uat.library.yale.edu/management/batch_processes/2036

MaggieZhaoYale commented 2 weeks ago

The parent 12479629 has 0 child images; its 74 children are archivalDigitized, and all have images. It looks like the OIDs of all 74 children in MGMT are different from those in Ladybird. The children of this parent in Ladybird UAT have different OIDs from those in prod Ladybird:

https://metadata-api-uat.library.yale.edu/metadatacloud/api/1.0.1/ladybird/oid/12479629?include-children=1&mediaType=json

VS

https://metadata-api.library.yale.edu/metadatacloud/api/1.0.1/ladybird/oid/12479629?include-children=1&mediaType=json
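The UAT vs. prod comparison can be sketched as a simple set difference. This is only an illustration: it assumes the child OIDs have already been extracted from each MetadataCloud response into plain lists (the response layout is not shown in this thread, so that step is left out), and `compare_child_oids` is a hypothetical helper name.

```python
def compare_child_oids(uat_oids, prod_oids):
    """Split two child-OID lists into shared, UAT-only, and prod-only groups."""
    uat, prod = set(uat_oids), set(prod_oids)
    return {
        "shared": sorted(uat & prod),
        "uat_only": sorted(uat - prod),
        "prod_only": sorted(prod - uat),
    }


# The two OIDs reported in this thread for the "Zo-Zu, Image 1" child:
# 14863855 in MGMT UAT and 17488697 in prod Ladybird.
result = compare_child_oids([14863855], [17488697])
print(result)
```

An empty `shared` list for all 74 children would match the observation that none of the UAT child OIDs line up with prod.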

sshetenhelm commented 1 week ago

So, if I am understanding correctly, this means that the object is structured differently in UAT Ladybird than Prod Ladybird?

In Prod, I see:
Parent - 12479629
Child 1 - 17488697 - image
Child 2 - 17488698 - image
etc.

I don't have permissions for Ladybird UAT, although the record looks the same in FindIt UAT vs. FindIt PROD.

My inclination is to not let this single parent be a blocker for migrating the rest of the collection, so perhaps we should just roll with this one as it stands, and try it in PROD, then troubleshoot with Josh from there if it doesn't work.

sshetenhelm commented 1 week ago

The following parents have 0 child images:
12479629 (the problematic Zo-Zu parent mentioned above)
12479863 - EDIT: should have images
12481219 - EDIT: Hierarchical; will delete from UAT
12481487 - EDIT: should have images

The following parents are missing 1 child image:
12482030 - missing child order 57, oid 14775661
12482044 - missing child order 27, oid 14740525
12482081 - missing child order 140, oid 14759134

I will check Ladybird to see if images were attached to these in Ladybird/FindIt.

EDIT: The three missing children should all have tifs/jpgs attached as per Ladybird, but the tif file size for child 14740525 is only 3152 KB, which is much smaller than the others.
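A rough way to spot suspiciously small files like this one is to compare each size against the median for the set. The sketch below is an assumption-laden illustration: only the 3152 KB figure comes from this thread, the sibling entries and their sizes are made-up placeholders, and the 0.25 cutoff ratio is arbitrary. `flag_small_files` is a hypothetical helper.

```python
from statistics import median


def flag_small_files(sizes_kb, ratio=0.25):
    """Return the keys whose size is below `ratio` times the median size."""
    if not sizes_kb:
        return []
    cutoff = median(sizes_kb.values()) * ratio
    return sorted(key for key, size in sizes_kb.items() if size < cutoff)


# Only the 3152 KB value for 14740525 is from this thread; the two sibling
# entries are placeholder values for illustration.
sizes_kb = {"14740525": 3152, "sibling-a": 61000, "sibling-b": 58500}
print(flag_small_files(sizes_kb))
```

A flagged file is only a candidate for a truncated upload; the original would still need to be checked in Ladybird.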