Open sshetenhelm opened 8 months ago
Create parents object issues: failed to convert PTIFF due to Expected file originals xxxx not found.
@sshetenhelm It looks like an issue in UAT. I tried to create one in prod, and it went through fine.
162/1944 parents still have some expected original PTIFFs not found on S3, even though these images are in the pairtree. https://collections-uat.library.yale.edu/management/batch_processes/2004
4 of the parents are complex parents, downloading images of their children's children ...
12481487
12481219
12479863
12479629
Three child images:
Child Object 14775661 failed to convert PTIFF due to Conversion script exited with error code 1.
---
Script exited with status 1 at line 54
Script exited with status 1 at line 58
---
(vips:14261): VIPS-WARNING **: 20:38:50.063: error in tile 0 x 2862
TIFFFillStrip: Read error at scanline 2860; got 8553 bytes, expected 21288
tiff2vips: read error
14775661 14740525 14759134
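The `TIFFFillStrip` read error above ("got 8553 bytes, expected 21288") is the kind of message a truncated file produces. As a rough pre-check before re-running conversion, the sketch below reads the strip table of a classic TIFF and reports strips whose declared bytes run past the end of the file. It is not what the conversion script or vips does. It assumes a classic (non-BigTIFF), strip-based file and only looks at the first IFD. The function name `truncated_strips` is made up for illustration.

```python
import struct

# TIFF field types used for strip tags in classic TIFF: 3 = SHORT, 4 = LONG.
_TYPES = {3: ("H", 2), 4: ("L", 4)}
STRIP_OFFSETS, STRIP_BYTE_COUNTS = 273, 279


def _read_values(data, endian, field_type, count, value_field):
    """Decode the values of one IFD entry (stored inline or at an offset)."""
    if field_type not in _TYPES:
        raise ValueError(f"unsupported field type {field_type}")
    code, size = _TYPES[field_type]
    total = size * count
    if total <= 4:
        # Small values are stored left-justified in the 4-byte value field.
        raw = value_field[:total]
    else:
        (offset,) = struct.unpack(endian + "L", value_field)
        raw = data[offset:offset + total]
        if len(raw) < total:
            raise ValueError("strip table lies beyond end of file")
    return list(struct.unpack(f"{endian}{count}{code}", raw))


def truncated_strips(data: bytes):
    """Return indexes of strips whose declared bytes extend past the file end.

    Assumption: classic TIFF, strip-based, first IFD only.
    """
    if len(data) < 8 or data[:2] not in (b"II", b"MM"):
        raise ValueError("not a TIFF file")
    endian = "<" if data[:2] == b"II" else ">"
    magic, ifd_offset = struct.unpack(endian + "HL", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled here)")
    if ifd_offset + 2 > len(data):
        raise ValueError("first IFD lies beyond end of file (likely truncated)")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    if ifd_offset + 2 + 12 * entry_count > len(data):
        raise ValueError("IFD entries run past end of file (likely truncated)")

    tags = {}
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i
        tag, field_type, count = struct.unpack(endian + "HHL", data[start:start + 8])
        if tag in (STRIP_OFFSETS, STRIP_BYTE_COUNTS):
            tags[tag] = _read_values(
                data, endian, field_type, count, data[start + 8:start + 12]
            )

    offsets, counts = tags.get(STRIP_OFFSETS), tags.get(STRIP_BYTE_COUNTS)
    if offsets is None or counts is None:
        raise ValueError("no strip tags found (file may be tiled)")
    return [
        i for i, (off, n) in enumerate(zip(offsets, counts)) if off + n > len(data)
    ]
```

To try it on a downloaded original, read the file in binary mode and pass the bytes to `truncated_strips`. A non-empty result, or one of the "likely truncated" errors, suggests the source file itself is damaged rather than the conversion step.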
Looking at 14863855
In MGMT UAT, the Title in the JSON is
"title": [
"Zo-Zu, Image 1"
Which is a child of parent 12479629, which was created in MGMT UAT but has 0 child images generated due to missing images. The child OID for this is 14863855 in MGMT UAT; however, in Ladybird, it is 17488697.
I will review more of these failed parents to see if other similar things are happening. However, I don’t know why the child OID is different from Ladybird.
Deleted 381 parents that were actually LB children - https://collections-uat.library.yale.edu/management/batch_processes/2036
The parent 12479629 has 0 child images; its 74 children are archivalDigitized, and all have images. It looks like the OIDs of all 74 children in MGMT differ from those in Ladybird. The children of this parent in Ladybird UAT have different OIDs from those in Ladybird PROD:
https://metadata-api-uat.library.yale.edu/metadatacloud/api/1.0.1/ladybird/oid/12479629?include-children=1&mediaType=json
vs.
https://metadata-api.library.yale.edu/metadatacloud/api/1.0.1/ladybird/oid/12479629?include-children=1&mediaType=json
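For checking other parents the same way, here is a minimal sketch of the comparison. It assumes the child OIDs have already been pulled out of the two MetadataCloud responses into lists in child order. The response layout is not shown in this thread, so the extraction step is left out, and `diff_child_oids` is a made-up helper name.

```python
from itertools import zip_longest


def diff_child_oids(uat_children, prod_children):
    """Compare two child-OID lists position by position.

    Returns (child_order, uat_oid, prod_oid) for every position where the
    OIDs differ. A missing child on either side shows up as None.
    """
    mismatches = []
    pairs = zip_longest(uat_children, prod_children)
    for order, (uat_oid, prod_oid) in enumerate(pairs, start=1):
        if uat_oid != prod_oid:
            mismatches.append((order, uat_oid, prod_oid))
    return mismatches


# First child of parent 12479629, using the two OIDs quoted in this thread.
print(diff_child_oids([14863855], [17488697]))
# prints [(1, 14863855, 17488697)]
```

An empty list would mean the two environments agree on every child OID for that parent.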
So, if I am understanding correctly, this means that the object is structured differently in UAT Ladybird than Prod Ladybird?
In Prod, I see:
Parent - 12479629
Child 1 - 17488697 - image
Child 2 - 17488698 - image
etc.
I don't have permissions for Ladybird UAT, although the record looks the same in FindIt UAT vs. FindIt PROD.
My inclination is to not let this single parent be a blocker for migrating the rest of the collection, so perhaps we should just roll with this one as it stands, and try it in PROD, then troubleshoot with Josh from there if it doesn't work.
The following parents have 0 child images:
12479629 (the problematic Zo-Zu parent mentioned above)
12479863 - EDIT: should have images
12481219 EDIT: Hierarchical; will delete from UAT
12481487 - EDIT: should have images
The following parents are missing 1 child image:
12482030 - missing child order 57, oid 14775661
12482044 - missing child order 27, oid 14740525
12482081 - missing child order 140, oid 14759134
I will check Ladybird to see if images were attached to these in Ladybird/FindIt.
EDIT: The three missing children should all have tifs/jpgs attached as per Ladybird, but child 14740525's tif filesize is only 3152 kb, which is much smaller than the others.
ONE: @MaggieZhaoYale could you please try to pull in the following three children?
12482030 - missing child order 57, oid 14775661
12482044 - missing child order 27, oid 14740525
12482081 - missing child order 140, oid 14759134
If they won't migrate, I'll see if we can get original images from elsewhere.
TWO: We will address the three parents with 0 images in UAT when we migrate to PROD, due to discrepancies between OIDs across Ladybird UAT and PROD.
THREE: @MaggieZhaoYale , could we start pulling in text files for these objects, when you have a chance? Thank you!
@sshetenhelm Although the images of the above 3 can be downloaded from Fedora, all of them have conversion issues, e.g. https://collections-uat.library.yale.edu/management/batch_processes/2081/parent_objects/12482081/child_objects/14759134
Pulled three child images from Preservica--they are also having file issues. Have asked Josh if he knows of any other copies of these scans. We may need to bookmark these for solving in PROD later, especially if we need to have the pages re-scanned.
EDIT: Asking IT for access to full Kissinger storage share, to see if images work there.
EDIT 2: These three images are also broken in the Kissinger Storage @ Yale share, so we will need to request to have the pages re-digitized. I will make a backlog ticket to address these issues once the pages have been re-digitized, and we will move forward with the migration process with that in mind.
Working through OCR for partial text objects.
@MaggieZhaoYale Could you please re-try migrating the text files for these parents? Thank you!
After comparing UAT to Aspace, it looks like we have 73 outstanding MS 2004 objects that were not migrated. 1 is OwP (12482156), 72 are public.
csv: MS2004-Additions.csv
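A comparison like this comes down to a set difference between the OIDs expected from ASpace and the OIDs already in UAT. The sketch below assumes both lists are available as CSV exports with an `oid` column. That column name and the `outstanding_oids` helper are assumptions for illustration, not the layout of MS2004-Additions.csv.

```python
import csv


def outstanding_oids(expected_rows, migrated_rows, column="oid"):
    """Return the sorted OIDs present in expected_rows but not in migrated_rows.

    Both arguments are iterables of CSV lines (for example open file objects)
    whose header row includes `column`.
    """
    def read_oids(rows):
        return {
            row[column].strip()
            for row in csv.DictReader(rows)
            if row.get(column) and row[column].strip()
        }

    return sorted(read_oids(expected_rows) - read_oids(migrated_rows))
```

With two exports on disk, open each with `open(path, newline="")` and pass the file objects in. The result is the list of parents still to be migrated.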
Outstanding actions:
https://collections-uat.library.yale.edu/management/batch_processes/2202 Please ignore the 140 failed parents, which were deleted. No OCR was found for the 73 parents.
@MaggieZhaoYale First round of replacement OCR .txt files (~10,714) available at: FC_YULDCS-807001-YUL > DCS_MaggieandSummer > Kissinger_OCR > MS2004-01
Next files: FC_YULDCS-807001-YUL > DCS_MaggieandSummer > Kissinger_OCR > MS2004-02
Thank you @MaggieZhaoYale ! MS2004-03 is uploaded in the same place, and MS2004-04 should be transferred into our share by the end of day (Mon 10/14)
Thank you @MaggieZhaoYale ! Here are the next two batches; the folders are in the same location on the shared drive:
Thank you @MaggieZhaoYale !!!
We are down to 10 OwP objects that have 'None' for OCR. Not sure how I missed those but I will wrangle the files.
~ In theory ~, all the partials should be in FC_YULDCS-807001-YUL > DCS_MaggieandSummer > Kissinger_OCR > MS2004-Partials
Once we do the partials, I'll look at the entire collection again and provide both the missing Text = None and any remaining Text = Partial
Down to: 10 OwP objects with Full Text = None 4 OwP objects with Full Text = Partial
Gathering files now.
@MaggieZhaoYale - The final OCR files are in FC_YULDCS-807001-YUL > DCS_MaggieandSummer > Kissinger_OCR > MS2004-FINAL
Could you please upload them when you have a chance?
Totals
Discrepancy of 8 parents due to the following:
Corrupted image files on server; requesting/have requested rescans
12479779 - Missing child order 143, oid 15035394
12482030 - Missing child order 57, oid 14775661
12482044 - Missing child order 27, oid 14740525
12482081 - Missing child order 140, oid 14759134
Missing all images; different OIDs in Ladybird UAT vs PROD
12479629 - Box 115 | Folder 7
12479863 - Box 145 | Folder 5
12481487 - Box 213 | Folder 9
Missing some images
12482156 - Box 35 | Folder 2 - Missing images for children 74 and 83 through 93
I will pass collection off to stakeholder to review and ask if there is any reason why we can't provide the images for the last four parents (ex. are images missing for a reason?)
Waiting for feedback from others
Starting the first week of December, we will begin moving this into PROD, but under "PRIVATE" until January.
Story
Migrate 1944 parent objects from the Kissinger MS 2004 collection (PID 39) from FindIt to DCS. Parent OIDs are in the attached file: MS2004_ForMigration.csv
Acceptance