yalelibrary / YUL-DC

Preliminary issue tracking for Yale University Libraries Digital Collections project

Ingest Yale Papyrus Collection to DCS #2121

Closed: sshetenhelm closed this issue 5 months ago

sshetenhelm commented 2 years ago

Story

The Yale Papyrus Collection (PID 42) is currently in FindIt and needs to move to DCS. However, we will use this collection as a test of the new Preservica-DCS integration: it will be ingested into Preservica Test and DCS UAT first, and then, once Preservica V6 is in Prod, the final collection will be ingested there. BRBL have said that the original parent OIDs from FindIt DON'T need to be retained, so this is no longer a migration but a new collection ingest using Preservica.

Acceptance

motropuk commented 2 years ago

Have written to BRBL to see if the Papyri data in Preservica Test is complete https://yul-pres-tsdb.library.yale.edu/explorer/explorer.html#browse:SO&ecd02c4b-21a9-4628-80e5-f257ede28f97&null and whether the ASpace handles are now in Test and/or Prod.

motropuk commented 2 years ago

David and Mark are working on getting this collection into ASpace TEST and Preservica TEST so we can add it to DCS UAT.

motropuk commented 1 year ago

@sshetenhelm did we want to pull this into Sprint at some point, as this work is now ongoing?

sshetenhelm commented 1 year ago

Yes, I can go ahead and slide it in.

sshetenhelm commented 1 year ago

@ucancallmealicia is the papyri metadata in test Aspace? Or is that still in progress?

ucancallmealicia commented 1 year ago

Yes it's in test in the BRBL repo: https://testarchivesspace.library.yale.edu/resources/5894

sshetenhelm commented 1 year ago

DPS is still doing testing in Preservica re: the collection hierarchy.

sshetenhelm commented 11 months ago

TEST resync is complete.

sshetenhelm commented 11 months ago

@ucancallmealicia Aspace is saying YPC has an unpublished ancestor. Is that why I can't seem to find it on test Archives at Yale?

ucancallmealicia commented 11 months ago

@sshetenhelm the resource record was unpublished, so yes, that's why you were getting that message. I just published it, so it should show up shortly.

sshetenhelm commented 11 months ago

Papyri object number 1 is in! https://collections-uat.library.yale.edu/catalog/901602829

I'll start working on the others :)

sshetenhelm commented 11 months ago

Currently adding fixity to the entire collection so it can be ingested in bulk.

sshetenhelm commented 11 months ago

~~Using information from ArchivesSpace report, created 6894 parents in MGMT UAT. 23 parents did not pull children -- will investigate~~

PID 42 in Ladybird has only 6,462 parents, while FindIt shows 6,553 total parents.

@motropuk and @ucancallmealicia, do either of you know the definitive number of parent objects we should have?

sshetenhelm commented 10 months ago

For objects that had zero children in MGMT, the corresponding folder in Preservica appeared to have no images. Still investigating.

sshetenhelm commented 10 months ago

Using the spreadsheet papyri-2018-excel-master, I was able to match and remove all parents attached to records of objects without scans.

Now have 6,319 parents in Management, which still leaves a discrepancy with Ladybird PID 42 (6,462 - 6,319 = 143). Possibly some LB records are hierarchical? Will continue investigating.
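The spreadsheet-matching step above amounts to a set difference: drop any parent whose identifier appears in the list of records without scans. A minimal sketch, assuming each parent is a dict with a hypothetical `call_number` field and that the no-scan list is a plain list of those identifiers; this is not the actual MGMT tooling.

```python
# Hypothetical sketch: drop parent records whose identifier appears in a
# "no scans" list taken from a legacy spreadsheet. The "call_number" field
# and the sample data below are assumptions for illustration only.
from typing import Iterable


def remove_unscanned(parents: Iterable[dict], no_scan_ids: Iterable[str]) -> list:
    """Return only the parents whose call number is not in the no-scan set."""
    skip = set(no_scan_ids)  # set lookup keeps this linear in the number of parents
    return [p for p in parents if p["call_number"] not in skip]


parents = [
    {"oid": 1, "call_number": "item-1"},
    {"oid": 2, "call_number": "item-2"},
    {"oid": 3, "call_number": "item-3"},
]
kept = remove_unscanned(parents, ["item-2"])
print([p["oid"] for p in kept])  # [1, 3]
```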

sshetenhelm commented 10 months ago

141 objects are in FindIt but do not have TIFFs, as per the legacy spreadsheet. These objects are: YPC_NoTIFFs.csv

11 objects are in FindIt and should have TIFFs, as per the legacy spreadsheet, but did not bring children into DCS. These objects are: YPC_MissingScans.csv

sshetenhelm commented 10 months ago

Ingested 9 of 11 outstanding objects, thanks David for helping us locate!

In Management: 6,328 parents
In Blacklight: 6,328 parents

Still investigating P.CtYBR inv. 4717(B) & P.CtYBR inv. 3679(B)

Mark Custer's spreadsheet has 6,914 records, of which 587 should have zero images.

6,914 - 587 = 6,327 expected

6,328 in MGMT + 2 outstanding = 6,330

Not sure how we are 3 over, but I will investigate.
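The reconciliation above compares an expected count (legacy records minus zero-image records) with an observed count (parents in MGMT plus objects still being investigated). A minimal sketch of that arithmetic; the function name `reconcile` is hypothetical and the figures are the ones quoted in this thread.

```python
# Hypothetical sketch of the count reconciliation: expected parents are the
# legacy records minus those that should have no images; observed parents are
# what MGMT holds plus objects still under investigation.
def reconcile(legacy_records: int, zero_image: int, in_mgmt: int, outstanding: int) -> int:
    """Return observed minus expected (positive = over, negative = under)."""
    expected = legacy_records - zero_image
    observed = in_mgmt + outstanding
    return observed - expected


print(reconcile(6914, 587, 6328, 2))  # 3, i.e. 3 over
print(reconcile(6914, 587, 6323, 2))  # -2, using the later 6,323 figure
```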

sshetenhelm commented 10 months ago

Identified 5 duplicate parents in UAT (same ASpace URI, same Preservica info) and deleted them. Now 6,323 parents in MGMT UAT.
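One way to spot duplicates like these is to group parents by the pair (ASpace URI, Preservica reference) and report any group with more than one OID. A minimal sketch, assuming parents are dicts with hypothetical `aspace_uri`, `preservica_uri`, and `oid` fields; the real MGMT schema may differ.

```python
# Hypothetical sketch: find parent objects that share the same ArchivesSpace
# URI and Preservica reference. Field names are assumptions, not the MGMT schema.
from collections import defaultdict


def find_duplicates(parents: list) -> list:
    """Group OIDs by (aspace_uri, preservica_uri); return groups of size > 1."""
    groups = defaultdict(list)
    for p in parents:
        groups[(p["aspace_uri"], p["preservica_uri"])].append(p["oid"])
    return [oids for oids in groups.values() if len(oids) > 1]


parents = [
    {"oid": 10, "aspace_uri": "/repositories/x/archival_objects/1", "preservica_uri": "so-a"},
    {"oid": 11, "aspace_uri": "/repositories/x/archival_objects/1", "preservica_uri": "so-a"},
    {"oid": 12, "aspace_uri": "/repositories/x/archival_objects/2", "preservica_uri": "so-b"},
]
dupes = find_duplicates(parents)
print(dupes)  # [[10, 11]]
# Keeping the first OID in each group, the extras to delete are:
print(sum(len(g) - 1 for g in dupes))  # 1
```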

6,323 + 2 still under investigation = 6,325

We are now two objects under the legacy list count (6,327).

Will continue to investigate

@ucancallmealicia would it be possible to move Papyri records into PROD aspace, if this has not been done already?

sshetenhelm commented 9 months ago

Update from DPS: the Aspace link workflow is working; however, items are not moving automatically to the BRBL Preservica folder. Putting this down temporarily while people are out of office; will pick it back up the week of the 19th.

sshetenhelm commented 7 months ago

Pivoting back to this. Records are being published to Aspace as we speak, will attempt first ingest this afternoon.

sshetenhelm commented 7 months ago

Currently 3,367 available in Blacklight PROD.

sshetenhelm commented 6 months ago

Current stats: 5,070 in Blacklight, 5,646 in MGMT

Still working.

sshetenhelm commented 6 months ago

Up to 5922 in Blacklight, still working through the last few hundred

sshetenhelm commented 6 months ago

6,087 YPC parents in Blacklight

6,716 YPC parents in MGMT, of which 625 have 0 children

6,716 - 625 = 6,091, which means 4 parents have children but are not displaying in Blacklight (6,091 - 6,087 = 4).

Will continue to investigate.
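Finding those 4 parents is another set difference: take the MGMT parents with at least one child and subtract the OIDs that appear in Blacklight. A minimal sketch, assuming a hypothetical mapping of parent OID to child count exported from MGMT and a set of OIDs collected from Blacklight; neither export format is confirmed by this thread.

```python
# Hypothetical sketch: list parents that have children in MGMT but are missing
# from Blacklight. Inputs are assumed exports: {parent_oid: child_count} from
# MGMT and a set of parent OIDs present in Blacklight.
def missing_from_blacklight(child_counts: dict, in_blacklight: set) -> list:
    with_children = {oid for oid, n in child_counts.items() if n > 0}
    return sorted(with_children - in_blacklight)


counts = {1: 3, 2: 0, 3: 5, 4: 1}
print(missing_from_blacklight(counts, {1, 3}))  # [4]
```

Parents with zero children (OID 2 in the sample) are excluded before the comparison, matching the 625 zero-child parents being set aside in the count above.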

sshetenhelm commented 6 months ago

Goal is 6,323 in Blacklight; currently 6,136, a difference of 187 objects.

Still investigating.

sshetenhelm commented 6 months ago

Currently 6144 objects in Blacklight. Identified 104 objects that are in Preservica/Aspace TEST but not in Preservica/Aspace PROD. Working on reconciling.

sshetenhelm commented 6 months ago

6,253 in Blacklight

Batch Process 16838 has the following failures -

36 objects:

31 objects:

3 objects:

sshetenhelm commented 6 months ago

For the first three objects I tried in YPC-Failed-Broken, right-click > Download returned the following error message for at least one TIF in each folder:

This page isn’t working preservica.library.yale.edu is currently unable to handle this request.
HTTP ERROR 500

The three YPC-0Children-Broken objects I tried all worked though.

sshetenhelm commented 6 months ago

YPC-Failed-Broken needed a fix in Preservica; our test worked, so just waiting for the green light to ingest the rest of that batch.

For YPC-0Children-Broken, from David:

so far the first 4-5 that I've checked from YPC-0Children-Broken.csv don't have any download events from the s_dcsbrbl account in their histories, so for those it looks like the request isn't making it to the download level

decirella commented 6 months ago

@sshetenhelm the items on YPC-Failed-Broken are ready to be re-tried.

The cause of these failures was stale file handles being presented by the storage product to Preservica; we were able to manually refresh these stale file handles. It's not clear yet why these specific files were afflicted, and we are working towards a general solution.

sshetenhelm commented 6 months ago

YPC-Failed-Broken worked, but had to resync a few to get them going. Now 6,284 in PROD.

sshetenhelm commented 6 months ago

Successfully ingested the unpublished Aspace parents. Now 6,288 in Prod

The only outstanding objects are the 36 that refuse to create child objects. Will continue to investigate.

sshetenhelm commented 6 months ago

Case Study: Parent OID 33203833

Jobs for object: Image

Is there any way to look up Management Production jobs 13993956 and 13993957 to see if something went wonky there?

sshetenhelm commented 5 months ago

Did one more test today, having same issue. Created #2826 to address remaining 36 objects and will close this ticket. Problem described in previous comment is available in #2824 for troubleshooting purposes.