yalelibrary / YUL-DC

Preliminary issue tracking for Yale University Libraries Digital Collections project

Sync Temporary Goobi Ingest objects with Preservica #2465

Open sshetenhelm opened 1 year ago

sshetenhelm commented 1 year ago

Follow up to #2426

All objects uploaded with the Goobi Temporary Ingests need to be synced to their files in Preservica. A spreadsheet with these is here https://docs.google.com/spreadsheets/d/1iqsayDnZz_ur8dH5iUHGap9kDfQkn2vMJ1dQms0hXBc/edit#gid=1783466688

Some materials are not yet ingested into Preservica, so these will need to be ingested into Preservica before we can reassociate in DCS. These are

The following are already in Preservica and could be reassociated now

motropuk commented 1 year ago

We should bring this into the sprint, as we are working on this. I have rewritten this ticket, as Law and Medical objects are being ingested into Preservica right now by DPS. The Music and MSSA items are already in Preservica, so they could be reassociated now. I would be happy to take this on.

motropuk commented 1 year ago

Music and MSSA items have now been updated with Preservica info in Prod. Next steps are to get the Law and Medical content into Preservica and connected up

motropuk commented 1 year ago

Music batch is actually taking a while, so need to keep an eye on this https://collections.library.yale.edu/management/batch_processes/13334

motropuk commented 1 year ago

Music update batch process stalled at OID 32205563. As this parent failed, it seems the batch itself then stalled. There were 88 parents on the manifest to update. It got to number 25 (which failed) and then didn't process any more. We should discuss whether this is intended functionality. If so, the job needs more status reporting to make it clear that one parent failed and that the rest of the batch was therefore not processed.

Batch: https://collections.library.yale.edu/management/batch_processes/13334

Remaining parents to update

Music_parent_preservica_update.csv

Will run an updated manifest with the remaining parents to update.
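
For discussion, a minimal Python sketch of the kind of status reporting asked for above. This is not DCS code: `BatchReport`, `run_batch`, and the `update_parent` callback are hypothetical names, and the real batch process may be structured quite differently. It only illustrates recording which parents succeeded, which failed, and which were never attempted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class BatchReport:
    """Per-parent outcome of one batch run (hypothetical structure)."""
    succeeded: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)  # OID -> error text
    not_processed: List[str] = field(default_factory=list)


def run_batch(oids: Iterable[str],
              update_parent: Callable[[str], None],
              stop_on_failure: bool = False) -> BatchReport:
    """Update each parent in order and record what happened to every OID."""
    report = BatchReport()
    pending = list(oids)
    for index, oid in enumerate(pending):
        try:
            update_parent(oid)
            report.succeeded.append(oid)
        except Exception as exc:
            report.failed[oid] = str(exc)
            if stop_on_failure:
                # Make the skipped remainder explicit instead of silent.
                report.not_processed = pending[index + 1:]
                break
    return report
```

With `stop_on_failure=False` the remaining parents are still attempted after a failure; with `True` the report at least states which parents were left unprocessed.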

motropuk commented 1 year ago

Update on the above: it looks like the batch failed because of Preservica API availability issues. DPS are investigating a fix. Even though the first 24 parents say they completed, when you go into the parents it looks as if neither the Solr records nor the PDFs have regenerated. So the fix is probably to resolve the Preservica API issues and then run the whole parent update job for Music again.

Music_parent_preservica_update.csv

DraxIndustries79 commented 1 year ago

Waiting for previous ticket to close

motropuk commented 1 year ago

Issue with https://collections.library.yale.edu/management/parent_objects/10022080 but different to the one before.

This was a parent I tried to update previously and it brought in the new children from Preservica but also left the existing children.

Tried a straight resync and it did nothing https://collections.library.yale.edu/management/batch_processes/13493. Said that DCS matched Preservica?

So then I cleared out all of the existing children for this parent, so the parent was left with 0 children. Then ran a resync with Preservica.

This brought in the children, but their sort order does not match Preservica

DCS Image

Preservica https://github.com/yalelibrary/YUL-DC/assets/13023486/0b32879b-0153-451e-b80f-8f83c00b02c0

Parent is still not displaying in Blacklight (https://collections.library.yale.edu/catalog/10022080), but I don't know if this is because jobs are still running. There is a large backlog of jobs (mainly PDF jobs) which might be holding this up.

motropuk commented 1 year ago

Testing script and batch process CSVs to test the Preservica issues in DCS UAT - for @K8Sewell

Testing script for DCSPreservica Issues - 08162023.pdf create_parent_objects_preservicatesting.csv update_parent_objects_preservicatesting.csv

K8Sewell commented 1 year ago

Getting some preservica errors (https://collections-uat.library.yale.edu/management/batch_processes/1504) but still investigating. Will retry the process and see if I can discern what is hanging us up.

K8Sewell commented 1 year ago

Still working through some preservica issues so putting this down for a little bit while preservica comes back online. Resolved but still working on confirming parity with objects in test preservica instance.

Testing Results Create Parent Script - Successful match with parent objects 900148858 to test preservica folder 13527050_39002126219543 and 900149766 to folder 13527069_39002126219543

Update Parent Script - Failed - kept old child objects instead of removing them. Will craft some tests that should reveal why they are not being removed as expected.

K8Sewell commented 1 year ago

PR ready for review - https://github.com/yalelibrary/yul-dc-management/pull/1247

K8Sewell commented 1 year ago

Deployed to Test with release v2.63.1, but it will need to be deployed to UAT for testing.

K8Sewell commented 1 year ago

Not the result I expected. This should have found the old child records and cleared them out. Taking back to in progress.

Image

K8Sewell commented 1 year ago

PR ready for review https://github.com/yalelibrary/yul-dc-management/pull/1251 It's not elegant but it will get us past the issue we had with the last attempt.

K8Sewell commented 1 year ago

Deployed to UAT for testing with release v2.63.2

K8Sewell commented 1 year ago

Failing for a checksum mismatch now so taking back to in progress

Image

K8Sewell commented 1 year ago

I think the issue is fixed. While an error was raised because of a checksum mismatch, the parent object 900124050 now matches the 46 child objects in Test Preservica for structural object ...76868, and they appear to be in the correct order as well. I'm currently testing the other parent object up for update testing, 900099833. The before screenshot below shows both the old and the Preservica child objects, but hopefully once this object has processed (waiting on a few delayed jobs) we will see only the expected 54 child objects for structural object ...babeb.

Before

Image

K8Sewell commented 1 year ago

I'm not 100% sure, but I think I need to wait for the issue with test Preservica to resolve before I can test this. It's skipping the import due to a timeout. Right now I'm not able to log in to the test Preservica instance on Firefox or Chrome.

Image

motropuk commented 1 year ago

@K8Sewell I have reported the Preservica Test outage to our digital preservation folks. They will work on a fix

K8Sewell commented 1 year ago

Looks like we are in a good place again. The last parent object to update has the correct 54 children now (instead of 108).

Image

Image

sshetenhelm commented 1 year ago

Need to roll work into PROD but still keep the ticket for other things. Can split out.

sshetenhelm commented 1 year ago

Spawning jobs again.

motropuk commented 1 year ago

I tried to resync this object again in Production and it is still not working as expected. The notable issue here is the sort order is still wrong https://collections.library.yale.edu/management/parent_objects/10022080. Additionally the parent will not display in Blacklight. Is this possibly just an issue with this parent we need to fix?

K8Sewell commented 1 year ago

Can we change the Bitstream filename over in Preservica? That's how the caption and ordering are created and thus what is throwing the sort order off. I am unable to find the matching record in Preservica test so if anyone has a link to that - would be greatly appreciated. I'd like to confirm the bitstream filename matches what is in Preservica for parent object 10022080 and I'd like to change it from _1 to _01 so that it captures the correct ordering and try updating the parent to confirm the sort order gets fixed. In the meantime I can draft up some logic that will adjust the filename to avoid this order issue but it feels a little bit of overkill now that I have an idea why the sort order was incorrect.
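
A small Python illustration of the `_1` to `_01` idea above. The filenames here are made up for the example (the actual bitstream filenames for parent 10022080 are not shown in this thread), and `zero_pad_suffix` is a hypothetical helper, not something in DCS or Preservica. It pads the trailing number so that plain string sorting gives the intended order.

```python
import re


def zero_pad_suffix(filename: str, width: int = 2) -> str:
    """Pad the trailing _N before the extension, e.g. 'x_1.tif' -> 'x_01.tif'."""
    return re.sub(
        r"_(\d+)(\.[^.]+)$",
        lambda m: f"_{int(m.group(1)):0{width}d}{m.group(2)}",
        filename,
    )


# Hypothetical filenames, for illustration only.
names = [f"10022080_{n}.tif" for n in (1, 2, 10)]

print(sorted(names))
# ['10022080_1.tif', '10022080_10.tif', '10022080_2.tif']

print(sorted(zero_pad_suffix(n) for n in names))
# ['10022080_01.tif', '10022080_02.tif', '10022080_10.tif']
```

A fixed width only helps while the sequence numbers fit in it, so a parent with 100 or more children would need `width=3`.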

K8Sewell commented 1 year ago

11 days of work as of 9.18.

As per David, "so it looks like the API returns as lexiographic sorting, which is alphabetic sorting of the numbers instead of numerically"

Will break adjusting the sorting we interpret from the API into another ticket #2621
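
As a reference point for that ticket, a hedged Python sketch of sorting on our side rather than renaming files in Preservica. `natural_key` is a hypothetical helper and the filenames are invented; the real fix would live in the DCS management code and may look different. The idea is to compare digit runs as integers instead of character by character.

```python
import re


def natural_key(name: str):
    """Sort key that compares runs of digits numerically.

    re.split with a capturing group returns text at even positions and
    digit runs at odd positions, so the key never mixes int and str
    at the same position.
    """
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


# Hypothetical filenames in the order a lexicographic sort would return them.
from_api = ["page_1.tif", "page_10.tif", "page_11.tif", "page_2.tif"]

print(sorted(from_api, key=natural_key))
# ['page_1.tif', 'page_2.tif', 'page_10.tif', 'page_11.tif']
```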

DraxIndustries79 commented 11 months ago

Waiting for sync issue to be resolved

sshetenhelm commented 10 months ago

Just a note to say that Medical objects are now in Preservica.

I may try to resync one or two objects, to see if we have a similar issue as the one MUS object we are trying to resolve.

sshetenhelm commented 10 months ago

Attempted to sync parent 32320833 with Preservica files. Received 'Unable to login' error in DCS. Confirmed with Digital Preservation that the object has the correct security tag in Preservica (one that the DCS user s_dcs_medical should have access to), and that the correct structure and representation type were added on the spreadsheet. Will likely need to investigate on our end, as the Preservica stuff should be fine.

https://collections.library.yale.edu/management/batch_processes/14323

sshetenhelm commented 10 months ago

Still experiencing the login issue with Medical.

sshetenhelm commented 9 months ago

Login issue with Medical fixed, will start work on these again.

sshetenhelm commented 9 months ago

For Medical, tried to update parent 32320833

Received the following error: Parent OID: 32320833 because of Request error 404 <?xml version="1.0" encoding="UTF-8" standalone="yes"?><Error><ExtendedMessage>No Information Object with ref but there another type of entity with the ref</ExtendedMessage><MessageKey>entity.does.not.exist</MessageKey></Error> for /structural-objects/426e5201-1d21-4b44-8b01-0a660383ee59

As you can see from the error, I put in "Structural" and not "Information", so I don't understand why it's telling me there is no Information object with that ref but a different type of entity with ref.

In Preservica, the object is a Structural Object with a Preservation representation type. As such, it seems like this is a DCS issue. Could we please investigate?
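
To help whoever investigates, a minimal Python sketch that pulls the two fields out of the error body quoted above using only the standard library. `parse_preservica_error` is a hypothetical helper name, and the sketch assumes the error XML is un-namespaced exactly as it appears in the message.

```python
import xml.etree.ElementTree as ET

# Error body copied from the batch process message above.
ERROR_BODY = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    "<Error><ExtendedMessage>No Information Object with ref but there another "
    "type of entity with the ref</ExtendedMessage>"
    "<MessageKey>entity.does.not.exist</MessageKey></Error>"
)


def parse_preservica_error(body: str) -> dict:
    """Return the message key and extended message from an error response."""
    root = ET.fromstring(body.encode("utf-8"))
    return {
        "key": root.findtext("MessageKey"),
        "message": root.findtext("ExtendedMessage"),
    }


print(parse_preservica_error(ERROR_BODY)["key"])
# entity.does.not.exist
```

Logging the message key alongside the request path (here `/structural-objects/...`) would make it easier to see when the entity type in the message and the endpoint called do not agree.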

sshetenhelm commented 9 months ago

Created ticket #2691 for the above issue.

sshetenhelm commented 8 months ago

@motropuk Do you remember if your child OIDs were retained when you synced your Goobi objects, or were they replaced?

Today, I:

Used ‘Update Parent Objects’ batch process to add Preservica information to Parent 32329442, which had one child object (32329443).

Parent now has two child objects with the following oids: 33093723 33093724

The caption for both children is 32329443.tif.

The old child, 32329443, appears to have been deleted; it is no longer in the Child Objects data table.

I resynced with Preservica, and it retained both new children with the new OIDs. The folder in Preservica located at the assigned UUID only has one image.

We should not add Preservica information for any more Medical objects until we can confirm that (a) the original child OID can be retained and (b) the correct number of children are created for each object. These issues might be solved with #2510 ?
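
A rough Python sketch of a sanity check for condition (b), assuming we can get the list of child captions for a parent and the number of images expected from Preservica. `check_children` is a hypothetical helper, not an existing DCS function. It flags a count mismatch and any caption that appears more than once, which is what happened with 32329443.tif.

```python
from collections import Counter
from typing import List


def check_children(captions: List[str], expected_count: int) -> List[str]:
    """Return a list of human-readable problems; empty means the parent looks fine."""
    problems = []
    if len(captions) != expected_count:
        problems.append(
            f"expected {expected_count} children, found {len(captions)}"
        )
    for caption, count in Counter(captions).items():
        if count > 1:
            problems.append(f"caption {caption!r} appears {count} times")
    return problems


# Parent 32329442 after the update: two children, one image in Preservica.
for problem in check_children(["32329443.tif", "32329443.tif"], expected_count=1):
    print(problem)
# expected 1 children, found 2
# caption '32329443.tif' appears 2 times
```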

motropuk commented 8 months ago

@sshetenhelm good question. I honestly can't remember, or at least don't know if I checked. I think for the Music objects I was not too concerned if the child OIDs were updated, so didn't pay enough attention to that.

sshetenhelm commented 8 months ago

A sample of Preservica-reassociated Music objects have:

Affected parents include: https://collections.library.yale.edu/catalog/32204414 https://collections.library.yale.edu/catalog/32204693 https://collections.library.yale.edu/catalog/32202808

We should put this ticket back on hold until these issues are rectified. Will push back to backlog and pull in #2510 to fix errors (#2510 includes specifications to retain child OIDs and remove existing child objects without Preservica info).

sshetenhelm commented 8 months ago

The alternative is to decide whether we are comfortable creating new child OIDs for all objects and then manually deleting the prior "double" images.

motropuk commented 8 months ago

@motropuk for the music objects, it is fine to delete the double children, leaving just the preservica ingested images, just to clean those parents up. But otherwise, sounds like working on #2510 first is the best way forward

sshetenhelm commented 8 months ago

Created #2703 for cleaning up parents