Open sshetenhelm opened 1 year ago
We should bring this into sprint, as we are working on this. I have rewritten this ticket, as Law and Medical objects are being ingested into Preservica right now by DPS. The Music and MSSA items are already in Preservica, so they could be reassociated now. I would be happy to take this on.
Music and MSSA items have now been updated with Preservica info in Prod. Next steps are to get the Law and Medical content into Preservica and connected up
Music batch is actually taking a while, so need to keep an eye on this https://collections.library.yale.edu/management/batch_processes/13334
Music update batch process stalled at OID 32205563. As this parent failed, it seems as though the batch itself then stalled. There were 88 parents on the manifest to update. It got to number 25 (which failed) and then didn't process any more. We should discuss whether this is intended functionality or not. If so, the job needs more status reporting to make it clear that one parent failed and that the rest of the batch was therefore not processed.
Batch: https://collections.library.yale.edu/management/batch_processes/13334
Remaining parents to update
Music_parent_preservica_update.csv
Will run an updated manifest with the remaining parents to update.
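To make the status-reporting question concrete, here is a minimal sketch of what a per-parent report could look like. It is not the DCS batch code: `run_batch`, `BatchReport`, and the `update_parent` callback are hypothetical names, and the sketch only assumes that updating a single parent either succeeds or raises.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class BatchReport:
    """Hypothetical summary of one batch run."""
    succeeded: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)  # OID -> error message
    not_processed: List[str] = field(default_factory=list)


def run_batch(oids: Iterable[str],
              update_parent: Callable[[str], None],
              stop_on_failure: bool = False) -> BatchReport:
    """Update each parent in order and record what happened to every OID."""
    report = BatchReport()
    pending = list(oids)
    for index, oid in enumerate(pending):
        try:
            update_parent(oid)
            report.succeeded.append(oid)
        except Exception as exc:
            report.failed[oid] = str(exc)
            if stop_on_failure:
                # Record the parents that were never attempted,
                # so the stall is visible in the report.
                report.not_processed = pending[index + 1:]
                break
    return report
```

With `stop_on_failure=True` the report lists the parents that were skipped after the failure; with the default, the failed parent is recorded and the remaining parents are still attempted.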
Update on the above: it looks like the batch failed because of API availability issues in Preservica. DPS are investigating a fix. Even though the first 24 parents say they completed, when you go into the parents it looks as if neither the SOLR records nor the PDFs have regenerated. So the fix is probably to resolve the Preservica API issues and then run the whole parent update job for Music again.
Waiting for previous ticket to close
Issue with https://collections.library.yale.edu/management/parent_objects/10022080 but different to the one before.
This was a parent I tried to update previously and it brought in the new children from Preservica but also left the existing children.
Tried a straight resync and it did nothing https://collections.library.yale.edu/management/batch_processes/13493. Said that DCS matched Preservica?
So then I cleared out all of the existing children for this parent, so the parent was left with 0 children. Then ran a resync with Preservica.
This brought in the children, but their sort order does not match Preservica
DCS
Preservica https://github.com/yalelibrary/YUL-DC/assets/13023486/0b32879b-0153-451e-b80f-8f83c00b02c0
Parent is still not displaying in Blacklight, but I don't know if this is because jobs are still running https://collections.library.yale.edu/catalog/10022080. There is a large backlog of jobs (mainly PDF jobs) which might be holding this up.
Testing script and batch process CSVs to test the Preservica issues in DCS UAT - for @K8Sewell
Testing script for DCSPreservica Issues - 08162023.pdf create_parent_objects_preservicatesting.csv update_parent_objects_preservicatesting.csv
Getting some Preservica errors (https://collections-uat.library.yale.edu/management/batch_processes/1504) but still investigating. Will retry the process and see if I can discern what is hanging us up.
Still working through some Preservica issues, so putting this down for a little bit while Preservica comes back online. Resolved, but still working on confirming parity with objects in the test Preservica instance.
Testing Results
Create Parent Script - Successful match with parent objects 900148858 to test Preservica folder 13527050_39002126219543 and 900149766 to folder 13527069_39002126219543
Update Parent Script - Failed - kept old child objects instead of removing them. Will craft some tests that should reveal why they are not being removed as expected.
PR ready for review - https://github.com/yalelibrary/yul-dc-management/pull/1247
Deployed to Test with release v2.63.1, but it will need to be deployed to UAT for testing.
Not the result I expected. This should have found the old child records and cleared them out. Taking back to in progress.
PR ready for review https://github.com/yalelibrary/yul-dc-management/pull/1251 It's not elegant but it will get us past the issue we had with the last attempt.
Failing for a checksum mismatch now so taking back to in progress
I think the issue is fixed. Although an error was raised because of a checksum mismatch, the parent object 900124050 now matches the 46 child objects in Test Preservica for structural object ...76868, and they appear to be in the correct order as well. I'm currently testing the other parent object, 900099833, that is up for update testing. The before screenshot below shows both the old and the Preservica child objects, but hopefully once this object has processed (waiting on a few delayed jobs) we will see only the expected 54 child objects for structural object ...babeb.
I'm not 100% sure, but I think I need to wait for the issue with test Preservica to resolve before I can test this. It's skipping the import due to a timeout. Right now I'm not able to log in to the test Preservica instance on Firefox or Chrome.
@K8Sewell I have reported the Preservica Test outage to our digital preservation folks. They will work on a fix
Looks like we are in a good place again. The last parent object to update has the correct 54 children now (instead of 108).
Need to roll work into PROD but still keep the ticket for other things. Can split out.
Spawning jobs again.
I tried to resync this object again in Production and it is still not working as expected. The notable issue here is the sort order is still wrong https://collections.library.yale.edu/management/parent_objects/10022080. Additionally the parent will not display in Blacklight. Is this possibly just an issue with this parent we need to fix?
Can we change the Bitstream filename over in Preservica? That's how the caption and ordering are created, and thus what is throwing the sort order off. I am unable to find the matching record in Preservica test, so if anyone has a link to that, it would be greatly appreciated. I'd like to confirm that the bitstream filename matches what is in Preservica for parent object 10022080, change it from _1 to _01 so that it captures the correct ordering, and then try updating the parent to confirm the sort order gets fixed. In the meantime I can draft up some logic that will adjust the filename to avoid this order issue, but that feels like a bit of overkill now that I have an idea why the sort order was incorrect.
11 days of work as of 9.18.
As per David, "so it looks like the API returns as lexicographic sorting, which is alphabetic sorting of the numbers instead of numerically"
Will break adjusting the sorting we interpret from the API into another ticket #2621
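For reference, a small sketch of the difference between the two orderings. The filenames are made up for illustration (the real bitstream filenames for this parent are not shown in this thread), and `natural_key` is a hypothetical helper, not something that exists in DCS.

```python
import re


def natural_key(filename: str):
    """Sort key that compares runs of digits as integers, not as text."""
    # re.split with a capture group alternates text and digit runs,
    # always starting with a (possibly empty) text run.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", filename)]


# Illustrative filenames only.
names = ["page_1.tif", "page_10.tif", "page_2.tif", "page_11.tif"]

lexicographic = sorted(names)               # what a plain string sort gives
natural = sorted(names, key=natural_key)    # numeric-aware ordering

print(lexicographic)  # ['page_1.tif', 'page_10.tif', 'page_11.tif', 'page_2.tif']
print(natural)        # ['page_1.tif', 'page_2.tif', 'page_10.tif', 'page_11.tif']
```

Zero-padding the filenames (_01 instead of _1) makes the plain string sort agree with the numeric order, which is why renaming in Preservica would also work. A numeric-aware key on our side would avoid depending on how the files were named.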
Waiting for sync issue to be resolved
Just a note to say that Medical objects are now in Preservica.
I may try to resync one or two objects, to see if we have a similar issue as the one MUS object we are trying to resolve.
Attempted to sync parent 32320833 with Preservica files. Received 'Unable to login' error in DCS. Confirmed with Digital Preservation that the object has the correct security tag in Preservica (one that the DCS user s_dcs_medical should have access to), and that the correct structure and representation type were added on the spreadsheet. Will likely need to investigate on our end, as the Preservica stuff should be fine.
https://collections.library.yale.edu/management/batch_processes/14323
Still experiencing the login issue with Medical.
Login issue with Medical fixed, will start work on these again.
For Medical, tried to update parent 32320833
Received the following error:
Parent OID: 32320833 because of Request error 404 <?xml version="1.0" encoding="UTF-8" standalone="yes"?><Error><ExtendedMessage>No Information Object with ref but there another type of entity with the ref</ExtendedMessage><MessageKey>entity.does.not.exist</MessageKey></Error> for /structural-objects/426e5201-1d21-4b44-8b01-0a660383ee59
As you can see from the error, I put in "Structural" and not "Information", so I don't understand why it's telling me there is no Information object with that ref but a different type of entity with ref.
In Preservica, the object is a Structural Object with a Preservation representation type. As such, it seems like this is a DCS issue. Could we please investigate?
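While investigating, it may help to pull the message key and extended message out of the error body so they can be shown on their own in the batch status. A minimal sketch, assuming the error body always has the un-namespaced shape shown above; `parse_preservica_error` is a hypothetical helper, not part of DCS or the Preservica API.

```python
import xml.etree.ElementTree as ET


def parse_preservica_error(body: str) -> dict:
    """Extract the key fields from an <Error> response body."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(body.encode("utf-8"))
    return {
        "message_key": root.findtext("MessageKey"),
        "extended_message": root.findtext("ExtendedMessage"),
    }


# Error body copied from the message above.
body = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    "<Error><ExtendedMessage>No Information Object with ref but there another "
    "type of entity with the ref</ExtendedMessage>"
    "<MessageKey>entity.does.not.exist</MessageKey></Error>"
)
error = parse_preservica_error(body)
print(error["message_key"])
```

This only makes the error easier to read; it does not explain why the response mentions an Information Object for a /structural-objects/ request.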
Created ticket #2691 for the above issue.
@motropuk Do you remember if your child OIDs were retained when you synced your Goobi objects, or were they replaced?
Today, I:
Used ‘Update Parent Objects’ batch process to add Preservica information to Parent 32329442, which had one child object (32329443).
Parent now has two child objects with the following OIDs: 33093723 and 33093724
The caption for both children is 32329443.tif.
The old child, 32329443, appears to have been deleted; it is no longer in the Child Objects data table.
I resynced with Preservica, and it retained both new children with the new OIDs. The folder in Preservica located at the assigned UUID only has one image.
We should not add Preservica information for any more Medical objects until we can confirm that (a) the original child OID can be retained and (b) the correct number of children are created for each object. These issues might be solved with #2510 ?
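The two conditions can be written down as a simple before/after check. This is only a sketch: `check_resync` is a hypothetical helper, and it assumes we can list a parent's child OIDs before and after the update and know how many images the Preservica folder holds.

```python
def check_resync(children_before, children_after, expected_count):
    """Compare child OIDs around a Preservica update of one parent."""
    before, after = set(children_before), set(children_after)
    return {
        # (a) every original child OID is still present
        "oids_retained": before <= after,
        "lost_oids": sorted(before - after),
        "new_oids": sorted(after - before),
        # (b) the parent has as many children as Preservica has images
        "count_matches": len(children_after) == expected_count,
    }


# Values from parent 32329442 above: one child before, two after,
# and one image in the Preservica folder.
result = check_resync([32329443], [33093723, 33093724], expected_count=1)
print(result)
```

For parent 32329442 both checks fail: the original OID is lost and the child count does not match the single image in Preservica.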
@sshetenhelm good question. I honestly can't remember, or at least don't know if I checked. I think for the Music objects I was not too concerned if the child OIDs were updated, so I didn't pay enough attention to that.
A sample of Preservica-reassociated Music objects have:
Affected parents include: https://collections.library.yale.edu/catalog/32204414 https://collections.library.yale.edu/catalog/32204693 https://collections.library.yale.edu/catalog/32202808
We should put this ticket back on hold until these issues are rectified. Will push back to backlog and pull in #2510 to fix errors (#2510 includes specifications to retain child OIDs and remove existing child objects without Preservica info).
The alternative is to decide whether we are comfortable creating new child OIDs for all objects and then manually deleting the prior "double" images.
@motropuk for the Music objects, it is fine to delete the double children, leaving just the Preservica-ingested images, just to clean those parents up. But otherwise, it sounds like working on #2510 first is the best way forward.
Created #2703 for cleaning up parents
Follow up to #2426
All objects uploaded with the Goobi Temporary Ingests need to be synced to their files in Preservica. A spreadsheet with these is here https://docs.google.com/spreadsheets/d/1iqsayDnZz_ur8dH5iUHGap9kDfQkn2vMJ1dQms0hXBc/edit#gid=1783466688
Some materials are not yet ingested into Preservica, so these will need to be ingested into Preservica before we can reassociate in DCS. These are
The following are already in Preservica and could be reassociated now