scientist-softserv / adventist_knapsack

Apache License 2.0

ADL: run ingest scripts and monitor progress #224

Open ShanaLMoore opened 7 months ago

ShanaLMoore commented 7 months ago

Rob to provide a script to run in rancher.

Dev should run it, one at a time, and not start the next one until the jobs have finished.

laritakr commented 7 months ago

from Rob:

to start issue import jobs from the Rails console:

rails c
switch! 'adl'
batch = 5
GoodJob::Job.where(scheduled_at: DateTime.parse("2040-01-01 00:00:00")).limit(batch).update_all(scheduled_at: nil)

batch size can be increased as we get confidence that things are working.

I scheduled them all for 2040-01-01, so we shouldn't have to worry about them starting on their own.
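The "one batch at a time, wait until the jobs have finished" procedure can be sketched as a small loop. This is only a sketch under stated assumptions: `run_in_batches`, `release`, `unfinished_count`, and `sleeper` are hypothetical names (not part of GoodJob), and the callables are injected so the loop can be exercised outside the Rails console. The commented wiring also assumes the tenant's queue holds nothing but these import jobs.

```ruby
# Hypothetical helper: release held jobs in batches and wait for each
# batch to drain before releasing the next one.
#
# release          - callable taking a batch size, returning how many jobs it released
# unfinished_count - callable returning how many released jobs are still unfinished
# sleeper          - callable used to pause between polls (injected for testing)
def run_in_batches(release:, unfinished_count:, batch: 5, poll_seconds: 60,
                   sleeper: ->(seconds) { sleep(seconds) })
  total = 0
  loop do
    released = release.call(batch)
    break if released.zero? # nothing left to release

    total += released
    # Poll until every released job has finished.
    sleeper.call(poll_seconds) while unfinished_count.call.positive?
  end
  total
end

# Possible wiring in the tenant console (assumption, not verified against this app):
#
#   held_at = DateTime.parse("2040-01-01 00:00:00")
#   held    = GoodJob::Job.where(scheduled_at: held_at)
#   run_in_batches(
#     release: ->(n) { held.limit(n).update_all(scheduled_at: nil) },
#     unfinished_count: lambda {
#       GoodJob::Job.where(finished_at: nil)
#                   .where("scheduled_at IS NULL OR scheduled_at <> ?", held_at)
#                   .count
#     }
#   )
```

Keeping `batch` small at first and raising it later matches the advice above; the loop itself does not change.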

laritakr commented 7 months ago

Test batches that failed with entries needing to be rerun:

Image

Image

ShanaLMoore commented 7 months ago

TODO - re split or re run importer for books?

laritakr commented 7 months ago

Problems I am seeing:

FileSetAttachedEventJob NoMethod failures

Jobs usually, but not always, appear to have imported correctly.

Can obtain a fileset id to identify which work threw the error. Re-running one ended normally.
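As a rough illustration of pulling the fileset id out of a failed job, here is a sketch that reads the job's serialized arguments. It assumes the arguments were serialized by ActiveJob with GlobalID (the `_aj_globalid` key) and that the model segment of the GlobalID is `FileSet`; `file_set_ids_from` and `collect_global_ids` are hypothetical helper names, and the `gid://hyku/...` strings in the comments are made-up examples.

```ruby
# Hypothetical helper: recursively collect GlobalID strings from
# ActiveJob-serialized arguments (hashes carrying an "_aj_globalid" key).
def collect_global_ids(value)
  case value
  when Hash
    gid = value["_aj_globalid"]
    gid ? [gid] : value.values.flat_map { |v| collect_global_ids(v) }
  when Array
    value.flat_map { |v| collect_global_ids(v) }
  else
    []
  end
end

# Hypothetical helper: return the ids of any FileSet arguments found in a
# job's serialized params, e.g. "gid://hyku/FileSet/abc123" -> "abc123".
def file_set_ids_from(serialized_params)
  collect_global_ids(serialized_params["arguments"]).filter_map do |gid|
    parts = gid.split("/")
    parts[-1] if parts[-2] == "FileSet"
  end
end

# Possible console usage (job_id is a placeholder for a failed job's id):
#
#   file_set_ids_from(GoodJob::Job.find(job_id).serialized_params)
```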

Modeshape errors (show in notifications)

Unable to link to a failing work from the error. The job log shows the error, but it is not included in the message sent through Mailboxer.

In this case, no file is attached to the fileset.

Update 12/2/2023: Rob added additional logging that appears in the mailboxer notifications.

Screenshot 2023-11-20 at 8 16 13 PM

PDF file not attaching to the PDF fileset (the most common problem)

No errors show in jobs... Errors show up in the dashboard notifications. Can query notifications for a list: Mailboxer::Notification.where("subject like ?", "%Error%")

The errors come from the load_from_fedora method in ActiveFedora's finder_methods.rb. The majority of these still have a file attached to the fileset, but no PDF splitting occurred.

Update 12/2/2023: Rob added additional logging that appears in the mailboxer notifications.
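To get an overview from the notification query mentioned above, the subjects can be tallied so the most frequent errors surface first. A minimal sketch, assuming the subjects are pulled out as plain strings (for example with `pluck(:subject)`); `tally_subjects` is a hypothetical helper name.

```ruby
# Hypothetical helper: count how often each subject occurs and return
# [subject, count] pairs, most frequent first (ties sorted by subject).
def tally_subjects(subjects)
  counts = subjects.each_with_object(Hash.new(0)) { |subject, acc| acc[subject] += 1 }
  counts.sort_by { |subject, count| [-count, subject.to_s] }
end

# Possible console usage:
#
#   subjects = Mailboxer::Notification.where("subject like ?", "%Error%").pluck(:subject)
#   tally_subjects(subjects).each { |subject, count| puts "#{count}\t#{subject}" }
```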

Screenshot 2023-11-20 at 6 04 25 PM

RuntimeError: IiifPrint::SplitPdfs::DerivativeRodeoSplitter#split_files encountered 'DerivativeRodeo::Errors::FileMissingError'

This seems to be a failure in iiif_print, where the local PDF file doesn't hang around long enough.

Update: A patch was put into iiif_print to make sure the PDF file is available at the appropriate location when we try to split.

Create derivative jobs are trying to run HOCR on the thumbnail file.

The jobs retry 5 times and then end with no failure. This doesn't cause any error but definitely is a waste of processing time.

Update: Patch put in place 11/3/2023 to prevent these.

Screenshot 2023-11-20 at 2 58 38 PM

One case where we apparently didn't have a parent record in the splitter

Appeared as a NoMethodError when calling iiif_print_config. The file set ID can be obtained from the job to identify which work failed.

InheritPermissionsJob and VisibilityCopyJob

Occasionally submitted with null arguments.

These reschedule themselves 5 times and then end normally. Not sure what errors trigger the null arguments.

laritakr commented 5 months ago

The config to clean out completed jobs does not seem to be working, so in the meantime, run this periodically in the tenant's console:

jobs = GoodJob::Job.where(error: nil).where("finished_at < ?", 7.days.ago)
jobs.find_each(&:destroy)
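Since this cleanup is run by hand, it can help to know how many records each run actually removed. A small sketch, assuming only that the scope responds to `find_each` and that `destroy` returns a falsy value on failure (as ActiveRecord does); `destroy_counted` is a hypothetical helper name.

```ruby
# Hypothetical helper: destroy every record in the scope and return
# how many were actually destroyed.
def destroy_counted(scope)
  destroyed = 0
  scope.find_each do |record|
    # ActiveRecord's destroy returns the record on success, false otherwise.
    destroyed += 1 if record.destroy
  end
  destroyed
end

# Possible console usage:
#
#   jobs = GoodJob::Job.where(error: nil).where("finished_at < ?", 7.days.ago)
#   puts "Removed #{destroy_counted(jobs)} finished jobs"
```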

laritakr commented 5 months ago

Notifications are overwhelming and timing out the page, so successful ones need to be cleaned up periodically. Note that this doesn't address Mailboxer::Receipt entries, but it does enable the notifications page to be opened again.

Inside the console for a tenant:

notifs = Mailboxer::Notification.where(subject: "Passing batch create")
convos = Mailboxer::Conversation.where(subject: "Passing batch create")
notifs.find_each(&:destroy)
convos.find_each(&:destroy)