yalelibrary / YUL-DC

Preliminary issue tracking for Yale University Libraries Digital Collections project

[BUG] Batch Process Running Multiple Times #2859

Closed K8Sewell closed 6 days ago

K8Sewell commented 2 weeks ago

Summary

Some batch processes are running multiple times and producing more errors than there are rows in the CSV.

Acceptance Criteria

Engineering Notes

Triplicate deletions in this UAT batch process - https://collections-uat.library.yale.edu/management/batch_processes/2039

K8Sewell commented 1 week ago

Maggie, Martin, JP, and I reviewed Batch Process 2039, and we found duplicate deletions only for rows 2 through 108, while the CSV has 213 OIDs. The current behavior is to retry a job up to 3 times if it does not complete every row successfully. From row 188 to the end of the CSV, those parent objects were processed only once. There is no apparent bug causing the jobs to run multiple times; the retries are planned behavior.
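
For context, here is a minimal sketch of how that retry pattern can produce duplicate deletions, assuming the job uses ActiveJob's `retry_on` and walks the whole CSV in a single `perform` call. This is illustrative only, not the actual YUL-DC implementation; the accessors and helper below are hypothetical.

```ruby
# Illustrative sketch only — not the actual YUL-DC job. Assumes Rails/ActiveJob.
class DeleteParentObjectsJob < ApplicationJob
  queue_as :default

  # If any row raises, the whole job is retried (up to 3 attempts),
  # so rows that already succeeded are deleted again on each retry.
  retry_on StandardError, attempts: 3

  def perform(batch_process)
    batch_process.csv_rows.each do |row|  # hypothetical accessor
      delete_parent_object(row)           # hypothetical helper
    end
  end
end
```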

My suggestion is to create a ticket to batch DeleteParentObjectsJob so it processes 50 parent objects at a time, preventing the job from timing out and failing before all records in a CSV are processed. Since this job can be asked to process thousands of records at once, this seems like a logical step forward (a rough sketch of what that could look like is below). That would mean closing this ticket and creating the batching ticket. @sshetenhelm what do you think of this idea?
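
One possible shape for the batching change, again assuming Rails/ActiveJob; all names other than DeleteParentObjectsJob are hypothetical and the slice size of 50 is the value proposed above:

```ruby
# Sketch of the proposed batching, assuming Rails/ActiveJob.
# All names except DeleteParentObjectsJob are hypothetical.
class DeleteParentObjectsJob < ApplicationJob
  queue_as :default
  BATCH_SIZE = 50

  def perform(batch_process)
    # Fan the CSV out into slices of 50 so no single job processes
    # thousands of rows and risks timing out partway through.
    batch_process.oids.each_slice(BATCH_SIZE) do |oid_slice|
      DeleteParentObjectSliceJob.perform_later(oid_slice)
    end
  end
end

class DeleteParentObjectSliceJob < ApplicationJob
  queue_as :default

  # A failure (and retry) here only re-runs this slice's 50 rows,
  # not the entire CSV.
  def perform(oids)
    oids.each { |oid| ParentObject.find_by(oid: oid)&.destroy }
  end
end
```

Because each slice is its own job, a timeout or retry only repeats the affected 50 rows rather than re-deleting everything from the top of the CSV.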

sshetenhelm commented 1 week ago

I am comfortable with that. Are there any other jobs we can think of that would benefit from similar "batching"?

sshetenhelm commented 6 days ago

Created a placeholder ticket for this functionality. Ready to close after we update points to reflect any effort spent in discovery.