nadeemlab / SPT

Spatial profiling toolbox for spatial characterization of tumor immune microenvironment in multiplex images
https://oncopathtk.org

Better logs, and more resilient job queue logic #370

Closed jimmymathews closed 1 month ago

jimmymathews commented 1 month ago

Addresses #368.

Also fixes the job queue logic so that newly created worker containers review the queue as soon as they start, without waiting for a notification on the PostgreSQL channel they LISTEN on.
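
The startup order described here can be sketched roughly as follows. This is only an illustration, not the SPT implementation: `run_worker`, `pending_jobs`, and `notifications` are hypothetical names, and in-process standard-library queues stand in for the PostgreSQL queue table and the LISTEN/NOTIFY channel.

```python
import queue
from collections import deque


def run_worker(pending_jobs, notifications, handle, idle_timeout=0.1):
    """Process jobs, reviewing the queue once before waiting for any notification."""
    processed = []

    def drain():
        # Take jobs until the queue is empty.
        while pending_jobs:
            job = pending_jobs.popleft()
            handle(job)
            processed.append(job)

    # Step 1: review the queue on startup, unprompted. Jobs enqueued before
    # this worker existed produced notifications it never received.
    drain()

    # Step 2: only now wait on the channel; each notification triggers a review.
    while True:
        try:
            notifications.get(timeout=idle_timeout)
        except queue.Empty:
            break  # the sketch stops when idle; a real worker would keep listening
        drain()
    return processed


# Two jobs were queued before the worker started, so no notification is pending for them.
backlog = deque(["job-1", "job-2"])
channel = queue.Queue()
done = run_worker(backlog, channel, handle=lambda job: None)
print(done)  # ['job-1', 'job-2']
```

Without step 1, the two backlog jobs would sit in the queue until some unrelated notification happened to arrive.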

jimmymathews commented 1 month ago

With scaling (dynamic creation and removal of worker processes), occasional job failures become more likely (for example, due to differences in memory availability), so we finally need to track them more carefully. To finish this issue, I am implementing a flag on quantitative_feature_value_queue that workers set when they begin a computation, rather than removing the job from the queue at that point. The flag may also include a timestamp, so that a "watchdog" step can identify jobs that have probably failed, log a warning, and then clean up the corresponding features.
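
A minimal sketch of the flag-plus-watchdog idea, under several assumptions: only the table name `quantitative_feature_value_queue` comes from this thread. The columns `feature`, `subject`, and `started_at`, the function names, and the timeout are hypothetical. An in-memory SQLite database stands in for PostgreSQL so the example runs on its own, and the cleanup step is reduced to deleting the stale queue row.

```python
import logging
import sqlite3

logger = logging.getLogger("watchdog")
TABLE = "quantitative_feature_value_queue"


def create_queue(connection):
    # started_at is the hypothetical flag: NULL means no worker has begun the job.
    connection.execute(f"CREATE TABLE {TABLE} (feature INTEGER, subject TEXT, started_at REAL)")


def claim_job(connection, now):
    """Flag one unstarted job as begun, leaving it on the queue."""
    row = connection.execute(
        f"SELECT rowid, feature, subject FROM {TABLE} "
        "WHERE started_at IS NULL ORDER BY rowid LIMIT 1"
    ).fetchone()
    if row is not None:
        connection.execute(f"UPDATE {TABLE} SET started_at = ? WHERE rowid = ?", (now, row[0]))
    return row


def complete_job(connection, rowid):
    """Remove a job from the queue only once its computation has finished."""
    connection.execute(f"DELETE FROM {TABLE} WHERE rowid = ?", (rowid,))


def watchdog(connection, now, timeout_seconds):
    """Warn about, then remove, jobs that were begun too long ago and never finished."""
    stale = connection.execute(
        f"SELECT rowid, feature, subject FROM {TABLE} "
        "WHERE started_at IS NOT NULL AND started_at < ?",
        (now - timeout_seconds,),
    ).fetchall()
    for rowid, feature, subject in stale:
        logger.warning("Job (feature %s, subject %s) probably failed.", feature, subject)
        connection.execute(f"DELETE FROM {TABLE} WHERE rowid = ?", (rowid,))
    return [(feature, subject) for _, feature, subject in stale]


connection = sqlite3.connect(":memory:")
create_queue(connection)
connection.executemany(
    f"INSERT INTO {TABLE} (feature, subject) VALUES (?, ?)", [(1, "A"), (1, "B")]
)
crashed = claim_job(connection, now=0.0)   # this worker never finishes
finished = claim_job(connection, now=0.0)
complete_job(connection, finished[0])
failed = watchdog(connection, now=600.0, timeout_seconds=300.0)
print(failed)  # [(1, 'A')]
```

With several concurrent workers on PostgreSQL, the claim step would also need row locking (for example `SELECT ... FOR UPDATE SKIP LOCKED`) so that two workers cannot flag the same job. The sketch leaves that out.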

jimmymathews commented 1 month ago

The above was completed.