oss-aspen / 8Knot

Dash app in development to serve open source community visualizations using GitHub data from Augur. Hosted app: https://eightknot.osci.io
MIT License

make workers more resilient #626

Closed JamesKunstle closed 5 months ago

JamesKunstle commented 5 months ago

prefetch and concurrency were pinning workers to jobs that weren't going to terminate. this PR should alleviate some worker pool exhaustion by cancelling deadlocked tasks and allowing workers to immediately pick up tasks from the queue if they're available.

JamesKunstle commented 5 months ago

@codekow This PR makes Celery worker daemons effectively single-task so OpenShift can manage the scale-up, and we avoid tasks being reserved by workers that will block forever.

Could you please spot-check this?

JamesKunstle commented 5 months ago

tl;dr: make workers single-task, disable prefetch, bump # workers, decrease ttl for deadlocked tasks.
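
For reference, a rough sketch of what that combination can look like as Celery settings. The setting names are Celery's own, but the values and the helper name `single_task_worker_settings` are hypothetical and not taken from this PR's diff:

```python
# Hypothetical values; only the setting names come from Celery's configuration.
def single_task_worker_settings(task_ttl_seconds: int = 300) -> dict:
    """Build Celery-style settings for a worker that holds one task at a time."""
    return {
        # One worker process per daemon, so the platform scales by adding daemons.
        "worker_concurrency": 1,
        # Reserve at most one message per process (0 would mean "no limit").
        "worker_prefetch_multiplier": 1,
        # Acknowledge only after the task finishes, so nothing extra is reserved.
        "task_acks_late": True,
        # Hard limit: a task still running after this many seconds is killed.
        "task_time_limit": task_ttl_seconds,
        # Soft limit fires a little earlier so the task can clean up first.
        "task_soft_time_limit": max(1, task_ttl_seconds - 10),
    }


settings = single_task_worker_settings(task_ttl_seconds=120)
# With a real Celery app this would be applied as: app.conf.update(**settings)
```

Keeping prefetch at one with late acks means an idle worker is the only one that can pick up the next queued task, which is the behavior described above.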

Won't need query workers when app and db are on the same hardware and we can bypass caching, and the data MUST be available because we have a copy of the db, so we shouldn't have to deal w/ deadlocks in the future.

Deadlocks arise because the analysis workers are polling, waiting for data to be queried from the main db and cached. If something goes wrong, they'll wait until they're terminated, blocking a worker from attending to other users' requests.
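
To illustrate the failure mode, here is a minimal sketch of a polling loop with a deadline, assuming a hypothetical `is_ready` callable that reports whether the cached data has arrived. It is not the app's actual code; the point is only that a bounded wait hands the worker back instead of blocking until it is killed:

```python
import time
from typing import Callable


def wait_for_cache(is_ready: Callable[[], bool], timeout: float, interval: float = 0.5) -> bool:
    """Poll is_ready() until it returns True or `timeout` seconds have passed.

    Returns True if the data showed up in time, False if the deadline expired.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_ready():
            return True
        # Sleep between checks so the worker does not spin on the cache.
        time.sleep(interval)
    # Deadline passed: give up so the worker can take another task.
    return False
```

A caller would treat a `False` result as a failed task, for example `wait_for_cache(lambda: False, timeout=0.05, interval=0.01)` gives up after roughly 50 ms.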

This polling behavior should be remedied by this fix: https://github.com/plotly/dash/issues/2725