usegalaxy-au / infrastructure

Galaxy Australia's Ansible scripts
MIT License

New jobs are not being dispatched if there are 100s of other 'new' jobs subject to concurrency limits we have set within TPV #2254

Open cat-bro opened 1 month ago

cat-bro commented 1 month ago

A user has 250 alphafold jobs in the 'new' state (TPV limits alphafold to 2 concurrent jobs per person), and the jobs she has submitted since then have not been dispatched.

I have put in a hack fix for now by increasing ready_window_size in the job conf to 120. This works on this occasion because no handler has more than 60 of these alphafold jobs, but it will not work for an arbitrary number of new jobs.

TPV limiting raises a JobNotReadyException if a user already has too many jobs of a given type running. The presence of these new jobs must be preventing other jobs submitted by the same user from ever getting grabbed.
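To illustrate the suspected starvation, here is a toy model of a single handler pass. It is only a sketch under loud assumptions: `Job`, `dispatch_pass` and `build_queue` are hypothetical names, the "fastqc" tool and the window size of 32 are made up for the example, and the real Galaxy handler query and TPV rule evaluation are considerably more involved. The model assumes the handler only looks at the oldest `window_size` new jobs, and that a job over its concurrency limit simply stays 'new' (standing in for JobNotReadyException).

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    tool: str
    state: str = "new"


def dispatch_pass(jobs, window_size, limits):
    """One simplified handler pass over a single user's jobs (toy model)."""
    # Count jobs already running per tool.
    running = {}
    for job in jobs:
        if job.state == "running":
            running[job.tool] = running.get(job.tool, 0) + 1

    # Assumption: only the oldest `window_size` new jobs are considered.
    window = [job for job in jobs if job.state == "new"][:window_size]

    dispatched = []
    for job in window:
        limit = limits.get(job.tool)
        if limit is not None and running.get(job.tool, 0) >= limit:
            # Stands in for JobNotReadyException: the job stays 'new'
            # and keeps occupying a slot in the window on the next pass.
            continue
        job.state = "running"
        running[job.tool] = running.get(job.tool, 0) + 1
        dispatched.append(job.job_id)
    return dispatched


def build_queue():
    """60 alphafold jobs on one handler, then one later, unlimited job."""
    jobs = [Job(i, "alphafold") for i in range(60)]
    jobs.append(Job(60, "fastqc"))  # hypothetical later submission
    return jobs


limits = {"alphafold": 2}

small = build_queue()
first = dispatch_pass(small, window_size=32, limits=limits)
second = dispatch_pass(small, window_size=32, limits=limits)
print(first, second, small[60].state)

large = build_queue()
print(dispatch_pass(large, window_size=120, limits=limits))
```

In this model the small window is filled entirely by alphafold jobs that can never become ready while two are running, so the later job is never examined. The window of 120 only helps because it exceeds the number of blocked jobs on the handler, which matches why the workaround does not generalise.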

cat-bro commented 1 month ago

This is not caused by TPV itself, but by the way limits are set in Galaxy Australia's TPV configuration.