The original job_exe_end and job_exe query was terrible for many reasons. I changed it to fetch only the node-to-job relation, which should be much faster. Fixed the tests and added node assignment to the test utils.
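For reviewers who want a picture of what "just get the node to job relation" means, here is a minimal sketch using the standard-library sqlite3 module. The `job` table, its `node_id` and `status` columns, the `RUNNING` status value, and the `nodes_with_running_jobs` helper are all hypothetical stand-ins for illustration; this is not the actual schema or the query in this PR.

```python
import sqlite3


def nodes_with_running_jobs(conn):
    """Return the distinct node IDs that have at least one running job.

    Hypothetical helper: it reads only the node-to-job relation and does
    not join against any execution tables.
    """
    rows = conn.execute(
        "SELECT DISTINCT node_id FROM job "
        "WHERE status = 'RUNNING' AND node_id IS NOT NULL "
        "ORDER BY node_id"
    ).fetchall()
    return [row[0] for row in rows]


# Assumed, simplified schema held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job (id INTEGER PRIMARY KEY, node_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO job (id, node_id, status) VALUES (?, ?, ?)",
    [
        (1, 10, "RUNNING"),
        (2, 10, "RUNNING"),    # second job on the same node
        (3, 11, "COMPLETED"),  # finished, so node 11 is not returned
        (4, 12, "RUNNING"),
        (5, None, "QUEUED"),   # not yet assigned to a node
    ],
)

running_nodes = nodes_with_running_jobs(conn)
```

With the sample rows above, only nodes 10 and 12 are returned, and node 10 appears once even though it runs two jobs.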
My thoughts on what happens whenever the scheduler is killed while jobs are running (migration, restart, etc.):
When the scheduler comes back and hits this logic, it will be given a list of nodes from the DB that used to be running jobs. That list won't be accurate, since those jobs are lost.
Once the scheduler cleans up the lost jobs (marking them CANCELED or FAILED), this function will stop returning those nodes in that list. The nodes will then be purged from scheduler memory and updated in the DB (is_active set to False), assuming they are gone and no longer offering resources.
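The restart behavior described above can be sketched in plain Python. This is a simplified model, not the scheduler's code: the `Node` class, the `is_offering` flag, and `purge_lost_nodes` are hypothetical names, the in-memory `is_active` update stands in for the DB write, and treating "not offering resources" as a condition checked per node is my reading of the description.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    is_active: bool = True
    is_offering: bool = False  # hypothetical: node is currently offering resources


def purge_lost_nodes(scheduler_nodes, nodes_with_running_jobs):
    """Drop nodes that neither run jobs (per the DB) nor offer resources.

    scheduler_nodes: dict of node_id -> Node held in scheduler memory.
    nodes_with_running_jobs: set of node IDs the DB reports as running jobs.
    Returns the IDs of the nodes that were purged.
    """
    purged = []
    for node_id in list(scheduler_nodes):
        node = scheduler_nodes[node_id]
        if node_id in nodes_with_running_jobs or node.is_offering:
            continue  # still considered in use
        node.is_active = False         # stands in for the DB update
        del scheduler_nodes[node_id]   # purge from scheduler memory
        purged.append(node_id)
    return purged


nodes = {1: Node(1), 2: Node(2, is_offering=True), 3: Node(3)}
node_1 = nodes[1]

# Right after a restart the DB still lists nodes 1 and 3 as running jobs,
# even though those jobs are lost, so nothing is purged yet.
first_pass = purge_lost_nodes(nodes, {1, 3})

# After the lost jobs are marked CANCELED or FAILED, the query returns no
# nodes, so nodes 1 and 3 are purged; node 2 stays because it is offering.
second_pass = purge_lost_nodes(nodes, set())
```

The point of the two passes is that the stale list only delays the purge: the nodes stay in memory until the lost jobs are cleaned up, and are removed on the next pass after that.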
Checklist
manage.py test passes
Affected app(s)