Closed holtgrewe closed 2 years ago
@natefoo ping ;)
@natefoo ping
ping
Thanks! I will try to review this ASAP. I'm wondering how the Slurm commands do this, however. Job information should be available from slurmctld for at least the value of `MinJobAge`, which defaults to 300 seconds. I am not sure that it's correct (and is a significant change from current behavior) to have slurm-drmaa query SlurmDBD directly.
Hm, maybe this could be activated with an environment variable? For other schedulers such as Grid Engine, there is no distinction between the scheduler's knowledge of jobs and accounting, so this would homogenize the behaviour of DRMAA between schedulers.
I like the idea of an environment variable, or the config file could be used. I agree that this feature would be nice to do, especially since the DRMAA abstraction breaks down when you have to go to the DRM tools to do things (we essentially do what you're doing here in our application so it's not as if this isn't a necessary function!).
@natefoo thanks for the feedback and sorry for the delay. I have added an environment variable check and rebased to current `master`. What do you think?
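For readers following along, here is a minimal sketch of what such an environment variable check could look like. It is not the code from the patch: the variable name `SLURM_DRMAA_USE_SLURMDBD` and both function names are assumptions made for illustration, as is the rule that an unset, empty, or `0` value means "disabled".

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name, chosen only for this illustration. */
#define ACCOUNTING_LOOKUP_ENV "SLURM_DRMAA_USE_SLURMDBD"

/* Interpret a raw environment value.
 * Assumption: unset, empty, or "0" disables the feature; anything else enables it. */
bool flag_value_enabled(const char *value)
{
    if (value == NULL || value[0] == '\0')
        return false;
    return strcmp(value, "0") != 0;
}

/* Read the opt-in flag from the process environment. */
bool accounting_lookup_enabled(void)
{
    return flag_value_enabled(getenv(ACCOUNTING_LOOKUP_ENV));
}
```

Keeping the parsing separate from the `getenv()` call makes the opt-in rule easy to test without touching the process environment. The default stays "off", so existing deployments would see no change in behavior unless they set the variable.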
I have a fix for the test error in a follow-up; it only occurs on older Slurm versions.
At least in the case where Slurm accounting information is stored in a MySQL database, jobs that are completed are not available via the `slurm.h` functionality. Instead, the exit code has to be retrieved using the `slurmdb.h` functionality.

This patch extends the "assume failed if no job was found" in `slurmdrmaa_job_on_missing()` to look into `slurmdb` instead.
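The control flow described above can be sketched roughly as follows. This is not the patch itself: the types, the function `resolve_missing_job()`, and the lookup callback are hypothetical stand-ins, and the callback takes the place of whatever `slurmdb.h` query the real code performs, so no actual Slurm API is called here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome type for a job that slurmctld no longer knows about. */
typedef enum { MISSING_JOB_DONE, MISSING_JOB_FAILED } missing_job_state_t;

/* Hypothetical hook standing in for an accounting query.
 * Returns true and fills *exit_code if a record for job_id exists. */
typedef bool (*accounting_lookup_fn)(unsigned job_id, int *exit_code);

/* Decide the final state of a missing job.
 * If accounting is enabled and has a record, use its exit code;
 * otherwise keep the previous behavior and assume the job failed. */
missing_job_state_t resolve_missing_job(unsigned job_id, bool use_accounting,
                                        accounting_lookup_fn lookup,
                                        int *exit_code)
{
    int code = 0;

    if (use_accounting && lookup != NULL && lookup(job_id, &code)) {
        if (exit_code != NULL)
            *exit_code = code;
        return code == 0 ? MISSING_JOB_DONE : MISSING_JOB_FAILED;
    }

    /* Fallback: no accounting record, so assume failure as before. */
    if (exit_code != NULL)
        *exit_code = -1;
    return MISSING_JOB_FAILED;
}

/* Stand-in lookup for demonstration: only job 42 has a record, with exit code 0. */
bool demo_lookup(unsigned job_id, int *exit_code)
{
    if (job_id == 42) {
        *exit_code = 0;
        return true;
    }
    return false;
}
```

With `use_accounting` set to false, the function behaves like the old "assume failed" path, which is what keeps the change opt-in.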