natefoo / slurm-drmaa

DRMAA for Slurm: Implementation of the DRMAA C bindings for Slurm
GNU General Public License v3.0

Doesn't handle cross-platform exit statuses gracefully #30

Open EricR86 opened 5 years ago

EricR86 commented 5 years ago

Looking into this, it seems that the exit status returned from a child process is not handled gracefully in some instances.

Ideally the exit status should be decoded with the macros defined in sys/wait.h, but they are used only sparingly in drmaa.c. Notably, WIFEXITED is used but WIFSIGNALED and WTERMSIG are not. For example, instead of WIFSIGNALED there is an operation that may or may not match what the macro does on a particular architecture. Is there a reason why these macros were not used?
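To illustrate what I mean, here is a minimal sketch of decoding a wait status using only the sys/wait.h macros. This is not the code from drmaa.c: the names `child_outcome`, `decode_status`, and `run_child` are hypothetical and exist only for this example. It assumes a POSIX system.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result type for this sketch; not taken from drmaa.c. */
typedef enum { CHILD_EXITED, CHILD_SIGNALED, CHILD_OTHER } child_outcome;

/*
 * Decode a status value filled in by waitpid() using only the portable
 * macros, with no manual shifting or masking of the integer.
 * On return, *detail holds the exit code (CHILD_EXITED), the terminating
 * signal number (CHILD_SIGNALED), or 0 (CHILD_OTHER).
 */
child_outcome decode_status(int status, int *detail)
{
    assert(detail != NULL);

    if (WIFEXITED(status)) {
        *detail = WEXITSTATUS(status);
        return CHILD_EXITED;
    }
    if (WIFSIGNALED(status)) {
        *detail = WTERMSIG(status);
        return CHILD_SIGNALED;
    }
    *detail = 0;
    return CHILD_OTHER;
}

/*
 * Hypothetical helper: fork a child that raises `sig` if sig != 0,
 * otherwise exits with `code`. Stores the raw wait status in *status.
 * Returns 0 on success, -1 if fork() or waitpid() failed.
 */
int run_child(int code, int sig, int *status)
{
    assert(status != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        if (sig != 0)
            raise(sig);   /* the child should be terminated here */
        _exit(code);      /* normal exit path */
    }

    if (waitpid(pid, status, 0) < 0)
        return -1;
    return 0;
}
```

Calling `run_child(3, 0, &st)` followed by `decode_status(st, &detail)` should classify the child as exited with code 3, while `run_child(0, SIGKILL, &st)` should classify it as signaled with `detail == SIGKILL`. The point is that the layout of the status integer never appears in the code, so nothing depends on how a given platform encodes it.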

This is related to Issue #26. After replacing the hardcoded exit status manipulation with the macros, my jobs went from reporting "unknown signal?!" to "wasTerminated", which was far more informative for tracking down our issues (and ultimately solved our problem).