Closed johanneskoester closed 2 months ago
The changes modify the `run_job` and `cancel_jobs` methods in the SLURM executor plugin. The `run_job` method now encloses the `slurm_logfile` variable in single quotes to handle file paths correctly. Additionally, the `cancel_jobs` method includes new exception handling for `subprocess.CalledProcessError`, improving error reporting during job cancellation.
| Files | Change Summary |
|---|---|
| `snakemake_executor_plugin_slurm/__init__.py` | `run_job`: Enclosed `slurm_logfile` in single quotes. `cancel_jobs`: Added exception handling for `subprocess.CalledProcessError`. |
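The two changes summarized above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the helper name `build_sbatch_call`, the `cancel_cmd` parameter, the 60-second timeout, and the wording of the error message are all assumptions made for the example.

```python
import subprocess


def build_sbatch_call(slurm_logfile: str, job_name: str = "example") -> str:
    """Build an sbatch command string with the log file path in single quotes.

    Hypothetical helper: the real plugin assembles a longer command.
    """
    # Single quotes keep the shell from splitting the path on whitespace.
    return f"sbatch --job-name {job_name} --output '{slurm_logfile}'"


def cancel_jobs(job_ids, cancel_cmd=("scancel",)):
    """Cancel the given jobs; report a failing cancel command with its exit code.

    `cancel_cmd` is an assumption of this sketch so that the function can be
    tried without a SLURM installation.
    """
    if not job_ids:
        return
    try:
        subprocess.run(
            [*cancel_cmd, *job_ids],
            check=True,  # raise CalledProcessError on a non-zero exit code
            capture_output=True,
            text=True,
            timeout=60,
        )
    except subprocess.CalledProcessError as e:
        # Surface the exit code and stderr instead of a bare traceback.
        raise RuntimeError(
            f"Unable to cancel jobs with {cancel_cmd[0]} "
            f"(exit code {e.returncode}): {e.stderr.strip()}"
        ) from e
```

Passing the command as an argument list, as in `cancel_jobs`, avoids shell quoting altogether; the single quotes only matter where a command is built as one string and handed to a shell.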
sequenceDiagram
participant User
participant SLURMExecutor
participant Subprocess
User->>SLURMExecutor: run_job()
SLURMExecutor->>Subprocess: Execute SLURM command with 'slurm_logfile'
Subprocess-->>SLURMExecutor: Return success or error
SLURMExecutor-->>User: Return job status
User->>SLURMExecutor: cancel_jobs()
SLURMExecutor->>Subprocess: Execute scancel command
alt Error Occurred
Subprocess-->>SLURMExecutor: Raise CalledProcessError
SLURMExecutor-->>User: Return error message with exit code
else Success
Subprocess-->>SLURMExecutor: Return success
SLURMExecutor-->>User: Confirm cancellation
end
In the land of code, where jobs do run,
A tweak to the logs brings joy and fun.
Errors now caught, with messages clear,
SLURM's dance grows smoother, let's all cheer!
Hops of success, let the workflows flow,
With every change, our spirits grow!
Will take a look in the late afternoon. Right now, I have meeting after meeting.
@johanneskoester I tried to read the source code. Didn't help. Consider this:
$ scancel
scancel: error: No job identification provided
$ echo $?
1
$ sacct -j 16161523 -o state -X
State
----------
COMPLETED
$ scancel 16161523
$ echo $?
0
According to your list, the last exit code ought to be 2. Also, your list states that exit codes 8 and 0 are identical!
Some of the listed codes do not make sense at all: scancel is there to cancel (obviously) or signal jobs (or job steps). A job return code cannot indicate that it was requeued, and scancel only gives its own exit codes (see main function after line 106) and distinguishes internally between its exit code and job codes. So, if anything, these codes refer to job exit codes, which (except for general ones) are software-specific.
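The shell experiment above can also be run from Python, which is closer to how the plugin sees the result. The sketch below only reads the exit code of the command itself, which is the point made above: it says nothing about the state or exit code of the job. The function name `exit_code_of` is hypothetical.

```python
import subprocess


def exit_code_of(cmd):
    """Run a command and return its own exit code together with its stderr."""
    # check=False: we want to inspect the exit code, not raise on failure.
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return proc.returncode, proc.stderr.strip()


# Usage on a machine with SLURM installed (job ID taken from the session above):
#   exit_code_of(["scancel"])
#   exit_code_of(["scancel", "16161523"])
```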
BTW, I like your PR. I might have a look into #136 (the feature works for me, but the tests fail because of an apparent NONE type?).
PS: black is OK with the length of line 489, but the CI is not. That is an issue in itself, don't you think?
After the post about the signals: if we stop considering the error code (because of its questionable purpose), we can delete it and the line gets shorter.
Thanks for checking this. (I really just stupidly pasted the AI output on the error codes; this again shows how useless it can be ATM.) Still, since I did observe error code 8, let us keep it in the exception message for now; maybe it is useful for some people.
Gnarf, now the tests fail because of missing test files. Wonderful. Not something I will check on a Sunday morning, though. Tomorrow, I have lectures till noon; only then might I have time to investigate.
Should be fixed now.