Closed mwojcikowski closed 9 years ago
Hi @mwojcikowski,
Thanks, we are using --signal so that the job doesn't show up as cancelled. Do you happen to know if there is a way to delete the whole array so that the job is marked as finished instead of cancelled?
The man page states that using --signal bypasses slurmctld and goes straight to slurmd, which most probably has no notion of array jobs.
The name or number of the signal to send. If this option is not used the specified
job or step will be terminated. Note. If this option is used the signal
is sent directly to the slurmd where the job is running bypassing the slurmctld
thus the job state will not change even if the signal is delivered to it. Use
the scontrol command if you want the job state change be known to slurmctld.
Most probably the job ends up as "not-canceled" as a side-effect of not notifying the ctld of the change.
I guess the correct behaviour would be achieved if you enumerate all jobs during the scancel call:
scancel --signal=KILL 1234 1235 1236
PS. I didn't check if --signal works with underscore notation of array jobs.
PS2. slurm 14.11.8
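To avoid typing every ID by hand, the enumeration could be scripted. This is only a sketch and I have not tested it against 14.11.8: `signal_array` is a made-up helper name, and it assumes that `squeue -r -o %A` prints one unique job ID per running array element on your version.

```shell
#!/bin/sh
# Sketch: signal every element of a job array by passing each element's
# own job ID to scancel, instead of the array ID alone.
# signal_array is a hypothetical helper, not part of Slurm.
signal_array() {
    array_id=$1
    sig=${2:-KILL}

    # -h: no header, -r: one array element per line,
    # %A: job ID that is unique for each array element (assumed behaviour)
    ids=$(squeue -h -r -j "$array_id" -o '%A') || return 1

    if [ -z "$ids" ]; then
        echo "no jobs found for $array_id" >&2
        return 1
    fi

    # $ids is left unquoted on purpose so that each ID becomes its own argument
    scancel --signal="$sig" $ids
}
```

Usage would be `signal_array 1234 KILL`, which should expand to the same kind of call as the one-liner above. Since the signal still goes straight to slurmd, the job state in slurmctld is not expected to change.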
Thanks Maciej,
I ended up just dropping KILL as the signal, like you suggested. I tried a couple of different ways of sending the KILL signal to the job arrays, but I always ended up with a couple of the engines not killed.
When scancel is used with the --signal option, only the first job of an array is deleted. If the --signal option is removed, everything functions correctly.