An exception encountered after a non-zero exit code in one job of a job array:
java.lang.NullPointerException
at java.util.Objects.requireNonNull(Objects.java:203)
at com.powsybl.computation.ExecutionError.<init>(ExecutionError.java:24)
at com.powsybl.computation.slurm.JobArraySlurmTask.convertScontrolResult2Error(JobArraySlurmTask.java:95)
at com.powsybl.computation.slurm.AbstractTask.generateReport(AbstractTask.java:140)
at com.powsybl.computation.slurm.AbstractTask.await(AbstractTask.java:130)
at com.powsybl.computation.slurm.SlurmComputationManager.doExecute(SlurmComputationManager.java:277)
at com.powsybl.computation.slurm.SlurmComputationManager.lambda$execute$0(SlurmComputationManager.java:220)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at com.powsybl.computation.CompletableFutureTask.run(CompletableFutureTask.java:43)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Another one, encountered when multiple commands have non-zero exit codes:
Caused by: java.lang.NumberFormatException: For input string: "12-39"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at com.powsybl.computation.slurm.ScontrolCmd$ScontrolResultBean.parseShowJob(ScontrolCmd.java:146)
at com.powsybl.computation.slurm.ScontrolCmd$ScontrolResultBean.parse(ScontrolCmd.java:123)
at com.powsybl.computation.slurm.ScontrolCmd$ScontrolResultBean.<init>(ScontrolCmd.java:81)
at com.powsybl.computation.slurm.ScontrolCmd$ScontrolResult.parse(ScontrolCmd.java:52)
at com.powsybl.computation.slurm.ScontrolCmd$ScontrolResult.<init>(ScontrolCmd.java:44)
at com.powsybl.computation.slurm.ScontrolCmd.send(ScontrolCmd.java:35)
at com.powsybl.computation.slurm.AbstractTask.generateReport(AbstractTask.java:137)
at com.powsybl.computation.slurm.AbstractTask.await(AbstractTask.java:130)
at com.powsybl.computation.slurm.SlurmComputationManager.doExecute(SlurmComputationManager.java:277)
at com.powsybl.computation.slurm.SlurmComputationManager.lambda$execute$0(SlurmComputationManager.java:220)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at com.powsybl.computation.CompletableFutureTask.run(CompletableFutureTask.java:43)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
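The trace above fails in `Integer.parseInt` on the value "12-39", which looks like a range of task IDs rather than a single number. As a minimal sketch of a more tolerant parser, assuming the field can hold a single ID, a range, or a comma-separated mix of both: `TaskIdParser` and its `parse` method are hypothetical names for illustration, not part of powsybl, and which scontrol field actually carries this value is not confirmed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of powsybl: parses a task ID field that may be
// a single number ("12"), a range ("12-39") or a comma-separated mix ("1,3,5-7").
final class TaskIdParser {

    private TaskIdParser() {
    }

    static List<Integer> parse(String field) {
        List<Integer> ids = new ArrayList<>();
        for (String part : field.split(",")) {
            String token = part.trim();
            int dash = token.indexOf('-');
            if (dash < 0) {
                // Plain number: same behavior as the original parseInt call.
                ids.add(Integer.parseInt(token));
            } else {
                // Range "first-last": expand it to every ID it covers, bounds included.
                int first = Integer.parseInt(token.substring(0, dash));
                int last = Integer.parseInt(token.substring(dash + 1));
                for (int id = first; id <= last; id++) {
                    ids.add(id);
                }
            }
        }
        return ids;
    }
}
```

With this sketch, `TaskIdParser.parse("12-39")` yields the 28 IDs from 12 to 39 instead of throwing. Anything that is neither a number nor a range still raises `NumberFormatException`, so unexpected formats stay visible.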
If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem
TODO
What is the expected behavior?
In general, to allow proper error handling by the client, a non-zero exit code must not cause an exception to be thrown; it should be reported as an ExecutionError.
The problem seems to be that we don't have any command associated with a job ID retrieved from the scontrol result, here:
Bug
Maybe in that case it's better to simply not report an ExecutionError?
This would allow proper error handling by the computation manager user.
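The suggestion of not reporting an error when no command is known for a job ID can be sketched as below. This is only an illustration under stated assumptions: `SketchError` and `ErrorReportSketch` are stand-in types, not the real powsybl classes, and the two input maps (exit code per job ID, command per job ID) are assumed shapes for the data involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Stand-in for an error record; it mimics the null check seen in the first
// stack trace but is NOT the real com.powsybl.computation.ExecutionError.
final class SketchError {
    final String command;
    final int exitCode;

    SketchError(String command, int exitCode) {
        this.command = Objects.requireNonNull(command);
        this.exitCode = exitCode;
    }
}

// Hypothetical conversion step: builds errors from scheduler results without throwing.
final class ErrorReportSketch {

    private ErrorReportSketch() {
    }

    // exitCodesByJobId: job ID -> exit code reported by the scheduler (assumed input).
    // commandsByJobId: job ID -> command known to the task (assumed input).
    static List<SketchError> toErrors(Map<Long, Integer> exitCodesByJobId,
                                      Map<Long, String> commandsByJobId) {
        List<SketchError> errors = new ArrayList<>();
        for (Map.Entry<Long, Integer> entry : exitCodesByJobId.entrySet()) {
            int exitCode = entry.getValue();
            if (exitCode == 0) {
                continue; // successful job, nothing to report
            }
            String command = commandsByJobId.get(entry.getKey());
            if (command == null) {
                continue; // no command known for this job ID: skip instead of throwing
            }
            errors.add(new SketchError(command, exitCode));
        }
        return errors;
    }
}
```

Skipping silently keeps the report free of exceptions but hides the unmatched job ID, so logging the skipped ID would probably be worth adding in a real fix.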