Open landerlini opened 1 month ago
@landerlini @Bianco95 can you open an issue in the SLURM plugin repo?
@mbarbetti is ramping up on that repo and he's finding a few other issues that break compatibility with the slurm setup he is using. He can open the issue and I also expect a PR from him at some point.
While playing with the SLURM plugin, we realized that it returns the status completed even when the job fails. This is indeed the expected behavior under SLURM semantics, but not under Kubernetes semantics, so the same pod would behave differently depending on which plugin it is submitted to.
An additional test is needed to make sure that when the job fails, the error status is propagated to k8s.
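As a rough illustration of what such a test could check, here is a minimal sketch in Python. It is not the plugin's code: `pod_phase`, `FAILED_STATES`, and the choice to treat a non-zero exit code as a failure even when the reported state is `COMPLETED` are all assumptions made for this example. Only the SLURM state names and the Kubernetes pod phases (`Pending`, `Running`, `Succeeded`, `Failed`, `Unknown`) are standard.

```python
from typing import Optional

# SLURM job states treated as failures in this sketch (an assumption,
# not the plugin's actual list).
FAILED_STATES = {"FAILED", "CANCELLED", "TIMEOUT", "NODE_FAIL", "OUT_OF_MEMORY"}


def pod_phase(slurm_state: str, exit_code: Optional[int]) -> str:
    """Hypothetical helper: map a SLURM job state and the exit code of the
    job script to a Kubernetes pod phase."""
    state = slurm_state.strip().upper()
    if state == "PENDING":
        return "Pending"
    if state == "RUNNING":
        return "Running"
    if state in FAILED_STATES:
        return "Failed"
    if state == "COMPLETED":
        # The job is over, but the payload may still have failed:
        # only a zero exit code counts as success.
        if exit_code is None:
            return "Unknown"
        return "Succeeded" if exit_code == 0 else "Failed"
    return "Unknown"


def test_failed_job_is_propagated() -> None:
    # The case from this issue: state says completed, payload failed.
    assert pod_phase("COMPLETED", 1) == "Failed"
    assert pod_phase("COMPLETED", 0) == "Succeeded"
    assert pod_phase("FAILED", 1) == "Failed"
    assert pod_phase("RUNNING", None) == "Running"


if __name__ == "__main__":
    test_failed_job_is_propagated()
```

The real test would presumably submit a job that exits with a non-zero code through the plugin and assert that the resulting pod ends up in the `Failed` phase, rather than calling a pure function like this one.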