Closed LudvigOlsen closed 1 year ago
What does `jobinfo` say about target A? Could it be that A didn't produce the output files it should, but still completed with a zero return code? Then B would start running, but fail since an input file is missing.
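One way to rule out the zero-return-code case is to have the job verify its own outputs before it exits. A minimal sketch, assuming two hypothetical helpers (`missing_outputs` and `require_outputs` are illustrative names, not part of gwf or Slurm) that would be called as the last step of A's script:

```python
import sys
from pathlib import Path


def missing_outputs(paths):
    """Return the expected output paths (as strings) that do not exist."""
    return [str(p) for p in map(Path, paths) if not p.exists()]


def require_outputs(paths):
    """Exit with a non-zero code if any expected output is missing.

    Hypothetical helper: calling it at the end of a job makes the
    scheduler record a failure instead of a zero return code.
    """
    missing = missing_outputs(paths)
    if missing:
        print("Missing outputs: " + ", ".join(missing), file=sys.stderr)
        sys.exit(1)
```

With a guard like this, a job that silently skips writing a file would show up as failed rather than completed.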
Does it still say "submitted" after some time? Slurm is a bit unreliable when it comes to fetching the status, so it can take 10-30 seconds before it reports the correct status for a job/target.
If you can produce a minimal example that reproduces the error, that would be great!
It starts running after a short while. A is quite a long job (currently fails after 20+ minutes due to some bugs I'm working through).
Jobinfo for A:
Start times are identical in jobinfo:
B: 2023-05-17T13:58:42
A: 2023-05-17T13:58:42
End time for B is 2023-05-17T13:58:52; A is still running.
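For reference, the arithmetic on those timestamps can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the values are the ones quoted from jobinfo above.

```python
from datetime import datetime

# Timestamps as reported by jobinfo in this thread.
start_a = datetime.fromisoformat("2023-05-17T13:58:42")
start_b = datetime.fromisoformat("2023-05-17T13:58:42")
end_b = datetime.fromisoformat("2023-05-17T13:58:52")

# B started at the same second as A and ended shortly afterwards.
started_together = start_a == start_b
b_runtime_seconds = (end_b - start_b).total_seconds()

print(started_together)   # True
print(b_runtime_seconds)  # 10.0
```

So B started in the same second as A and ran for about 10 seconds before failing, while A (which produces B's input) had only just started.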
Will see if I can make a reproducible example in the coming days :-) (It's 8PM here in Singapore)
Can you provide the complete output of `gwf info` for target B? Also, can you provide the path (over e-mail) to the workflow file? Thanks :-)
I've sent it all in a mail :-)
I have a workflow with 2 targets (say A and B), where the input to B is created by A. I submit both jobs at the same time and shortly after, A is `running` and B is `failed`.
Here, I first print the path to the input file and whether it exists, then the status (`gwfss` is just an alias for summary). A is `submitted` and `B` failed. In this case, I actually only asked to submit `B` but it also submitted `A`, so it knows B depends on A.
Now, usually I suspect this type of error to be my own, but in this case, the workflow is so simple, that it seems to be a bug. I have had some instances previously, where I suspected this to be a bug but where the workflow was way too complex to be certain.
Here is the code for submitting B. Note that A is supposed to make `sample_dir / "dataset" / "feature_dataset.npy"` so it doesn't currently exist. `to_strings` is just a list comprehension converting `Path`s to strings.
The job fails when it cannot find the `feature_dataset.npy`.
When looking in `gwf info` for B, it correctly has the `...feature_dataset.npy` path in inputs, and so it shouldn't run as that file does not exist.
Let me know if you need other information.
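For readers without access to the workflow file, the helper and the existence check described in this post can be approximated as follows. This is a sketch under stated assumptions, not the original code: `to_strings` is reconstructed from its one-line description, `describe_inputs` is a hypothetical helper, and the value of `sample_dir` is made up because the real one is not shown in the thread.

```python
from pathlib import Path


def to_strings(paths):
    """Convert an iterable of Path objects to a list of strings."""
    return [str(p) for p in paths]


def describe_inputs(paths):
    """Return (path, exists) pairs for each input path.

    Hypothetical helper mirroring the "print the path and whether it
    exists" check described in the post.
    """
    return [(p, Path(p).exists()) for p in to_strings(paths)]


# Assumed location; the real sample_dir is not given in the thread.
sample_dir = Path("samples") / "example_sample"
inputs = [sample_dir / "dataset" / "feature_dataset.npy"]

for path, exists in describe_inputs(inputs):
    print(path, "exists" if exists else "missing")
```

If B's only input is reported as missing while A is still running, B should not have been started yet, which is the behaviour questioned here.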