Closed kgalens closed 1 month ago
I can reproduce the issue on 24.08.0-edge:
> /usr/local/bin/nextflow-24.08.0-edge run .
N E X T F L O W ~ version 24.08.0-edge
Launching `./main.nf` [focused_blackwell] DSL2 - revision: bc82ab126c
[af/ce2eeb] Submitted process > process1 (3)
[f7/ada6f2] Submitted process > process1 (1)
[12/e82491] Submitted process > process1 (2)
[f7/ada6f2] NOTE: Process `process1 (1)` terminated with an error exit status (2) -- Error is ignored
(hangs forever)
It looks like process1 completes, then the process2 task is scheduled, but never run:
Sep-09 20:52:35.981 [Actor Thread 2] TRACE nextflow.processor.TaskProcessor - Invoking task > process2 with params=id=4; index=1; values=[[SAMP2, SAMP3], true]
Sep-09 20:52:35.981 [Actor Thread 12] TRACE nextflow.processor.TaskProcessor - <process2> Process state changed to: StateObj[submitted: 1; completed: 0; poisoned: false ] -- finished: false
Sep-09 20:52:35.981 [Actor Thread 11] TRACE nextflow.processor.TaskProcessor - <process2> Control message arrived $ => groovyx.gpars.dataflow.operator.PoisonPill@e3b762d
Sep-09 20:52:35.982 [Actor Thread 11] TRACE nextflow.processor.TaskProcessor - <process2> Poison pill arrived; port: 1
Sep-09 20:52:35.982 [Actor Thread 2] TRACE nextflow.processor.TaskContext - Binding names for 'process2' > []
Sep-09 20:52:35.983 [Actor Thread 12] TRACE nextflow.processor.StateObj - <process2> State before poison: StateObj[submitted: 1; completed: 0; poisoned: false ]
Sep-09 20:52:35.983 [Actor Thread 12] TRACE nextflow.processor.TaskProcessor - <process2> Process state changed to: StateObj[submitted: 1; completed: 0; poisoned: true ] -- finished: false
Sep-09 20:52:35.986 [Actor Thread 2] TRACE nextflow.processor.TaskProcessor - [process2] Store dir not set -- return false
Sep-09 20:52:35.989 [Actor Thread 2] TRACE nextflow.processor.TaskProcessor - [process2] Cacheable folder=null -- exists=false; try=1; shouldTryCache=false; entry=null
Sep-09 20:52:35.991 [Actor Thread 2] TRACE nextflow.processor.TaskProcessor - [process2] actual run folder: /home/bent/projects/sketches/work/d3/ba17a52e118f36fd05c1434927dd8a
Sep-09 20:52:35.995 [Actor Thread 2] TRACE n.processor.TaskPollingMonitor - Scheduled task > TaskHandler[id: 4; name: process2; status: NEW; exit: -; error: -; workDir: /home/bent/projects/sketches/work/d3/ba17a52e118f36fd05c1434927dd8a]
Sep-09 20:52:35.996 [Actor Thread 2] TRACE nextflow.processor.TaskProcessor - <process2> After run
Sep-09 20:52:35.996 [Actor Thread 11] TRACE nextflow.processor.TaskProcessor - <process2> After stop
Sep-09 20:52:36.036 [Task monitor] TRACE n.processor.TaskPollingMonitor - Scheduler queue size: 0 (iteration: 9)
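To make the state lines in the trace easier to read, here is a minimal Python model of the counters they print. It is only a sketch, not Nextflow's code: the `StateObj` name and its three fields are taken from the log output, while the `finished` rule (poisoned, and every submitted task completed) is an assumption inferred from the `finished: false` lines above.

```python
from dataclasses import dataclass


@dataclass
class StateObj:
    """Toy stand-in for the per-process counters printed in the trace."""
    submitted: int = 0
    completed: int = 0
    poisoned: bool = False

    @property
    def finished(self) -> bool:
        # Assumed rule: the process is done once the poison pill has arrived
        # and every submitted task has been reported as completed.
        return self.poisoned and self.submitted == self.completed


# Replay the sequence seen for process2 in the log.
state = StateObj()
state.submitted += 1      # "submitted: 1; completed: 0; poisoned: false"
state.poisoned = True     # "submitted: 1; completed: 0; poisoned: true"
stuck = not state.finished

# If the task is never actually run, `completed` never increases,
# so `finished` stays false and the run cannot terminate.
print(f"state={state} finished={state.finished}")
```

Under that assumption, the hang follows directly if the scheduled process2 task is never picked up by the monitor: nothing ever increments `completed`.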
In fact, if I comment out process2 then the run finishes. Strange that it only happens with failOnIgnore.
Right now I suspect there is a race condition in the task polling monitor that prevents it from submitting the task when it should be able to.
Bingo:
Bug report
Expected behavior and actual behavior
When using the ignore errorStrategy with the workflow option failOnIgnore, the pipeline hangs when there's a task failure.
Steps to reproduce the problem
workflow.nf
Nextflow Config
I would expect that the workflow would complete with a non-zero exit status.
Program output
And it hangs and doesn't finish.
Environment
$SHELL --version: zsh 5.9 (arm-apple-darwin22.1.0)
Additional context