Open robsyme opened 1 year ago
This is somehow related to this comment. @mribeirodantas tried removing || true
but reported no difference.
Some Bash hacker may want to give it another try
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I've hit this as well, when a cp
fails for any reason and it silently fails to stage out. Even worse if the output is optional, because nextflow can't differentiate between not-produced and failed-to-stage-out. Seeing the linked issues here would indicate it's somewhat important.
This is somehow related to this comment. @mribeirodantas tried removing
|| true
but reported no difference.Some Bash hacker may want to give it another try
Do you recall why? in theory if set -e
is set and a cp
operation fails, the exit code should be non-zero and it should fail.
$ cp doesntexist foo
cp: cannot stat 'doesntexist': No such file or directory
$ echo $?
1
Here there's a tentative solution but it was aborted
I think the @pditommaso solution https://github.com/nextflow-io/nextflow/pull/3858 (add set-e before nxf_unstage()) and the change to wait all nxf_parallel processes https://github.com/nextflow-io/nextflow/pull/4050 should detect the exit codes. The statement wait $p
returns the exit code and the set -e is failing the process when the wait returns something different to 0.
I have tested with Azure forcing an error in the stage out. It is detecting the failure and setting the exit code of the azcopy process. I need to check with the other clouds.
Another issue that I see is the print of the errors. Stage-out errors are in the .command.log but not in the .command.err, and it is not printed when there is an error in a process execution.
Fail tasks where stageOut fails
This is similar to existing issue https://github.com/nextflow-io/nextflow/issues/3372, but instead of being concerned with the
publishDir
step (run work directory to publish output location), this issue is concerned with theoutput
step (task work directory to run work directory)Usage scenario
When task outputs are copied from the task working directory to the run work directory using the aws cli, the
aws s3 cp
process can fail. Nextflow ignores this failure with a|| true
, which means that in the event of failure, the downstream tasks expecting this input will fail. This makes debugging harder than it should be, as users will be looking for the error in the downstream task rather than in the task that actually failed.Suggest implementation
I would recommend that we remove the
|| true
on L187 here:https://github.com/nextflow-io/nextflow/blob/2a224258cdd75321f8b3a239c93fdd688c7dc370/modules/nextflow/src/main/groovy/nextflow/executor/SimpleFileCopyStrategy.groovy#L184-L190
In addition, it may be helpful to provide a more descriptive warning in the event of an upload failure, but I'm not 100% sure how best to communicate that to the Nextflow process.