Closed adamrtalbot closed 3 months ago
Is it not finish
doing this?
No, finish
allows the currently running jobs to complete. This would tell Nextflow to continue submitting new jobs for as long as it can before dying.
The use case is for automated runs. If there is one bad sample out of hundreds, the run will fail early and need to wait until it can be manually restarted. Then the whole pipeline needs to run through, which is slow. With this option, the other samples would all process so when the pipeline is resumed without the bad sample it would complete very quickly.
yeah, you are right. I was trying to be lazy 😆
We could either change ignore
to return a non-zero exit code or add a new strategy like ignoreThenFail
, which one would you guys prefer?
I think ignnoreThenFail
is the more generally understandable solution.
Agreed.
If you want to be fancy, we could call it defer
but I'd like a non-native speaker to check that makes sense.
I generally prefer verbose and clear, for me ignoreThenFail
is the winner in this list 👆🏻 defer
is more likely to send me scurrying to the docs.
This is the solution that I was looking for. Is it still under consideration ?
I would like to second @boothms message. This would greatly improve the runs with multiple samples where at least one process for one sample might fail. It would be great if the progress of the execution could continue for the rest of the healthy samples.
@bentsherman , thank you so much for jumping on this. I have a quick question about the implementation.
If one sets the errorStrategy 'ignoreThenFail'
, that implies that the task will never be retried, correct?
Would there be a place where these 2 are not mutually exclusive? Try the maxRetries, and then if it fails, still have the possibility to defer the exit status to the end of the pipeline?
You could do that with a closure:
process.errorStrategy = { task.attempt < 3 ? 'retry' : 'ignoreThenFail' }
This is great. Thank you once again.
I'm really looking forward to testing this.
Hi folks, I saw that the PR is almost done. Do you have an estimate on when this might be merged?
Relevant community discussion: https://community.seqera.io/t/handling-single-sample-failures/618/3
So ignoreThenFail
cannot be used because errorStrategy cannot be longer than 10 chars.
Two choices:
nextflow.failOnIgnoredErrors = true
Where does the 10 character limit derive from?
But from conversations with users, I think the errorStrategy is the most intuitive thing, the thing people expect to find.
ignoreFail
? Though that sounds a bit like we're ignoring a failure.
My other suggestion was defer
, which isn't as clear but is shorter than 10 characters.
Like Jon, I have to ask where does the 10 character limit come from?
where does the 10 character limit come from
The platform db schema
deferFail
?
delayFail
?
I asked ChatGPT:
and the winner is failAtEnd
😄
I think I'm coming around to ignoreFail
and just rely the language server to provide an inline explanation (i.e. hover hint, code completion) if the user isn't clear on the meaning
I'm still not convinced about the best to proceed with this. Apart from the strategy name length, I'd like to avoid introducing a new error strategy for doing a similar thing to ignore, above all because this is going to impact some logic in Platform about how error are reported, etc.
I've made a variation in which the user is expected to use the string ignoreThenFail
as error strategy, however behind the scenes is mapped to ignore
, but still causing the failure on completion.
https://github.com/nextflow-io/nextflow/pull/5066
In this way, the changes is transparent from the point of view of Platform. However I fear that it can be difficult to determine why it failed because tasks will report ignore
as retry action
I've made two other implementation for this feature that prevent the propagation of a new error strategy literal.
PR #5066 implements a pseudo-error strategy named ignoreThenFail,
which is replaced behind the scenes with ignore
and triggers the expected behaviour. However it sounds a bit hacky and not transparent the behavioural.
PR #5071 implements the same logic as a nextflow config setting.
I think I prefer #5071. When the workflow fails due to failOnIgnore
, we can say as much in the error message so that it's clear why the workflow failed
This is a pretty useful feature. Imagine the scenario where you have 96 independent samples, if 1 fails and it brings the whole pipeline crashing down that's annoying, but if you use errorStrategy = 'ignore' then you have to write your own solution. We could have
process.errorStrategy = 'catch'
. This might be a separate ticket though.Having some form of introspection on tasks in the workflow.onComplete block would also help for similar reasons and be more generally useful.
Originally posted by @adamrtalbot in https://github.com/nextflow-io/nextflow/issues/4365#issuecomment-1759191011