You should still observe the "fail fast" principle, but "fail fast" should be specific to each pipeline, not to the entire benchmark.
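To make that scoping concrete, here is a minimal Python sketch of fail-fast limited to a single pipeline. All names (`run_pipeline`, the `modules` list) are hypothetical illustrations, not DSC's actual internals:

```python
# Minimal sketch: fail-fast scoped to one pipeline
# (hypothetical names, not DSC's actual implementation).

def run_pipeline(name, modules):
    """Run a pipeline's module instances in order.

    Fail fast *within* the pipeline: once a module instance fails,
    its downstream modules are skipped, but the caller remains free
    to continue with other, independent pipelines.
    """
    for module in modules:
        try:
            module()  # execute one module instance
        except Exception as err:
            return "%s failed in %s: %s" % (name, module.__name__, err)
    return None  # pipeline completed successfully
```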
GNU Make does "fail-fast" in a way similar to what we are doing, but it has a --keep-going option to try to finish as much as possible. So I guess both approaches have merits. I'm adding an interface to allow for both.
A --keep-going option has been added to override the default fail-fast behavior and try to complete as much as possible. More tests are needed before making a release.
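Behaviorally, the flag would toggle between the two policies. The sketch below builds on the hypothetical `run_pipeline` above; the `keep_going` parameter mirrors the new command-line flag, but this is an illustration, not DSC's actual interface:

```python
# Hypothetical sketch of the two policies; not DSC's actual code.

def run_benchmark(pipelines, keep_going=False):
    """Run independent pipelines; `pipelines` maps a name to its modules."""
    failures = []
    for name, modules in pipelines.items():
        error = run_pipeline(name, modules)  # defined in the sketch above
        if error is None:
            continue
        if not keep_going:
            # Default fail-fast: the first failure aborts the whole benchmark.
            raise RuntimeError(error)
        failures.append(error)  # --keep-going: record the failure and move on
    if failures:
        # Complete as much as possible, then report every failure at once.
        raise RuntimeError("some pipelines failed:\n" + "\n".join(failures))
```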
As a reference, Snakemake also uses this convention:

```
--keep-going, -k    Go on with independent jobs if a job fails. Default: False
```

https://snakemake.readthedocs.io/en/stable/executable.html#all-options
I prefer this behavior: if I submit a long-running Snakemake job at the end of the day, I want to know that even if some errors occur, it will still run as much as possible (this was an even bigger issue back on PPS, where 10% of my jobs would randomly fail for no reason).
It is good to have extra motivation for this option.
Currently in DSC, a failure in any specific module instance causes the entire benchmark to fail. This follows the "fail-fast" notion. Some users have argued that this is not good behavior: we should allow the benchmark to run as much as it can and report failures afterwards.