ngless-toolkit / ngless

NGLess: NGS with less work
https://ngless.embl.de

Allow collect()'ing when all processing is complete #112

Open unode opened 5 years ago

unode commented 5 years ago

Scenario:

1) 12 samples are being processed using the parallel machinery lock1() and collect().
2) 10 samples complete and 2 fail.
3) The 2 failing samples are considered bad and are excluded from the sample file.

At this point, re-running ngless has no effect because all work is complete; however, the merged output from collect() was never generated.

collect() can also fail to run in rare cases where the last two samples finish almost simultaneously, or where filesystem lag prevents the last two processes from seeing all samples as complete.
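
To make the scenario concrete, here is a minimal Python sketch of the behaviour as described in this issue. It is a simplified model, not NGLess code: it assumes each invocation locks one unfinished sample, processes it, and only then checks whether collect() can merge. The name `run_once` and its arguments are hypothetical.

```python
def run_once(samples, done, merged):
    """Model one ngless invocation (simplified, assumed behaviour).

    samples: sample names currently listed in the sample file
    done:    set of samples whose partial results already exist
    merged:  True if the merged collect() output already exists
    Returns (sample processed or None, merged state afterwards).
    """
    todo = [s for s in samples if s not in done]
    if not todo:
        # No sample left to lock: the run ends before collect() is reached.
        return None, merged
    current = todo[0]
    done.add(current)  # pretend processing of this sample succeeded
    if all(s in done for s in samples):
        merged = True  # last sample to finish performs the merge
    return current, merged


# 12 samples, 10 finished, the 2 failing ones removed from the sample file.
all_samples = [f"sample{i}" for i in range(12)]
finished = set(all_samples[:10])
kept = all_samples[:10]
print(run_once(kept, finished, merged=False))
```

Under these assumptions the call returns `(None, False)`: nothing is left to lock, so the merge step is never reached even though every remaining sample is complete.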

unode commented 4 years ago

In order to keep compatibility with the current behavior (no action when finished), I'm wondering if this should be implemented through a --only-collect command-line option.

Effectively, we have to skip all actions (preprocess, map, fastq, paired, ...) except collect, but we still need a sample name for collect to act upon.

luispedro commented 4 years ago

Over the long term, I would prefer an approach where, whenever ngless runs, it will create any missing outputs. The whole lock1/collect business is a bit of a hack now. This is probably for NGLess 2, though.
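
One way to picture the "create any missing outputs" idea is a planning step that runs on every invocation. The Python sketch below is only an illustration under assumed semantics, not a proposal for the actual NGLess design; `plan_run` and the action tuples are hypothetical names.

```python
def plan_run(samples, done, merged_exists):
    """Return the actions a run would need to perform (illustrative only).

    samples:       sample names currently listed in the sample file
    done:          set of samples whose partial results already exist
    merged_exists: True if the merged collect() output already exists
    """
    # Any listed sample without partial results still needs processing.
    actions = [("process", s) for s in samples if s not in done]
    # The merged output is itself an output: if it is missing, schedule it,
    # regardless of whether any per-sample work remains.
    if not merged_exists:
        actions.append(("collect", None))
    return actions


# All remaining samples are finished, but the merged file was never written.
print(plan_run(["s1", "s2"], {"s1", "s2"}, merged_exists=False))
```

In this model, the case from the original report yields a plan containing only the collect step, so re-running would produce the merged output without needing a separate command-line option.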