nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

DSL2 - emit tuples with optional values #2678

Open rcannood opened 2 years ago

rcannood commented 2 years ago

Usage scenario

I'd like to be able to return a tuple with optional elements. For example, by defining the output as `tuple val(id), path("output.txt"), path("output2.txt", optional: true)`, I'd like a process to be able to emit an event `["foo", path("output.txt"), null]` when `output2.txt` is not created.

The process and downstream processes can take a while to run, so using a multi-channel output in combination with `groupTuple()` (see Attempt 3) is very undesirable.

Suggested implementation

Probably this would require:

Reproducible examples

I made several attempts at getting this to run with the current implementation of Nextflow. To summarise:

Attempt 1: optional path in tuple

Because of TupleOutParam.groovy#L103-L105, this optional value is overridden by the tuple's value for 'optional', namely false.

If I try to run the following code, Nextflow will produce an error when output2.txt is missing.

Attempt 1 reprex

```groovy
nextflow.enable.dsl=2

process test_process1 {
  input:
    tuple val(id)
  output:
    tuple val(id), path("output.txt"), path("output2.txt", optional: true)
  script:
    """
    echo $id > output.txt
    if [[ "$id" == "foo" ]]; then
      echo $id > output2.txt
    fi
    """
}

workflow {
  Channel.fromList( ["foo", "bar"] )
    | view { "input: ${it}" }
    | test_process1
    | view { "output: ${it}" }
}
```

↓

```
$ NXF_VER=21.10.6 nextflow run test_outputs_opt1.nf
input: foo
input: bar
output: [foo, work/81/e866d5e329c9ac9980a0c9313d347b/output.txt, work/81/e866d5e329c9ac9980a0c9313d347b/output2.txt]
[8c/e39e04] NOTE: Missing output file(s) `output2.txt` expected by process `test_process1 (2)` -- Error is ignored
```

Attempt 2: make the whole tuple optional

By making the whole tuple optional, Nextflow doesn't produce an error anymore, but my whole tuple is removed, which is undesirable.

Attempt 2 reprex

```groovy
nextflow.enable.dsl=2

process test_process1 {
  input:
    tuple val(id)
  output:
    tuple val(id), path("output.txt"), path("output2.txt") optional true
  script:
    """
    echo $id > output.txt
    if [[ "$id" == "foo" ]]; then
      echo $id > output2.txt
    fi
    """
}

workflow {
  Channel.fromList( ["foo", "bar"] )
    | view { "input: ${it}" }
    | test_process1
    | view { "output: ${it}" }
}
```

↓

```
$ NXF_VER=21.10.6 nextflow run test_outputs_opt2.nf
input: foo
input: bar
output: [foo, work/95/0e07ee0b94834d4587509b152aa354/output.txt, /home/rcannoodwork/95/0e07ee0b94834d4587509b152aa354/output2.txt]
```

Attempt 3: multichannel output

This approach is what is proposed in #1980. However, having to use 'groupTuple()' to merge the multichannel output back into a single event is also undesirable, as now the whole Channel needs to be executed before any events can be emitted downstream. Note that setting size: 2 doesn't work in this case, since some tuples should have one element, others two.

Attempt 3 reprex

```groovy
nextflow.enable.dsl=2

process test_process2 {
  input:
    tuple val(id)
  output:
    tuple val(id), val("output1"), path("output.txt")
    tuple val(id), val("output2"), path("output2.txt") optional true
  script:
    """
    echo $id > output.txt
    if [[ "$id" == "foo" ]]; then
      echo $id > output2.txt
    fi
    """
}

workflow {
  Channel.fromList( ["foo", "bar"] )
    | view { "input: ${it}" }
    | test_process2
    | mix
    | groupTuple(by: 0)
    | map{ [ it[0], [it[1], it[2]].transpose().collectEntries() ]}
    | view { "output: ${it}" }
}
```

↓

```
$ NXF_VER=21.10.6 nextflow run test_outputs_opt3.nf
input: foo
input: bar
output: [bar, [output1:work/9c/97b3a2884f97594532a19923e6c748/output.txt]]
output: [foo, [output1:work/60/984231826c9a9cc2a1e1cf29e16fdb/output.txt, output2:work/60/984231826c9a9cc2a1e1cf29e16fdb/output2.txt]]
```

Attempt 4: add junk to output

By adding a file known to exist (e.g. ".command.sh") to the output, I can force the Channel to always return a tuple. This works, but the code looks quite messy and I need to do postprocessing to remove the additional file.

Attempt 4 reprex

```groovy
nextflow.enable.dsl=2

process test_process3 {
  input:
    tuple val(id)
  output:
    tuple val(id), path{[".command.sh", "output.txt"]}, path{[".command.sh", "output2.txt"]}
  script:
    """
    echo $id > output.txt
    if [[ "$id" == "foo" ]]; then
      echo $id > output2.txt
    fi
    """
}

workflow {
  Channel.fromList( ["foo", "bar"] )
    | view { "input: ${it}" }
    | test_process3
    | map { output ->
      // pair each output name with its slot (everything after the id)
      def map = [["output1", "output2"], output.drop(1)].transpose()

      // strip the dummy ".command.sh" entry from every slot
      def map_without_dummy = map.collectEntries{ key, out ->
        if (out instanceof List && out.size() > 2) {
          [ key, out.drop(1) ]
        } else if (out instanceof List && out.size() == 2) {
          [ key, out[1] ]
        } else {
          // only the dummy file matched, so the optional output is missing
          [ key, null ]
        }
      }
      [ output[0], map_without_dummy ]
    }
    | view { "output: ${it}" }
}
```

↓

```
$ NXF_VER=21.10.6 nextflow run test_outputs_opt4.nf
input: foo
input: bar
output: [foo, [output1:work/96/a51f95280ee3332f50b6b05a12596b/output.txt, output2:work/96/a51f95280ee3332f50b6b05a12596b/output2.txt]]
output: [bar, [output1:work/ec/87149bfea74975d37307d6a115c812/output.txt, output2:null]]
```

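
The post-processing in Attempt 4 can be tried outside Nextflow with plain Groovy. The following is only a minimal sketch under stated assumptions: plain strings stand in for the `Path` objects a process would emit, and `stripDummy` is a hypothetical helper name, not part of Nextflow.

```groovy
// Hypothetical helper (not a Nextflow API): remove the leading dummy entry
// (".command.sh") from one output slot of the emitted tuple.
def stripDummy(out) {
    if (out instanceof List && out.size() > 2) {
        return out.drop(1)   // several real files follow the dummy
    } else if (out instanceof List && out.size() == 2) {
        return out[1]        // exactly one real file follows the dummy
    }
    return null              // only the dummy matched: the optional file is missing
}

// Assumption: strings stand in for the paths emitted by the process.
def foo = ["foo", [".command.sh", "output.txt"], [".command.sh", "output2.txt"]]
def bar = ["bar", [".command.sh", "output.txt"], ".command.sh"]

[foo, bar].each { output ->
    // pair each output name with its slot, then clean every slot
    def cleaned = [["output1", "output2"], output.drop(1)].transpose()
        .collectEntries { key, out -> [key, stripDummy(out)] }
    println([output[0], cleaned])
}
```

For `bar`, the second slot holds only the dummy file, so `output2` comes out as `null`, which is the shape I would like a nullable optional output to emit directly.
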
stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

rcannood commented 2 months ago

Out of curiosity, is this issue still being worked on? Is it already possible to have a nullable optional output?

Thanks! :bow:

bentsherman commented 1 month ago

Hi @rcannood, we've gone through a few sketches since you first submitted the issue. I think we ran into some tricky limitations under the hood that require more fundamental improvements to support nullable values properly.

You can see the current state of development here: #4553. Even this PR is likely not how it will look in the end, but it can give you an idea of where we are heading. Basically, instead of trying to patch the nullable option into the current syntax, we are working on a broader "static type" syntax that should also cover nullable values.

Thanks for your patience in the meantime. It ended up being a deeper rabbit hole than we thought 😅

rcannood commented 1 month ago

Thanks for the update @bentsherman! Looking forward to static types :)