nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Referencing task in shell and stub sections causes StackOverflow #5483

Open bskubi opened 2 weeks ago

bskubi commented 2 weeks ago

Bug report

Expected behavior and actual behavior

I should be able to use task in both the shell and stub sections, but when I do, the run fails with a java.lang.StackOverflowError.

Steps to reproduce the problem

process p {
    input: val(v)

    shell:
    print(task)
    ":"

    stub:
    print(task)
    ":"
}

workflow {
    channel.of(1, 2, 3) | p

}

Program output

(hichdev) benjamin@laptop:~/Documents/temp$ nextflow run test.nf

 N E X T F L O W   ~  version 24.10.0

Launching `test.nf` [extravagant_stonebraker] DSL2 - revision: 477779c3c0

[-        ] p -
[-        ] p -
ERROR ~ Error executing process > 'p (2)'

Caused by:
  java.lang.StackOverflowError -- Check script 'test.nf' at line: 9

Source block:
  print(task)
  ":"

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details

Environment

Linux, Nextflow 24.10, OpenJDK 21

bentsherman commented 1 week ago

@jorgee can you take a look? seems like another thing related to the stub resolution

jorgee commented 1 week ago

It looks like a corner case that causes an infinite loop. It happens because the TaskConfig contains the stub. When task is printed, it invokes LazyMap.toString(), which prints all the elements in the map. To print the stub it executes the closure, which tries to print the TaskConfig again, causing the loop. I would suggest not exposing the stub in the task context. However, I think it could also happen if you print the task inside a directive closure. Another option would be to modify LazyMap.toString() to print the Closure objects instead of resolving them.
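
The loop described above can be modelled outside Nextflow. The following is a minimal Python sketch, not Nextflow's Groovy implementation: `EagerLazyMap`, `SafeLazyMap`, and `make_task` are hypothetical names invented for illustration. The first class resolves callables while building its string form, as LazyMap.toString() is described as doing. The second prints a placeholder instead, which corresponds to the second fix suggested here.

```python
class EagerLazyMap(dict):
    """Hypothetical stand-in: the string form resolves every callable value."""

    def __str__(self):
        parts = []
        for key, value in self.items():
            # Calling the value here is what re-enters __str__ when the
            # callable itself prints the map.
            resolved = value() if callable(value) else value
            parts.append(f"{key}:{resolved}")
        return "[" + ", ".join(parts) + "]"


class SafeLazyMap(dict):
    """Hypothetical stand-in: callables are shown as a placeholder, never run."""

    def __str__(self):
        parts = []
        for key, value in self.items():
            shown = "<closure>" if callable(value) else value
            parts.append(f"{key}:{shown}")
        return "[" + ", ".join(parts) + "]"


def make_task(map_type):
    """Build a map whose 'stub' entry prints the map, like print(task) in stub:."""
    task = map_type(index=1)
    task["stub"] = lambda: str(task)
    return task


if __name__ == "__main__":
    print(make_task(SafeLazyMap))
    try:
        print(make_task(EagerLazyMap))
    except RecursionError:
        print("eager variant recursed until the interpreter's limit")
```

In this model the eager variant fails with Python's RecursionError, the analogue of the JVM's StackOverflowError, while the safe variant terminates because the stub entry is never executed during printing.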

pditommaso commented 1 week ago

I wonder what the use case for this is. task is meant to give access to its individual attributes, not to be "printed" as a whole.

bentsherman commented 1 week ago

Likely the task variable should be restricted. The task properties are documented here, but in reality the runtime adds many other properties as a convenience. We should decide which properties we want to define and not add anything else under the hood. Likely the stub should not be added to task, since task can be referenced in the stub.

bskubi commented 1 week ago

In my workflow, I dump various process attributes to a JSON metadata file in the stage directory, then gather them all into a single TinyDB JSON database. This makes it convenient to load the files I want for analysis and gives me my own record of workflow behavior (I realize Nextflow also produces records of its behavior, but its options didn't entirely meet my needs). I wanted to dump the task attributes as well, which is why I was trying to use the task variable in the process body.
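
For readers unfamiliar with this pattern, the gathering step might look roughly like the Python sketch below. It uses only the standard library rather than TinyDB, and the function name `gather_metadata`, the `*.meta.json` file pattern, and the `source_file` field are assumptions made for illustration, not details of the actual workflow.

```python
import json
from pathlib import Path


def gather_metadata(root, pattern="*.meta.json"):
    """Collect per-task JSON metadata files under root into one list of records.

    The file pattern and the added 'source_file' key are hypothetical.
    """
    records = []
    for path in sorted(Path(root).rglob(pattern)):
        with path.open() as handle:
            record = json.load(handle)
        # Remember where each record came from so it can be traced back.
        record["source_file"] = str(path)
        records.append(record)
    return records
```

Writing an explicit set of fields into each metadata file (for example the process name, index, and attempt) rather than the whole task object would also avoid printing task itself.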