nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io

Pipeline uses cached process despite edited eval script #5470

Open CormacKinsella opened 1 week ago

CormacKinsella commented 1 week ago

Bug report

Expected behavior and actual behavior

Expected: editing the eval script of a process output invalidates the task cache, so a resumed run re-executes the task and the output reflects the new command. Actual: the cached task is reused and the output is unchanged.

Steps to reproduce the problem

  1. Create nextflow.config and main.nf from the example below
  2. Run the pipeline
  3. Edit the eval statement to 'multiqc --version | sed "s/version//"' (the edited line is shown after the example)
  4. Rerun the pipeline -> the output does not change
nextflow.config:

resume = true
apptainer.enabled = true
apptainer.autoMounts = true

main.nf:

nextflow.enable.dsl=2
nextflow.preview.topic = true

process FOO {
    tag "${id}"
    container "quay.io/biocontainers/multiqc:1.14--pyhdfd78af_0"

    input:
    val(id)

    output:
    tuple val(task.process), eval('multiqc --version'), topic: versions

    script:
    """
    """
}

workflow {
    input_ch = Channel.of("Sample1")

    FOO(input_ch)

    channel.topic('versions').view()
}
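
For reference, after step 3 the output block in FOO would read as follows (the rest of the process is unchanged):

    output:
    tuple val(task.process), eval('multiqc --version | sed "s/version//"'), topic: versions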

Program output

Before edit: [FOO, multiqc, version 1.14]

After edit: [FOO, multiqc, version 1.14]

Expected after edit: [FOO, multiqc, 1.14] (note: the expected output is obtained if resume = true is moved from nextflow.config into main.nf)

Environment

Additional context

bentsherman commented 1 week ago

Good point, the eval script was never added to the task hash, so editing it does not invalidate the cached result.

bentsherman commented 10 hours ago

@jorgee this one should be pretty easy if you'd like to try. Basically we need to add the eval outputs to the task hash here:

https://github.com/nextflow-io/nextflow/blob/8041a5799bd2187e4758d453a924ffc75be4e656/modules/nextflow/src/main/groovy/nextflow/processor/TaskProcessor.groovy#L2200-L2205

Something like this:

        // add inputs ...
        // ...

        // add eval outputs
        for( Map.Entry<OutParam,Object> it : task.outputs ) {
            if( it.key instanceof CmdEvalParam )
                keys.add( it.key.getTarget(task.context) )
        }
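
With the eval targets included in the hash keys, editing the eval command changes the task hash, so a resumed run re-executes the task instead of reusing the cached result.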