nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Changes in object input structure use cached values rather than starting new runs for exec #4916

Open mahesh-panchal opened 5 months ago

mahesh-panchal commented 5 months ago

Bug report

Expected behavior and actual behavior

Changes to objects passed as input should trigger new runs when using -resume. Instead, when an object's structure changes, e.g. by wrapping it in a list or a map, the process reuses the cached results from the previous run rather than executing new tasks.

Steps to reproduce the problem

First run:

workflow {
    Channel.of(
        [ id:'foo', taxid: '632'],
        [ id:'bar', taxid: '632']
    )
    | TASK
    | view
}

process TASK {
    input:
    val meta

    exec:
    file("$task.workDir/node_id.txt").text = meta.taxid

    output:
    tuple val(meta), path('node_id.txt'), emit: node_id
}

Changed code (run 2):

workflow {
    Channel.of(
        [map:[ id:'foo', taxid: '632']],
        [map:[ id:'bar', taxid: '632']]
    )
    | TASK
    | view
}

process TASK {
    input:
    val meta

    exec:
    file("$task.workDir/node_id.txt").text = meta.taxid

    output:
    tuple val(meta), path('node_id.txt'), emit: node_id
}

Program output

Run 1:

$ nextflow run main.nf 
N E X T F L O W  ~  version 23.10.1
Launching `main.nf` [zen_leibniz] DSL2 - revision: eaf2b5bb3d
executor >  local (2)
[31/f024e0] process > TASK (2) [100%] 2 of 2 ✔
[[id:foo, taxid:632], /workspace/Nextflow_sandbox/work/8e/853bdb1d954eef65d82eb81ba481e1/node_id.txt]
[[id:bar, taxid:632], /workspace/Nextflow_sandbox/work/31/f024e0f621ef2b27d3567887a7c5d3/node_id.txt]

Run 2: Expected an error, since meta.taxid is now null after the change in input object structure, but the cached results are used instead.

$ nextflow run main.nf -resume
N E X T F L O W  ~  version 23.10.1
Launching `main.nf` [cheesy_allen] DSL2 - revision: 8074d4d3e0
[6b/fb2322] process > TASK (1) [100%] 2 of 2, cached: 2 ✔
[[id:bar, taxid:632], /workspace/Nextflow_sandbox/work/56/2935d71088d17207dab38381590386/node_id.txt]
[[id:foo, taxid:632], /workspace/Nextflow_sandbox/work/6b/fb2322b25197873c2aaf6f0c75084e/node_id.txt]

Environment

bentsherman commented 5 months ago

This is happening because maps are hashed by hashing their values:

https://github.com/nextflow-io/nextflow/blob/2165a14d4b4d42ae37876533bf339b95add8e5bb/modules/nf-commons/src/main/nextflow/util/CacheHelper.java#L171-L176

So a bare value and the same value wrapped in a map hash identically, and even renaming the keys of the map does not change the hash. Changing the order of the entries does change it.
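To illustrate the effect, here is a minimal toy model in Python. It is not Nextflow's actual code: `feed` and `toy_hash` are hypothetical names, SHA-256 stands in for whatever hash function CacheHelper uses, and the sketch assumes that maps and lists are hashed by feeding their elements to the hasher one after another, with map keys ignored.

```python
import hashlib

def feed(hasher, value):
    """Feed a value into the hasher (toy model, not Nextflow's implementation)."""
    if isinstance(value, dict):
        # Assumption: only the map's values are hashed, in iteration order;
        # keys and nesting depth contribute nothing.
        for item in value.values():
            feed(hasher, item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            feed(hasher, item)
    else:
        hasher.update(str(value).encode("utf-8"))
    return hasher

def toy_hash(value):
    return feed(hashlib.sha256(), value).hexdigest()

flat = {"id": "foo", "taxid": "632"}               # run 1 input
wrapped = {"map": {"id": "foo", "taxid": "632"}}   # run 2 input
renamed = {"name": "foo", "taxon": "632"}          # same values, different keys
reordered = {"taxid": "632", "id": "foo"}          # same entries, different order

print(toy_hash(flat) == toy_hash(wrapped))    # True: wrapping is invisible
print(toy_hash(flat) == toy_hash(renamed))    # True: keys are ignored
print(toy_hash(flat) == toy_hash(reordered))  # False: order matters
```

Under this model the inputs of run 1 and run 2 produce the same hash, which matches the observed `cached: 2` on resume.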