nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Can't modify metadata using values from `each` #5046

Closed olgabot closed 3 months ago

olgabot commented 3 months ago

Bug report

Expected behavior and actual behavior

Hello, hope you are well! I would like to use the `each` input qualifier to run a process once per number in a list, and then add that number to the `meta` map so I can reference it in later processes. However, the `number` value in `meta` does not change between tasks.

Steps to reproduce the problem

In this repo is a self-contained example, with the code below:

process test_each {

    input:
    tuple val(meta), path(filename)
    each number

    output:
    tuple val(meta), path("*_*.md"), emit: md_number

    script:
    meta.number = number

    """
    touch ${filename.simpleName}_${number}.md
    """
}

workflow {
    // input_file = Channel.fromPath("README.md")
    input_ch = Channel.of([[id:"README"], file("README.md")])
    numbers = [1,2,3,4,5]

    input_ch.view()

    test_each(input_ch, numbers)

    test_each.out.md_number.view()
}

Program output

Here is the nextflow.log file.

Below is the program output. Notice that number:5 is set for all of them, even though the number in README_#.md changes every time.

(nf-core) 
 ✘  Mon  3 Jun - 19:20  ~/code/nextflow-each-metadata-error   origin ☊ main ✔ 8☀ 
 @olgabot  nextflow run test.nf

 N E X T F L O W   ~  version 24.04.2

Launching `test.nf` [festering_ramanujan] DSL2 - revision: 58475416cc

executor >  local (5)
[74/cda3dc] test_each (5) | 5 of 5 ✔
[[id:README], /Users/olgabot/code/nextflow-each-metadata-error/README.md]
[[id:README, number:5], /Users/olgabot/code/nextflow-each-metadata-error/work/bf/dc139c4fc8fc2cad887b5a3c6dec52/README_3.md]
[[id:README, number:5], /Users/olgabot/code/nextflow-each-metadata-error/work/ef/896a56d0809fd458009a7f5aba0f14/README_1.md]
[[id:README, number:5], /Users/olgabot/code/nextflow-each-metadata-error/work/74/cda3dc203f34658105bd458d6d43fc/README_5.md]
[[id:README, number:5], /Users/olgabot/code/nextflow-each-metadata-error/work/9b/0bdb9be14090c8f1399423276e2009/README_4.md]
[[id:README, number:5], /Users/olgabot/code/nextflow-each-metadata-error/work/29/d5ae1db2cf3cbb3f978235d15ba60b/README_2.md]

I would have expected this output, where the number:# field in the metadata matches the README_#.md filename.

[[id:README, number:3], /Users/olgabot/code/nextflow-each-metadata-error/work/bf/dc139c4fc8fc2cad887b5a3c6dec52/README_3.md]
[[id:README, number:1], /Users/olgabot/code/nextflow-each-metadata-error/work/ef/896a56d0809fd458009a7f5aba0f14/README_1.md]
[[id:README, number:5], /Users/olgabot/code/nextflow-each-metadata-error/work/74/cda3dc203f34658105bd458d6d43fc/README_5.md]
[[id:README, number:4], /Users/olgabot/code/nextflow-each-metadata-error/work/9b/0bdb9be14090c8f1399423276e2009/README_4.md]
[[id:README, number:2], /Users/olgabot/code/nextflow-each-metadata-error/work/29/d5ae1db2cf3cbb3f978235d15ba60b/README_2.md]

Am I missing something? Is there another way to do what I'm trying to do here?

Thank you so much! Warmest, Olga
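
A possible explanation for the output above, as a minimal plain-Groovy sketch rather than Nextflow itself: it assumes that every task receives a reference to the same `meta` map object instead of its own copy, so the last write to `meta.number` is the one every output shows. The names `taskA`, `taskB` and `copy` are hypothetical stand-ins for what each task would see.

```groovy
// One map object, as emitted once by the input channel
def meta = [id: 'README']

// Two "tasks" that each hold a reference to that same object (assumption)
def taskA = meta
def taskB = meta

taskA.number = 1
taskB.number = 5

// Both names point at one object, so the last write wins everywhere
assert taskA.is(taskB)
assert taskA.number == 5

// The + operator builds a new map and leaves the original untouched
def copy = meta + [number: 3]
assert !copy.is(meta)
assert copy.number == 3
assert meta.number == 5
```

If that assumption holds, building a new map with `meta + [number: number]` avoids the shared write, because each result is a separate object.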


olgabot commented 3 months ago

Hello, I am getting the same issue when I use `combine` to create a Cartesian product of the channels before passing them to the process. For some reason, meta.number is the same across all tasks.

Here is the script:


process test_combine {

    input:
    tuple val(meta), path(filename), val(number)

    output:
    tuple val(meta), path("*_*.md"), emit: md_number

    script:
    meta.number = number

    """
    touch ${filename.simpleName}_${number}.md
    """
}

workflow {
    // input_file = Channel.fromPath("README.md")
    input_ch = Channel.of([[id:"README"], file("README.md")])
    numbers = [1,2,3,4,5]

    input_ch_numbers = input_ch.combine(numbers)
    input_ch_numbers.view()

    test_combine(input_ch_numbers)

    test_combine.out.md_number.view()
}

And here is the output:

(nf-core)
 Wed  5 Jun - 07:19  ~/code/nextflow-each-metadata-error   origin ☊ main 17☀ 6● 
 @olgabot  nextflow run test_combine.nf

 N E X T F L O W   ~  version 24.04.2

Launching `test_combine.nf` [marvelous_cray] DSL2 - revision: 11811929d2

executor >  local (5)
[0e/ea02f7] test_combine (1) [100%] 5 of 5 ✔
[[id:README], /Users/olgabot/code/nextflow-each-metadata-error/README.md, 1]
[[id:README], /Users/olgabot/code/nextflow-each-metadata-error/README.md, 2]
[[id:README], /Users/olgabot/code/nextflow-each-metadata-error/README.md, 3]
[[id:README], /Users/olgabot/code/nextflow-each-metadata-error/README.md, 4]
[[id:README], /Users/olgabot/code/nextflow-each-metadata-error/README.md, 5]
[[id:README, number:3], /Users/olgabot/code/nextflow-each-metadata-error/work/0e/ea02f7cd91bec57d30b752a1351a8b/README_1.md]
[[id:README, number:3], /Users/olgabot/code/nextflow-each-metadata-error/work/ba/cf7a5896b8bf20ddf89ab0d12cf56d/README_5.md]
[[id:README, number:3], /Users/olgabot/code/nextflow-each-metadata-error/work/20/b54fd355a8820feb8b6cd753c7e2e6/README_2.md]
[[id:README, number:3], /Users/olgabot/code/nextflow-each-metadata-error/work/3a/08084544a28f812bdeade76d3edaf0/README_3.md]
[[id:README, number:3], /Users/olgabot/code/nextflow-each-metadata-error/work/8f/8f0ea19e6f4ce346fc6c5f4f40cda4/README_4.md]

I would have expected this output for the test_combine.out.md_number.view() channel, where number:# matches the number README_#.md:

[[id:README, number:1], /Users/olgabot/code/nextflow-each-metadata-error/work/0e/ea02f7cd91bec57d30b752a1351a8b/README_1.md]
[[id:README, number:5], /Users/olgabot/code/nextflow-each-metadata-error/work/ba/cf7a5896b8bf20ddf89ab0d12cf56d/README_5.md]
[[id:README, number:2], /Users/olgabot/code/nextflow-each-metadata-error/work/20/b54fd355a8820feb8b6cd753c7e2e6/README_2.md]
[[id:README, number:3], /Users/olgabot/code/nextflow-each-metadata-error/work/3a/08084544a28f812bdeade76d3edaf0/README_3.md]
[[id:README, number:4], /Users/olgabot/code/nextflow-each-metadata-error/work/8f/8f0ea19e6f4ce346fc6c5f4f40cda4/README_4.md]

All code and output is in the same repo as before, https://github.com/olgabot/nextflow-each-metadata-error

Is the pattern of modifying the metadata in the process script not recommended? I find it strange that the context of number is shared across all inputs, even though the output of README_1.md, README_2.md, README_3.md, README_4.md, and README_5.md all get created properly.
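
One alternative that avoids modifying `meta` inside the process at all is to attach the number in the workflow, before the process is called. The following is a sketch under the assumption that the `meta` map is shared between channel items, which is why it builds a new map with `meta + [number: number]` instead of assigning to `meta.number`. The process and channel names are the ones from the script above; only the `map` step and the use of `meta.number` in the script block are new.

```groovy
process test_combine {

    input:
    tuple val(meta), path(filename)

    output:
    tuple val(meta), path("*_*.md"), emit: md_number

    script:
    // The number now arrives inside meta, so nothing is modified here
    """
    touch ${filename.simpleName}_${meta.number}.md
    """
}

workflow {
    input_ch = Channel.of([[id:"README"], file("README.md")])
    numbers = [1,2,3,4,5]

    // Pair every input with every number, then build a NEW map per item
    // rather than writing into the original meta map
    input_ch_numbers = input_ch
        .combine(numbers)
        .map { meta, filename, number -> [meta + [number: number], filename] }

    test_combine(input_ch_numbers)

    test_combine.out.md_number.view()
}
```

With this layout each task gets its own map, so the `number` in the emitted `meta` should be the one used in the `touch` command for that task.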

olgabot commented 3 months ago

Hello, I think I understand the issue now ... a variable assigned in a closure without def is a global variable, not local. The def keyword makes it local. On top of that, meta is one map object shared by every task, so meta.number = number changes that shared object for all of them rather than a per-task copy. I have been confused about what def means for a long time now, and now I finally get it! From this tutorial: https://training.nextflow.io/advanced/metadata/#first-pass
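
To make the `def` point concrete, here is a small plain-Groovy sketch (not a Nextflow pipeline). It assumes the closure lives in an ordinary Groovy script, where an assignment without `def` is stored in the script binding; as far as I understand, closures in a Nextflow workflow block behave the same way. The closure name `parse` is hypothetical.

```groovy
// A closure similar to the body of a channel operator
def parse = { String name ->
    tokens = name.split('_')             // no def: stored in the script binding, shared by all calls
    def number = tokens[1].toInteger()   // def: local to this single call
    number
}

assert parse('README_3') == 3

// The undeclared variable leaked into the script-wide binding ...
assert binding.hasVariable('tokens')
// ... while the def variable did not
assert !binding.hasVariable('number')
```

Because `tokens` is shared, two closure calls running at the same time could overwrite each other's value, which is why `def` is the safer default inside closures.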

I ended up doing something like this. Here's the script:


process test_each {

    input:
    tuple val(meta), path(filename)
    each number

    output:
    tuple val(meta), path("*_*.md"), emit: md_number

    script:
    """
    touch ${filename.simpleName}_${number}.md
    """
}

workflow {
    // input_file = Channel.fromPath("README.md")
    input_ch = Channel.of([[id:"README"], file("README.md")])
    numbers = [1,2,3,4,5]

    input_ch.view{"input_ch: ${it}"}

    test_each(input_ch, numbers)

    test_each.out.md_number.view{ "test_each.out.md_number: ${it}" }

    meta_number = test_each.out.md_number
        .map { meta, path ->
            tokens = path.getSimpleName().split("_")
            number = tokens[1].toInteger()
            meta.number = number
            [meta, path]
        }
    meta_number.view { "meta_number: ${it}" }

}

And here is the output, with the meta_number metadata matching the filename on most lines, yay! (One line, README_4.md, still shows number:3.)

(nf-core)
 ✘  Wed  5 Jun - 08:02  ~/code/nextflow-each-metadata-error   origin ☊ main 13☀ 11● 
 @olgabot  nextflow run test_each_meta_working.nf

 N E X T F L O W   ~  version 24.04.2

Launching `test_each_meta_working.nf` [friendly_snyder] DSL2 - revision: b21577eb59

executor >  local (5)
[75/bd6873] test_each (5) [100%] 5 of 5 ✔
input_ch: [[id:README], /Users/olgabot/code/nextflow-each-metadata-error/README.md]
test_each.out.md_number: [[id:README], /Users/olgabot/code/nextflow-each-metadata-error/work/41/edc1f3323f017962ced3f278fb3f5e/README_1.md]
test_each.out.md_number: [[id:README], /Users/olgabot/code/nextflow-each-metadata-error/work/36/c083fef44bd13e2801b6f99efd8e15/README_2.md]
test_each.out.md_number: [[id:README, number:1], /Users/olgabot/code/nextflow-each-metadata-error/work/75/bd687358e0578bd60a54afcc457a7a/README_5.md]
test_each.out.md_number: [[id:README, number:1], /Users/olgabot/code/nextflow-each-metadata-error/work/21/727326096a83218fd9ff4270d8c399/README_4.md]
meta_number: [[id:README, number:1], /Users/olgabot/code/nextflow-each-metadata-error/work/41/edc1f3323f017962ced3f278fb3f5e/README_1.md]
test_each.out.md_number: [[id:README, number:2], /Users/olgabot/code/nextflow-each-metadata-error/work/b5/01dc6da58320fb2ea7c98e3f385194/README_3.md]
meta_number: [[id:README, number:2], /Users/olgabot/code/nextflow-each-metadata-error/work/36/c083fef44bd13e2801b6f99efd8e15/README_2.md]
meta_number: [[id:README, number:5], /Users/olgabot/code/nextflow-each-metadata-error/work/75/bd687358e0578bd60a54afcc457a7a/README_5.md]
meta_number: [[id:README, number:3], /Users/olgabot/code/nextflow-each-metadata-error/work/21/727326096a83218fd9ff4270d8c399/README_4.md]
meta_number: [[id:README, number:3], /Users/olgabot/code/nextflow-each-metadata-error/work/b5/01dc6da58320fb2ea7c98e3f385194/README_3.md]
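
In the output above, the `meta_number` line for README_4.md is paired with number:3, and some `test_each.out.md_number` lines already carry a `number` key even though the process never sets one. Both would be consistent with the `map` closure still writing into a shared `meta` map and using `tokens` and `number` without `def`. A possible adjustment, sketched under that assumption and otherwise identical to the script above, is to declare the variables with `def` and return a new map:

```groovy
process test_each {

    input:
    tuple val(meta), path(filename)
    each number

    output:
    tuple val(meta), path("*_*.md"), emit: md_number

    script:
    """
    touch ${filename.simpleName}_${number}.md
    """
}

workflow {
    input_ch = Channel.of([[id:"README"], file("README.md")])
    numbers = [1,2,3,4,5]

    test_each(input_ch, numbers)

    meta_number = test_each.out.md_number
        .map { meta, path ->
            // def keeps both variables local to this closure call
            def tokens = path.getSimpleName().split("_")
            def number = tokens[1].toInteger()
            // meta + [...] creates a new map; the shared meta is not modified
            [meta + [number: number], path]
        }
    meta_number.view { "meta_number: ${it}" }
}
```

Since the original `meta` is never modified here, the `test_each.out.md_number` channel should keep showing plain `[id:README]`, and each `meta_number` item carries its own copy.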