nextflow-io / nf-schema

Functionality for working with pipeline and sample sheet schema files in Nextflow pipelines
https://nextflow-io.github.io/nf-schema/
Apache License 2.0

Using samplesheetToList twice gives corrupted output channels #55

Closed: SPPearce closed this issue 3 weeks ago

SPPearce commented 2 months ago

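Calling samplesheetToList twice (on two different samplesheets, each with its own schema) gives inconsistent outputs. Each output channel contains a mixture of the two expected outputs, along with truncated rows, empty lists, and rows with duplicated keys.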

I made an MRE with a process that creates two CSV files and then passes each one through a different schema:

include { samplesheetToList } from 'plugin/nf-schema'

workflow {

    MAKECSVS()

    // one.csv is parsed against schema_one.json
    MAKECSVS.out.one
    .flatMap { one -> samplesheetToList(one, "schema_one.json") }
    .view { one -> "one: $one" }
    .set { ch_one }

    // two.csv is parsed against schema_two.json
    MAKECSVS.out.two
    .flatMap { two -> samplesheetToList(two, "schema_two.json") }
    .view { two -> "two: $two" }
    .set { ch_two }

}

process MAKECSVS {
  memory '1 GB'
  cpus 1

  output:
  path('one.csv'), emit: one
  path('two.csv'), emit: two

  script:
  """
  echo "id,foo,bar,string,num" > one.csv
  echo "A,a,1,string1,3" >> one.csv
  echo "B,a,2,string2,3" >> one.csv
  echo "C,a,3,string1,3" >> one.csv
  echo "D,b,4,string2,2" >> one.csv
  echo "E,b,5,string1,2" >> one.csv
  echo "F,c,6,string2,2" >> one.csv

  echo "foo,path" > two.csv
  echo "a,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/generic/csv/test.csv" >> two.csv
  echo "b,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/generic/tsv/expression.tsv" >> two.csv
  echo "c,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/generic/tsv/network.tsv" >> two.csv
  """
}

Running it with nextflow run SPPearce/nf-schema-mre -r main gives output that varies between runs, for example:

two: [[id:A, foo:a, bar:1, string:string1, num:3]]
one: [[id:A, foo:a, bar:1, string:string1, num:3]]
two: [[num:3]]
one: [[id:B, foo:a, bar:2, string:string2, num:3]]
two: [[num:3]]
one: [[id:C, foo:a, bar:3, string:string1, num:3]]
one: [[id:D, foo:b, bar:4, string:string2, num:2]]
two: [[bar:5, id:E, string:string2, foo:b, num:2]]
one: [[string:string1, num:2]]
one: [[id:F, foo:b, bar:6, string:string2, num:2, num:2]]
two: [[id:F, foo:b, bar:6, string:string2, num:2, num:2]]
two: [[id:F, foo:c, bar:6, string:string2, num:2]]

or:

two: []
one: [[foo:a, foo:a, path:/nf-core/test-datasets/modules/data/generic/csv/test.csv]]
one: [[path:/nf-core/test-datasets/modules/data/generic/tsv/expression.tsv]]
two: [[foo:b, path:/nf-core/test-datasets/modules/data/generic/tsv/expression.tsv]]
one: [[foo:c, path:/nf-core/test-datasets/modules/data/generic/tsv/network.tsv]]
two: [[foo:c, path:/nf-core/test-datasets/modules/data/generic/tsv/network.tsv]]

or:

two: [[id:A, foo:a, bar:1, string:string1, num:3]]
one: [[id:A, foo:a, bar:1, string:string1, num:3]]
two: []
one: [[id:B, foo:a, bar:2, string:string2, num:3]]
one: []
two: [[id:C, foo:a, bar:3, string:string1, num:3]]
one: []
two: [[id:D, foo:b, bar:4, string:string2, num:2]]
one: []
two: [[id:E, foo:b, bar:5, string:string1, num:2]]
one: [[id:F, foo:c, bar:6, string:string2, num:2]]
two: [[id:F, foo:c, bar:6, string:string2, num:2]]

nvnieuwk commented 2 months ago

Thanks for reporting this! I'll take a look at it when I have some time.

nvnieuwk commented 4 weeks ago

This has been fixed in #65.

nvnieuwk commented 4 weeks ago

I've noticed that the test for this fails intermittently. I'm reopening this until I figure out why.

nvnieuwk commented 3 weeks ago

#70 should fix this once and for all!