nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Feature proposal: Join operator on associative arrays #5477

Open dquintanatorres opened 1 week ago

dquintanatorres commented 1 week ago

New feature

Hello! Recently, I needed to join two associative arrays within a workflow. Although a workaround was suggested to me, I believe an operator for directly joining associative arrays would be highly convenient. Ideally, this operator would allow specifying a shared key or key pair, similar to SQL-style joins. Extending it to support the various join types (inner, outer, left, right, etc.) would be extremely valuable.

Usage scenario

Here’s an example use case:

workflow {

    ch1 = Channel.of( [id: "a", foo: 1], [id: "b", foo: 2] )
    ch2 = Channel.of( [id: "b", bar: 4], [id: "a", bar: 3] )

    // Proposed syntax: join the two channels of maps on their shared `id` key
    ch1
        .join(ch2, by: 'id', type: 'inner')
        .view()

}

// Output:

// [id: "a", foo: 1, bar: 3]
// [id: "b", foo: 2, bar: 4] 
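
For comparison, here is a minimal sketch of how the same result can be approximated with the existing `map` and `join` operators. It is not necessarily the workaround mentioned above. The channel names `left` and `right` are illustrative, and the sketch assumes every map carries an `id` entry. The idea is to turn each map into an `[id, map]` tuple so that `join` can match on the first element, then merge the two maps afterwards.

```nextflow
workflow {

    ch1 = Channel.of( [id: "a", foo: 1], [id: "b", foo: 2] )
    ch2 = Channel.of( [id: "b", bar: 4], [id: "a", bar: 3] )

    // Key each map by its `id` so that join can match on tuple index 0
    left  = ch1.map { m -> [m.id, m] }
    right = ch2.map { m -> [m.id, m] }

    left
        .join(right)                 // emits [id, leftMap, rightMap] for matching ids
        .map { id, a, b -> a + b }   // merge the maps; right-hand values win on key clashes
        .view()

}
```

This should behave like an inner join on `id`, although the emission order is not something the sketch controls. Passing `remainder: true` to `join` also emits unmatched items with `null` in the missing position, which gets closer to an outer join, but the merging closure would then need to handle the `null` side itself, for example with `(a ?: [:]) + (b ?: [:])`.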

bentsherman commented 5 days ago

A better join operator is in the works. Not sure when it will be introduced, but I intend to make it do a proper SQL join and work with any data type rather than just lists.