nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

`filterMap` operator #4958

Open Xophmeister opened 2 months ago

Xophmeister commented 2 months ago

New feature: filterMap operator

Nextflow currently supports both the filter and map operators against a queue channel. It is very common to combine these, so it may be useful -- as a convenience -- to have a single filterMap operator that does both operations.

Usage scenario

Before:

myChannel
| filter { someCriteria(it) }
| map { someMap(it) }
| // etc...

After:

myChannel
| filterMap { someCriteria(it) ? someMap(it) : null }
| // etc...

The above may not look like much of an improvement, but bear in mind that it's maximally generalised. Maybe an outer join would be a more realistic example:

// Channel of the keys in myChannelA that are not in myChannelB
// NOTE For sake of the example:
// * myChannelA emits [ meta ]
// * myChannelB emits [ meta, etc ] and is *not* empty
myChannelA
| join(myChannelB, remainder: true)
| filterMap { meta, etc -> etc ? null : meta }

Suggested implementation

I've modelled this on Rust's std::iter::Iterator::filter_map, whose closure returns an Option<T>: if it's Some(T), then that's the matched and mapped value (of type T); if it's None, then the item is skipped. Groovy/Java (AFAIK) doesn't have an equivalent of Option<T>, so I've gone with not-null and null, respectively (presuming it's unrealistic for the mapping function to return null).
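
To make the null convention concrete, here is a minimal sketch in plain Java rather than Nextflow, with a list standing in for a queue channel. `NullFilterMapSketch` and its `filterMap` method are hypothetical names for illustration only; they are not an existing Nextflow or JDK API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Nextflow): applies `fn` to every item and
// keeps only the non-null results, i.e. null is the "skip this item" signal.
final class NullFilterMapSketch {
    static <T, R> List<R> filterMap(List<T> items, Function<T, R> fn) {
        return items.stream()
                .map(fn)                   // map first; a null result means "skip"
                .filter(Objects::nonNull)  // then drop the skipped items
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Keep the even numbers and square them; odd numbers map to null.
        List<Integer> out = filterMap(
                Arrays.asList(1, 2, 3, 4),
                n -> n % 2 == 0 ? n * n : null);
        System.out.println(out); // prints [4, 16]
    }
}
```

The trade-off is the one noted above: a mapping function that legitimately wants to emit null cannot do so under this convention.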

Xophmeister commented 2 months ago

> Groovy/Java (AFAIK) doesn't have an equivalent of Option<T>

I stand corrected: https://docs.oracle.com/javase/8/docs/api/java/util/Optional.html

bentsherman commented 2 months ago

I was just thinking about this the other day... I think I would support a filterMap operator that uses Optional instead of null.
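
As a rough illustration of what an Optional-based variant could look like, here is a sketch in plain Java, with a list standing in for a channel and a two-element array standing in for the [ meta, etc ] tuple. `OptionalFilterMapSketch` and `filterMap` are hypothetical names, not an existing or planned Nextflow API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Nextflow): the closure returns an Optional,
// and only present values are passed downstream.
final class OptionalFilterMapSketch {
    static <T, R> List<R> filterMap(List<T> items, Function<T, Optional<R>> fn) {
        return items.stream()
                .map(fn)                      // each item becomes Optional<R>
                .filter(Optional::isPresent)  // empty Optionals are skipped
                .map(Optional::get)           // unwrap the remaining values
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for the joined channel: pair[0] is meta, pair[1] is etc
        // (null when the key had no match on the other side).
        List<String[]> joined = Arrays.asList(
                new String[] {"a", null},
                new String[] {"b", "x"},
                new String[] {"c", null});

        // Emit meta only when etc is missing.
        List<String> out = filterMap(joined,
                pair -> pair[1] == null ? Optional.of(pair[0]) : Optional.empty());
        System.out.println(out); // prints [a, c]
    }
}
```

Compared with the null convention, the skip is explicit in the closure's return type. Note that java.util.Optional cannot itself wrap null, so this variant still does not let a closure emit a null value.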

By the way, the branch operator is essentially a multi-filter-map 😆

myChannelA
  | join(myChannelB, remainder: true)
  | branch { meta, etc ->
    some: etc == null
      return meta
  }.some

Clearly, filterMap is less verbose.

bentsherman commented 2 months ago

The only thing is, as I've been investigating how to evolve the Nextflow language, I feel that operators are over-used, because they are often needed to fill gaps in the language. So I want to find ways to fill those gaps and make operators less necessary first, before we keep adding convenience operators and potentially further enable bad patterns. For example, I think we can make branch and multiMap unnecessary, so that filter and map are enough.

That being said, I've also found filter_map to be useful in Rust, and it seems like a valuable enough convenience even if we manage to simplify the library and usage of operators. I definitely prefer it over branch.