viash-io / viash

script + metadata = standalone component
https://viash.io
GNU General Public License v3.0

NextFlowPlatform 2.0 #82

Closed rcannood closed 2 years ago

rcannood commented 3 years ago

I created a repository, viash_nxf_poc, to work out a proof of concept (POC) for a rewrite of the NextFlowPlatform. I'm proposing this rewrite to address some annoyances with the current way of working, to reduce code complexity, and to add more checks that catch the bugs which currently occur quite regularly.


Channel Interface

A Viash+Nextflow module generated by Viash has the interface:

Input channel:  [id, inputs, ...passthrough...]
Output channel: [id, outputs, ...passthrough...]

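These fields are defined as follows:

- id: a string that identifies the event (e.g. "foo" in the examples below).
- inputs: the input data for the component, typically a map from argument names to values (files, strings, integers, ...).
- outputs: the data produced by the component, as a map from output argument names to the generated files.
- ...passthrough...: any additional elements, which the module passes through unchanged.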
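As a rough illustration of this interface, the following Python sketch models a module as a function on channel events. It is not how Viash generates modules: apply_module, fake_poc and the string check on the id are hypothetical, chosen only to show that the id and the passthrough elements travel through unchanged while the inputs are replaced by the outputs.

```python
from typing import Any, Callable, Dict, List

# Hypothetical model of a channel event: [id, data, ...passthrough...]
Event = List[Any]
Component = Callable[[Dict[str, Any]], Dict[str, Any]]

def apply_module(event: Event, component: Component) -> Event:
    """Turn [id, inputs, ...passthrough...] into [id, outputs, ...passthrough...]."""
    if len(event) < 2:
        raise ValueError("an event needs at least an id and an inputs element")
    event_id, inputs, *passthrough = event
    # Assumed check: the id must be a string such as "foo".
    if not isinstance(event_id, str):
        raise TypeError(f"expected a string id, got {type(event_id).__name__}")
    outputs = component(inputs)
    # The id and the passthrough elements are returned unchanged.
    return [event_id, outputs, *passthrough]

# Hypothetical stand-in for the poc component.
def fake_poc(inputs: Dict[str, Any]) -> Dict[str, Any]:
    return {"output": inputs["input_one"] + ".out"}

print(apply_module(["foo", {"input_one": "data.h5ad"}, "extra"], fake_poc))
# → ['foo', {'output': 'data.h5ad.out'}, 'extra']
```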


Module usage

Given a Viash component named poc (src/poc/config.vsh.yaml), importing the module yields a Nextflow workflow, which can be used as follows:

nextflow.enable.dsl=2

include { poc } from "./target/nextflow/poc/main.nf" params(params)

workflow {
  Channel.value( [
    "foo", 
    [
      input_one: file("data/pbmc_1k_protein_v3.normalize.output_rna.h5ad"),
      input_multi: file("data/*.h5ad"),
      string: "foo",
      integer: 10
    ]
  ])
  | poc
}

Viash+Nextflow modules are flexible

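The strength of the new Viash+Nextflow modules lies in their flexibility. Instead of piping a channel directly into poc, you can call poc.run(...) to set arguments, Nextflow directives, and automatic behaviour for that specific use of the module: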

nextflow.enable.dsl=2

include { poc } from "./target/nextflow/poc/main.nf"

workflow {
  Channel.value( [
    "foo", 
    [
      input_one: file("data/pbmc_1k_protein_v3.normalize.output_rna.h5ad"),
      input_multi: file("data/*.h5ad")
    ]
  ])
  | poc.run(
    args: [ string: "foo", integer: 10 ],
    directives: [
      cache: "lenient",
      label: [ "bigmem", "bigcpu" ]
    ],
    auto: [
      simplifyInput: true,
      simplifyOutput: true,
      publish: false,
      transcript: false
    ]
  )
}

Chaining multiple modules

If each module has only one input file and one output file:

nextflow.enable.dsl=2

include { poc1 } from "./target/nextflow/poc1/main.nf"
include { poc2 } from "./target/nextflow/poc2/main.nf"
include { poc3 } from "./target/nextflow/poc3/main.nf"

workflow {
  Channel.value( [ "foo", file("data/pbmc_1k_protein_v3.normalize.output_rna.h5ad") ] )
  | poc1
  | poc2
  | poc3
}

If the modules have multiple input/output files per step, use renameKeys or mapData to connect the outputs of one step to the inputs of the next:

nextflow.enable.dsl=2

include { poc1 } from "./target/nextflow/poc1/main.nf"
include { poc2 } from "./target/nextflow/poc2/main.nf"
include { poc3 } from "./target/nextflow/poc3/main.nf"

workflow {
  Channel.value( [
    "foo", 
    [
      input_one: file("data/pbmc_1k_protein_v3.normalize.output_rna.h5ad"),
      input_multi: file("data/*.h5ad")
    ]
  ])
  | poc1.run(
    args: [ string: "foo", integer: 10 ]
  )
  | poc2.run(
    renameKeys: ["input_one": "output_one", "input_multi": "output_multi"]
  )
  | poc3.run(
    mapData: { [input_one: it.output_one, input_multi: it.output_multi ] }
  )
}
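The two options above can be modelled with plain dictionaries. This Python sketch assumes that renameKeys maps a new key to an existing one (new: old), which is how the example reads, since poc2 expects input_one while poc1 produces output_one. rename_keys and map_data are hypothetical helpers, not part of Viash.

```python
from typing import Any, Callable, Dict

def rename_keys(data: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Rename keys of a data map; mapping is {new_key: old_key} (assumed direction)."""
    renamed = dict(data)  # copy, so the upstream output map is not mutated
    for new_key, old_key in mapping.items():
        if old_key not in renamed:
            raise KeyError(f"cannot rename missing key '{old_key}'")
        renamed[new_key] = renamed.pop(old_key)
    return renamed

def map_data(data: Dict[str, Any], fn: Callable[[Dict[str, Any]], Dict[str, Any]]) -> Dict[str, Any]:
    """Replace the data map with whatever the user-supplied function returns."""
    return fn(data)

step1_out = {"output_one": "a.h5ad", "output_multi": ["b.h5ad", "c.h5ad"]}
print(rename_keys(step1_out, {"input_one": "output_one", "input_multi": "output_multi"}))
# → {'input_one': 'a.h5ad', 'input_multi': ['b.h5ad', 'c.h5ad']}
```

renameKeys suits pure renames, whereas mapData can also drop, combine or compute entries, at the cost of writing a closure.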

Reusing the same module

You can run the same component multiple times. Because Nextflow DSL2 does not allow the same process to be invoked more than once within a workflow, you need to specify a unique key every time the module is used.

nextflow.enable.dsl=2

include { poc } from "./target/nextflow/poc/main.nf"

workflow {
  Channel.value( [ "foo", file("data/pbmc_1k_protein_v3.normalize.output_rna.h5ad") ] )
  | poc.run(key: "step1")
  | poc.run(key: "step2")
  | poc.run(key: "step3")
}

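One way to picture the unique-key requirement: each use of a module needs its own process name, so duplicate keys must be rejected. The Python sketch below is hypothetical; process_aliases and the component_key naming scheme are assumptions, not the names Viash generates.

```python
from typing import Iterable, List

def process_aliases(component: str, keys: Iterable[str]) -> List[str]:
    """Derive one process name per use of a module, rejecting duplicate keys."""
    seen = set()
    aliases: List[str] = []
    for key in keys:
        if key in seen:
            raise ValueError(f"key '{key}' is used more than once for module '{component}'")
        seen.add(key)
        # Hypothetical naming scheme: component name plus key.
        aliases.append(f"{component}_{key}")
    return aliases

print(process_aliases("poc", ["step1", "step2", "step3"]))
# → ['poc_step1', 'poc_step2', 'poc_step3']
```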
rcannood commented 2 years ago

Below, I go over all Nextflow directives to determine how each of them should be managed.

Layout

This format in Nextflow DSL:

process foo_process {
  <nextflow dsl>
}

is equivalent to the following in the Viash config:

platforms:
  - type: nextflow
    directives:
      <viash config>

and is also equivalent to the following in the Viash + Nextflow DSL:

foo_process(
  directives: [
    <viash + nextflow dsl>
  ]
)
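To make the equivalence concrete, here is a minimal Python sketch that renders a directives map (as it would appear in the Viash config) into the directive lines of a Nextflow process block. Only the directives are rendered, not a complete process. render_directives and to_groovy are hypothetical, and treating accelerator values as raw DSL code is an assumption based on the accelerator table below.

```python
from typing import Any, Dict, List

# Assumption: directives whose value already is Nextflow DSL code
# (see the accelerator table) are emitted verbatim instead of quoted.
RAW_DIRECTIVES = {"accelerator"}

def to_groovy(value: Any) -> str:
    """Render a scalar as a Groovy/Nextflow literal (small subset only)."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    raise TypeError(f"unsupported directive value: {value!r}")

def render_directives(name: str, directives: Dict[str, Any]) -> str:
    """Render a directives map as the directive part of a process block."""
    lines: List[str] = [f"process {name} {{"]
    for key, value in directives.items():
        # A list (e.g. label: [bigmem, bigcpu]) becomes one directive line per item.
        values = value if isinstance(value, list) else [value]
        for item in values:
            rendered = item if key in RAW_DIRECTIVES else to_groovy(item)
            lines.append(f"  {key} {rendered}")
    lines.append("}")
    return "\n".join(lines)

print(render_directives("foo_process", {"cpus": 8, "cache": "deep", "label": ["bigmem", "bigcpu"]}))
# process foo_process {
#   cpus 8
#   cache 'deep'
#   label 'bigmem'
#   label 'bigcpu'
# }
```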

Note: Should closures in the Viash + Nextflow DSL be interpreted? For example: directives: [ "cache": { ... }, "label": "foo" ]

Order of importance

Directives are resolved in the following order (highest priority first):

  1. values defined in the function call (i.e. foo_process(directives: ...))
  2. values defined in the Viash config (i.e. - { type: nextflow, directives: ... })
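The priority order above amounts to a dictionary merge in which the function call wins. A minimal Python sketch, assuming a shallow merge; resolve_directives is a hypothetical name:

```python
from typing import Any, Dict

def resolve_directives(config: Dict[str, Any], call: Dict[str, Any]) -> Dict[str, Any]:
    """Merge directives: values from the function call override the Viash config."""
    resolved = dict(config)  # priority 2: directives from the Viash config
    resolved.update(call)    # priority 1: foo_process(directives: ...)
    return resolved

config_directives = {"cpus": 2, "cache": "deep", "label": ["bigmem"]}
call_directives = {"cpus": 8, "label": ["bigmem", "bigcpu"]}
print(resolve_directives(config_directives, call_directives))
# → {'cpus': 8, 'cache': 'deep', 'label': ['bigmem', 'bigcpu']}
```

Because the merge is shallow, a label list given in the function call replaces the one from the Viash config instead of extending it; whether labels should be appended instead is an open design choice.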

accelerator

type | code
-- | --
Nextflow DSL | accelerator 4, type: 'nvidia-tesla-k80'
Viash config | accelerator: "4, type: 'nvidia-tesla-k80'"
Viash + Nextflow DSL | "accelerator": "4, type: 'nvidia-tesla-k80'"

afterScript

type | code
-- | --
Nextflow DSL | afterScript "source /foo/bar/script"
Viash config | afterScript: "source /foo/bar/script"
Viash + Nextflow DSL | "afterScript": "source /foo/bar/script"

beforeScript

type | code
-- | --
Nextflow DSL | beforeScript "source /foo/bar/script"
Viash config | beforeScript: "source /foo/bar/script"
Viash + Nextflow DSL | "beforeScript": "source /foo/bar/script"

cache

type | code
-- | --
Nextflow DSL | cache false
Viash config | cache: false
Viash + Nextflow DSL | "cache": false
Nextflow DSL | cache "deep"
Viash config | cache: deep
Viash + Nextflow DSL | "cache": "deep"

Possible values: false / true / "deep" / "lenient"

Note that Viash might need to convert YAML booleans into strings during parsing.
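A YAML parser reads cache: false as a boolean but cache: deep as a string. The Python sketch below shows one way to bring both into a single string representation before validation, assuming Viash stores the directive as a string; normalize_cache is hypothetical.

```python
from typing import Union

# Allowed values for the cache directive, as listed above.
VALID_CACHE = ("true", "false", "deep", "lenient")

def normalize_cache(value: Union[bool, str]) -> str:
    """Convert a parsed YAML value for cache into a validated string."""
    if isinstance(value, bool):
        # A YAML parser turns `cache: false` into a boolean, not a string.
        text = "true" if value else "false"
    elif isinstance(value, str):
        text = value
    else:
        raise TypeError(f"cache must be a boolean or a string, got {type(value).__name__}")
    if text not in VALID_CACHE:
        raise ValueError(f"invalid cache value {value!r}; expected one of {VALID_CACHE}")
    return text

print(normalize_cache(False))   # → false
print(normalize_cache("deep"))  # → deep
```

When the Nextflow code is generated, "true" and "false" would then be written without quotes and "deep" and "lenient" as quoted strings, matching the table above.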

conda

Not supported at this stage. Contact the maintainers or create a new issue if support is ever needed.

container

Not supported, in favour of linking to other Viash platforms (e.g. native, docker).

containerOptions

Not supported, in favour of linking to other Viash platforms (e.g. native, docker).

cpus

type | code
-- | --
Nextflow DSL | cpus 8
Viash config | cpus: 8
Viash + Nextflow DSL | "cpus": 8

clusterOptions

type | code
-- | --
Nextflow DSL | clusterOptions "xxxx"
Viash config | clusterOptions: xxxx
Viash + Nextflow DSL | "clusterOptions": "xxxx"

disk

type | code
-- | --
Nextflow DSL | disk '2 GB'
Viash config | disk: "2 GB"
Viash + Nextflow DSL | "disk": "2 GB"

The value must match the pattern <decimal> [KMGT]?B (e.g. "2 GB").
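A sketch of this check in Python, under the assumption that <decimal> [KMGT]?B means digits with an optional fraction, an optional single space, an optional K, M, G or T prefix and a final B. DISK_PATTERN and validate_disk are hypothetical names.

```python
import re

# Assumed reading of "<decimal> [KMGT]?B": digits with an optional fraction,
# an optional space, an optional K/M/G/T prefix, and a final B.
DISK_PATTERN = re.compile(r"\d+(\.\d+)? ?[KMGT]?B")

def validate_disk(value: str) -> str:
    """Return value unchanged if it looks like a disk size, else raise."""
    if DISK_PATTERN.fullmatch(value) is None:
        raise ValueError(f"invalid disk value {value!r}; expected e.g. '2 GB'")
    return value

print(validate_disk("2 GB"))  # → 2 GB
```

Using fullmatch rather than match ensures trailing text such as "2 GBx" is rejected.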

echo

type | code
-- | --
Nextflow DSL | echo true
Viash config | echo: true
Viash + Nextflow DSL | "echo": true

errorStrategy

type | code
-- | --
Nextflow DSL | errorStrategy "terminate"
Viash config | errorStrategy: terminate
Viash + Nextflow DSL | "errorStrategy": "terminate"

Possible values: 'terminate', 'finish', 'ignore', 'retry'.
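Directives with a fixed set of values, such as errorStrategy (and cache above), could share one generic check. A minimal Python sketch; check_choice is a hypothetical helper:

```python
from typing import Sequence

# Allowed values for errorStrategy, as listed above.
ERROR_STRATEGIES = ("terminate", "finish", "ignore", "retry")

def check_choice(directive: str, value: str, allowed: Sequence[str]) -> str:
    """Reject a directive value that is not one of the allowed choices."""
    if value not in allowed:
        raise ValueError(f"invalid {directive} value {value!r}; expected one of {', '.join(allowed)}")
    return value

check_choice("errorStrategy", "retry", ERROR_STRATEGIES)       # passes
# check_choice("errorStrategy", "restart", ERROR_STRATEGIES)  # would raise ValueError
```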

rcannood commented 2 years ago

@tverbeiren Did I forget something? I'm going to use the content of this issue to write a blog post on viash.io.

rcannood commented 2 years ago

This functionality was released in Viash 0.5.11 🥳