viash-io / viash

script + metadata = standalone component
https://viash.io
GNU General Public License v3.0

[FEATURE] Allow defining a `tag` based on the state going through the channel. #693

Open rcannood opened 2 months ago

rcannood commented 2 months ago

Feature summary

Allow defining a tag based on the state going through the channel.

Feature description

Right now, the tag recorded in trace.txt is simply set to $id, and that tag is the only way to figure out which process is linked to which component. In OpenProblems, we use it to link the output of the Nextflow trace (which contains resource usage throughout the run) to the different datasets going through the pipeline.


It would be useful if we could construct the tag based on information in the state, and if the tag was machine interpretable.

Example: {"dataset_id": ..., "normalization_id": ..., ...}.
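To make "machine interpretable" concrete, here is a minimal Python sketch that serializes a few state fields into a compact JSON string and decodes it again. The helper names `make_tag` and `parse_tag` and all example values are hypothetical; neither Viash nor Nextflow provides them.

```python
import json

def make_tag(state, keys):
    """Serialize the selected state fields as a compact, single-line JSON string."""
    subset = {k: state[k] for k in keys if k in state}
    # sort_keys gives a stable string for identical inputs;
    # compact separators keep the tag short.
    return json.dumps(subset, sort_keys=True, separators=(",", ":"))

def parse_tag(tag):
    """Decode a JSON tag; treat anything else as a plain id."""
    try:
        value = json.loads(tag)
    except json.JSONDecodeError:
        return {"id": tag}
    return value if isinstance(value, dict) else {"id": tag}

# Hypothetical state, as it might flow through a channel.
state = {
    "dataset_id": "dataset_a",
    "normalization_id": "norm_x",
    "input": "/path/to/file",
}
tag = make_tag(state, ["dataset_id", "normalization_id"])
print(tag)
print(parse_tag(tag))
print(parse_tag("plain_id"))
```

Falling back to `{"id": ...}` for non-JSON tags would let the same parser handle traces produced before and after such a change.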

However, this is currently not possible, because the tag can only be defined using information available inside the underlying Nextflow process; that is:

"""nextflow.enable.dsl=2
  |
  |process $procKey {$drctvStrs
  |input:
  |  tuple val(id)$inputPaths, val(args), path(resourcesDir, stageAs: ".viash_meta_resources")
  |output:
  |  tuple val("\$id")$outputPaths, optional: true

For a component with input files --input_mod1 and --input_mod2, this would look something like this:

input: tuple val(id), path(viash_par_input_mod1), path(viash_par_input_mod2), val(args), path(resourcesDir)

Fundamentally, it would probably be a good idea to implement this at a Viash level, so we can do something like:

component.run(
  auto: [
    tag: { id, state ->
      toJsonBlob([dataset_id: state.dataset_id])
    }
  ]
)

Why is this feature beneficial?

Allows linking the trace.txt more easily to whatever it needs to be linked to.
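As a rough illustration of that linking step, the Python sketch below reads a tab-separated trace and expands a JSON tag into columns. It assumes the tag appears in parentheses at the end of the `name` column; the trace excerpt, the process name and the helper `read_trace` are hypothetical.

```python
import csv
import io
import json
import re

# Hypothetical trace excerpt; a real trace.txt has more columns.
TRACE = (
    "task_id\tname\tstatus\trealtime\tpeak_rss\n"
    '1\tnormalize_process ({"dataset_id":"dataset_a","normalization_id":"norm_x"})'
    "\tCOMPLETED\t12s\t1.1 GB\n"
    "2\tnormalize_process (run_b)\tCOMPLETED\t9s\t0.9 GB\n"
)

# Assumed layout of the name column: "<process> (<tag>)".
NAME_RE = re.compile(r"^(?P<process>.+?) \((?P<tag>.*)\)$")

def read_trace(text):
    """Return one dict per task, with JSON tags expanded into separate keys."""
    rows = []
    # QUOTE_NONE keeps the double quotes inside JSON tags as literal characters.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        match = NAME_RE.match(row["name"])
        process = match.group("process") if match else row["name"]
        tag = match.group("tag") if match else ""
        try:
            meta = json.loads(tag)
        except json.JSONDecodeError:
            meta = None
        if not isinstance(meta, dict):
            meta = {"id": tag}  # plain tags such as the current $id
        rows.append({
            "process": process,
            **meta,
            "realtime": row["realtime"],
            "peak_rss": row["peak_rss"],
        })
    return rows

for entry in read_trace(TRACE):
    print(entry)
```

With a JSON tag, joining resource usage to a dataset becomes a dictionary lookup on `dataset_id` rather than string parsing of an id.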

Alternatives considered

As a workaround, we could add a --tag argument to components and use it to set the tag directive to tag: "${args.tag ?: id}", which falls back to the id when no tag is passed. See openproblems-bio/openproblems-v2#446.

Possible solution

No response

Confirmation