Closed stevekm closed 8 months ago
I think the original creator of nf-prov tried to associate published outputs with the process `emit`, but maybe they never got it to work. As long as a file is emitted by any process output channel, it can be published, but it could be emitted by multiple process outputs.
But the problem with your request is that published outputs are not related to workflow emits at all. More fundamentally, I'm not sure that the provenance manifest is the best way to facilitate the chaining of pipelines.
I think we need some kind of workflow output schema which can be easily matched to the input schema of a downstream workflow, which does not involve workflow emits at all.
Alternatively, you could write a "meta-pipeline" which imports entire pipelines as modules and chains them together with regular dataflow logic. That would use the workflow takes/emits but not the input/output schemas, which in this case would be an unnecessary extra step. I am working on a proof-of-concept for this using fetchngs+rnaseq, hope to finish it at the hackathon next week.
> Alternatively, you could write a "meta-pipeline" which imports entire pipelines as modules and chains them together with regular dataflow logic.
This should definitely be a thing. The main blockers on this (in nf-core at least) have been config-based, and @drpatelh's related plans should help.
Honestly, I am not really a big fan of the idea of writing "meta-pipelines" because then it seems you would have to write one for every combination of pipelines you want to chain together.
I feel like this is the better approach:
> I think we need some kind of workflow output schema which can be easily matched to the input schema of a downstream workflow
(which feels related to https://github.com/nextflow-io/nextflow/issues/4670)
An idea floated elsewhere was some mechanism by which you could chain pipelines in a manner like this:
nextflow run main1.nf -output-schema-stdout ... | nextflow run main2.nf -input-schema-stdin
The topic of 'pipeline chaining' per se is likely out of scope for this issue and repo; maybe it can be moved to some other location. But if "named outputs" were available in nf-prov (or elsewhere?), then at least we could more easily hack it together ourselves :)
Feel free to close this issue if you think there's a better place for the discussion, thanks.
I see you have commented on https://github.com/nextflow-io/nextflow/issues/4670, so let's move the discussion over there. Your feedback might help us finalize the design of the workflow output schema, which should be the easiest way to chain pipelines.
Right now the manifest JSON output looks something like this
However, I am able to define my pipeline's main `workflow` section to have named outputs, like this
It would be really helpful if we could somehow keep the label, such as `myfiles`, associated with the published files, maybe something like this
This would be really helpful for downstream processing, so that you could parse the manifest JSON and identify specific files. For example, if you had an `emit` channel for MultiQC files, `multiqc_ch`, you would be able to identify all the files with the label `multiqc_ch` to more easily pass them in to some other process, like a chained post-processing workflow.
@pinin4fjords
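To illustrate the downstream use case, here is a rough Python sketch of selecting published files by label. It assumes a manifest shape that does not exist today: the `label` key on each `published` entry is the feature being requested here, and the `source`/`target` keys, the paths, and the `published_by_label` helper are all made up for the example.

```python
import json

# Hypothetical manifest. The "label" key is the feature requested in this
# issue; nf-prov is not known to write it. Paths are invented for the example.
EXAMPLE_MANIFEST = json.loads("""
{
  "published": [
    {"source": "/work/ab/123/multiqc_report.html",
     "target": "/results/multiqc_report.html",
     "label": "multiqc_ch"},
    {"source": "/work/cd/456/sample1.txt",
     "target": "/results/sample1.txt",
     "label": "myfiles"},
    {"source": "/work/ef/789/sample2.txt",
     "target": "/results/sample2.txt",
     "label": "myfiles"}
  ]
}
""")


def published_by_label(manifest, label):
    """Return the target paths of published entries carrying the given label."""
    return [
        entry["target"]
        for entry in manifest.get("published", [])
        if entry.get("label") == label
    ]


if __name__ == "__main__":
    # Collect the MultiQC outputs to hand to a post-processing step.
    for path in published_by_label(EXAMPLE_MANIFEST, "multiqc_ch"):
        print(path)
```

With something like this, a chained workflow would only need the manifest path and the label name to find its inputs.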
I noticed that under the `tasks` section of the manifest JSON, there is an `emit` field already in the `outputs` list for each task. However, in all my pipelines so far the value here is `null`. I'm not sure what this was meant to be used for, but it seems like this functionality might overlap?
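For anyone who wants to check this on their own manifest, a small Python sketch along these lines would tally the `emit` values across all task outputs. Only the `tasks`, `outputs`, and `emit` names come from the manifest described above. Whether `tasks` is a mapping or a list, the `name` keys, and the sample data are assumptions, so the helper (hypothetical name `count_emit_values`) accepts both layouts.

```python
import json
from collections import Counter

# Made-up sample data. Only "tasks", "outputs" and "emit" are taken from the
# manifest described above; the "name" keys and the layout are assumptions.
SAMPLE_TASKS_MANIFEST = json.loads("""
{
  "tasks": {
    "1": {"name": "FOO (1)",
          "outputs": [{"name": "out.txt", "emit": null}]},
    "2": {"name": "BAR (1)",
          "outputs": [{"name": "report.html", "emit": null},
                      {"name": "log.txt", "emit": null}]}
  }
}
""")


def count_emit_values(manifest):
    """Count how often each emit value occurs across all task outputs."""
    tasks = manifest.get("tasks", {})
    # Accept either a mapping of id -> task or a plain list of tasks.
    task_list = tasks.values() if isinstance(tasks, dict) else tasks
    counts = Counter()
    for task in task_list:
        for output in task.get("outputs") or []:
            # JSON null is loaded as None, so None counts the unset emits.
            counts[output.get("emit")] += 1
    return counts


if __name__ == "__main__":
    for value, n in count_emit_values(SAMPLE_TASKS_MANIFEST).items():
        print(f"emit={value!r}: {n} output(s)")
```

If every count lands under `None`, that matches what I am seeing in my pipelines.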