nextflow-io / nf-prov

Apache License 2.0

Biocompute Objects / RO Crates #2

Closed (ewels closed 11 months ago)

ewels commented 1 year ago

Hi folks!

When talking about workflow provenance, the topics of BioCompute Objects and RO Crates often come up. These aren't particularly new standards, and we've been talking about writing some kind of support for them in Nextflow for years. They come in two flavours: metadata for the workflows themselves that is published in the repo (see this ancient PR), and metadata for analysis runs, describing the data files and where they came from.

The latter of the two topics came up again recently, and @bentsherman and I were wondering whether this plugin would be a good place to add support for it. I guess that you're already pulling in most of the information that is needed, so hopefully adding support would just be a case of returning that information according to their specifications. For background, see this tutorial on doing this with Nextflow.

It'd be very cool to be able to add a plugin to a user config to generate these files from Nextflow directly, with no additional dependencies or manual work required.

What do you think?

Phil

ewels commented 1 year ago

See also: https://github.com/nextflow-io/nextflow/issues/1775 (but this feels like it'd work better as a plugin feature rather than core Nextflow, for now at least).

thomasyu888 commented 1 year ago

Hi @ewels,

Glad to hear this is coming back up again - this is certainly exciting! It definitely makes sense to add to this plugin.

I need to get a better understanding of BioCompute Objects and RO Crates and reorient myself with this plugin. Do you have a sense of what needs to be added to support this within this plugin? Unfortunately, Bruno has since left the organization, but it'd be fun to take a look at this.

Tom

bentsherman commented 1 year ago

Hi Tom, we are willing to contribute these changes if that would make things easier for you. I'm also new to BCO and RO Crate, but my understanding is that it's basically the same information that nf-prov is collecting, just rendered to a different JSON schema.

Given that Bruno has moved on from Sage, what is the status of this plugin? Is someone from Sage actively maintaining it? If not, we would be happy to take ownership of this plugin, since provenance is a feature of high interest in the community. We can also work with you to make sure that the plugin continues to meet your needs at Sage. Let me know what you think.

thomasyu888 commented 1 year ago

Hi @bentsherman,

Sorry for the delays. I think it does make sense for us to transfer ownership of this plugin, given our current bandwidth and Groovy expertise. Perhaps when the summer crunch time dies down a bit, we can contribute to the plugin.

It would be great to continue working together to make sure this plugin meets our provenance / data lineage needs.

bentsherman commented 1 year ago

Sounds great, then feel free to transfer the repo to the nextflow-io org. And we will keep an eye out for your requests and contributions 😄

ewels commented 1 year ago

I'm not sure that @thomasyu888 will have permissions on GitHub to move the repo to nextflow-io. Probably easiest if you can add one of us as an admin on this repo @thomasyu888? Then we can move it 👍🏻

thomasyu888 commented 1 year ago

@ewels, @bentsherman,

I added you both as admins, feel free to shift!

ewels commented 1 year ago

Many thanks @thomasyu888 - I just transferred the repository. Let's go! 🚀