nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0
2.61k stars 606 forks source link

Extend latest version of nf-ga4gh plugin - most recently published appears to be from Dec. 19 2020 #5007

Closed erasmussenilm closed 2 weeks ago

erasmussenilm commented 1 month ago

Hello,

I am trying to extend the nf-ga4gh TES Executor code and am running into the problem that the code here seems to be the bleeding edge, but it does not seem to be accessible for extension via the io.nextflow/nextflow groupId+artifactId package.

Instead, the 2020 version appear to be the latest available via the io.nextflow/nf-ga4gh package, but appears quite different from the latest version. This includes the 1.3.0 version published 4 days ago, which also appears to be from the 2020 code base.

What would be the way to extend the latest version of the nf-ga4gh plugin?

Thank you in advance,

Erik Rasmussen Illumina Inc.

bentsherman commented 1 month ago

The new version will be released next week.

Are you interested in working on it? We're going to move it into a separate repo soon so that the community can iterate on it more easily

erasmussenilm commented 1 month ago

Hi @bentsherman,

Thanks for getting back to me. Yes, I might very well be.

I am working on customizations of the GA4GH TES interface to allow for some of the directive (in config) k8s metadata currently passed, such as 'pod annotation' and 'container' to be passed-through. For this to be generic though, I would like the guidelines from you on how to best allow that in a generic manner following Seqera Labs guidelines. My thoughts were to add them to the tags, as "flattened" namespaced key-value pairs in the format 'directive..' = ''. So

container 'some-image:some-tag'

would result in a tag

'directive.container' = 'some-image:some-tag

and

pod annotation: 'some-name', value: 'some-value'

would result in a tag

'directive.pod annotation.some-name' = 'some-value'

Thoughts?

However, just to confirm your comment about the new release; does this mean that this new release will have the latest nf-ga4gh plugin as part of the io.nextflow/nextflow package, and extendable as such? Or is it a new release of the io.nextflow/nf-ga4gh package?

Thanks again,

Erik

bentsherman commented 1 month ago

24.04 was just released and includes nf-ga4gh as a core plugin. When we move the plugin into a separate repo, the only difference for you will be that you need to explicitly specify the plugin in your config:

plugins {
  id 'nf-ga4gh'
}

As for the k8s metadata, what if you used the resourceLabels directive? That way you can add whatever you want

erasmussenilm commented 1 month ago

Well, it is entirely possible that I am doing something wrong, however if I include the following in my pom.xml file:

    <dependency>
      <groupId>io.nextflow</groupId>
      <artifactId>nextflow</artifactId>
      <version>24.04.0-edge</version>
    </dependency>

then I don't seem to be able to import any of the plugin classes. There appears no access to the 'nextflow.plugins' packages (although there is a 'nextflow.plugin' package, but that appears to be something different).

How would I import the ga4gh plugin classes from the 24.04 version?

As for the resourceLabels, part of the problem is that we have customers that already are using the 'pod annotation' and 'container' directives so we would need to continue to support that. However, we can certainly map it to a different area in the GA4GH-TES spec than the tags, if that is what you were thinking? Do the resourceLabels already get mapped and passed through the TES interface?

bentsherman commented 1 month ago

The core plugins are still uploaded to separate projects in Maven Central. Here is the one for nf-ga4gh: https://central.sonatype.com/artifact/io.nextflow/nf-ga4gh/overview

Though I don't see a version for 24.04. Also, this repo likely won't be updated anyway once nf-ga4gh is no longer a core plugin.

But why would you need to import the plugin as a dependency?

erasmussenilm commented 1 month ago

The reason I want to import those classes is so that I can choose to only override the (few) functions I need to, in order to support adding the extra metadata. If I cannot do this, I am forced to providing an entire (mostly duplicated) set of plugin classes. And they wouldn't benefit from any further development/improvements done to the nf-ga4gh classes.

Specifically, I want to override:

bentsherman commented 1 month ago

I assume you are developing some larger application which uses Nextflow as a library to launch pipelines? Unfortunately the plugins aren't set up to be used that way. Instead, it would be good to see if we modify the TES plugin directly to support your needs.

Looking at the current code, I thought the resourceLabels directive was supported but actually it isn't. It would be easy to support though, just set the "tags" in the TES task to the resource labels from the task config.

Then you can define the resourceLabels to be whatever you want:

process.resourceLabels = { [
  'container': task.container,,
  'pod-annotation': '...'
] }
erasmussenilm commented 1 month ago

Thanks @bentsherman.

Yes, you are correct. We are developing a generic middle layer (TES) service that sits between the workflow engines (Nextflow being one of them) and the Cloud service. This TES layer supports the GA4GH-TES spec. Using a Nextflow TES executor to hook into this seems like the right choice and as mentioned, the existing one provides most of this already. But we WILL need to support the container+pod annotion directives, without requiring the Nextflow-workflow writer to rewrite their .nf files. So any mapping needed between that and how it gets to the TES REST interface needs to be done by us, not by modifying .nf workflow files.

As for the above code, my apologies for not understanding it fully yet - I am still pretty new to Nextflow syntax (and Java/Groovy for that matter). So let me be more specific with my questions;

  1. Is the above pseudo-code or actual code that I should put in a config file? Somewhere else?
  2. Would the existing nf-ga4gh plugin work as-is with resourceLabels?
  3. Looking at the TES task create spec - where would this metadata show up?
  4. What do you mean by the plugins not being set up to be used like this? In large scale apps? To be extended (via class-inheritance)?
  5. Given our past communication on this, if I still want to create a customized TES plugin, i.e. deriving classes from the existing nf-ga4gh classes - exactly what should I put in the pom.xml? And would I need both the io.nextflow + nextflow AND io.nextflow + nf-ga4gh, or only one of them (in that case, which version)? And any other "plumbing" I need to do, apart from specifying my new plugin to Nextflow?

My apologies for the many questions, I am trying to find the best solution :)

Thanks again!

bentsherman commented 1 month ago

Hold on, if you are developing a custom TES backend then you should not concern yourself at all with the nf-ga4gh plugin. The point of TES is to abstract the execution backend from the workflow manager. So you should only use information provided through the TES task.

We can try to make Nextflow's TES executor send more information in the TES task, but ultimately you need to make sure that you can work within the constraints of the TES API itself.

I would study the TES task payload in more detail and see which properties you can already retrieve. For example I believe there is already a field for the container. Anything else, Nextflow should be able to provide by passing the resourceLabels to the tags property of the TES task. We can update Nextflow to do that.

The example code I gave is Nextflow configuration, which can be specified in the Nextflow config file without modifying the pipeline. You can launch a pipeline and insert whatever config you want at runtime.

erasmussenilm commented 1 month ago

I agree and I already did study the payload from the TES executor and noticed that the config properties did not appear to come through, hence the intended enhancements.

Having resourceLabels being passed in the tags as a generic solution would certainly be a great addition. Especially if it is as easy as you indicate to map properties to the resourceLabels.

What do you think is the typical turn-around time for such enhancements to make it to a released version?

bentsherman commented 1 month ago

Once we spin off the plugin, we should be able to move pretty quickly. I suspect we will move the plugin in the coming weeks

bentsherman commented 2 weeks ago

The nf-ga4gh plugin has been updated as of 24.04. I have also created an issue for your resourceLabels request: https://github.com/nextflow-io/nf-ga4gh/issues/1

The plugin source code is now maintained in that repo

erasmussenilm commented 2 weeks ago

Great, thanks @bentsherman!