nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Conditionally populate resourceLabels to tag field in nf-amazon plugin's AwsBatchTaskHandler for AWS Batch Template Registration #5413

Open rbartelme-pivot opened 1 week ago

rbartelme-pivot commented 1 week ago

New feature

Proposed change: if resourceLabels are present for a given process, they should populate the tags field of the job definition that AwsBatchTaskHandler registers on AWS Batch.

Usage scenario

Multiple users have asked for this feature to track resource utilization. It would also let AWS Solutions Architects manage Nextflow behavior in systems that do not use NF Tower, and track costs, monitor compute utilization, and optimize deployments across AWS Batch resources such as Fargate and EC2.

Suggested implementation

Add a conditional to the nf-amazon plugin's Groovy source code: if resourceLabels are present for a process whose AWS Batch template (job definition) is being registered, add each resourceLabels entry to the request's tags via req.addTagsEntry().

https://github.com/nextflow-io/nextflow/blob/711864fd4fcdbf6839f0370fb609d28a003b6942/plugins/nf-amazon/src/main/nextflow/cloud/aws/batch/AwsBatchTaskHandler.groovy#L657
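For illustration, the proposed conditional can be sketched as follows. This is a minimal, language-neutral Python sketch, not the plugin's Groovy code: `RegisterJobDefinitionRequest`, `add_tags_entry`, and `apply_resource_labels` are hypothetical stand-ins (not the AWS SDK classes) that only model calling `req.addTagsEntry()` once per label, and only when labels exist.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RegisterJobDefinitionRequest:
    """Hypothetical stand-in for an AWS Batch register-job-definition request.

    Only the fields needed for this sketch are modelled.
    """
    job_definition_name: str
    tags: Dict[str, str] = field(default_factory=dict)

    def add_tags_entry(self, key: str, value: str) -> "RegisterJobDefinitionRequest":
        # Models the idea of req.addTagsEntry(): add a single key/value tag.
        self.tags[key] = value
        return self


def apply_resource_labels(
    req: RegisterJobDefinitionRequest,
    resource_labels: Optional[Dict[str, object]],
) -> RegisterJobDefinitionRequest:
    # Only modify the request when the process actually declares resourceLabels.
    if resource_labels:
        for key, value in resource_labels.items():
            # AWS tags are string key/value pairs, so stringify label values.
            req.add_tags_entry(str(key), str(value))
    return req


if __name__ == "__main__":
    # "example-job-def" and the label values are made-up examples.
    req = apply_resource_labels(
        RegisterJobDefinitionRequest("example-job-def"),
        {"project": "genomics", "team": "pivot"},
    )
    print(req.tags)
```

When a process has no resourceLabels, the request is returned unchanged, so existing behavior is preserved.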

pditommaso commented 1 week ago

It sounds reasonable.

pditommaso commented 1 week ago

After discussing this a bit: it has not been done because Nextflow creates the job definition only the very first time, and then reuses it for all following runs that use the same container.

If labels were added, the job definition would have to be recreated every time a label value changes. While that may be expected, it could result in a proliferation of Batch job definitions, which may not be desirable in common cases.
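The trade-off can be illustrated with a small model of definition reuse. This is a hypothetical Python sketch, not Nextflow's actual naming or caching scheme: `job_definition_key` and `count_definitions` are illustrative names, and the sketch assumes a definition is reused whenever its key matches. Keying on the container alone yields one definition per image; adding labels to the key yields a new definition for every distinct set of label values.

```python
import hashlib
import json
from typing import Dict, Iterable, Optional, Set, Tuple


def job_definition_key(container: str, labels: Optional[Dict[str, str]] = None) -> str:
    """Hypothetical reuse key for a job definition.

    Without labels the key depends only on the container image, so every run
    using that image maps to the same definition. Including labels makes the
    key change whenever any label value changes.
    """
    payload: Dict[str, object] = {"container": container}
    if labels:
        payload["labels"] = labels
    # sort_keys gives a stable encoding regardless of dict insertion order.
    encoded = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()[:12]


def count_definitions(runs: Iterable[Tuple[str, Dict[str, str]]], include_labels: bool) -> int:
    # Count how many distinct job definitions the given runs would need.
    seen: Set[str] = set()
    for container, labels in runs:
        seen.add(job_definition_key(container, labels if include_labels else None))
    return len(seen)


if __name__ == "__main__":
    # Three runs of the same image, each with a different (made-up) label value.
    runs = [
        ("ubuntu:22.04", {"run": "r1"}),
        ("ubuntu:22.04", {"run": "r2"}),
        ("ubuntu:22.04", {"run": "r3"}),
    ]
    print(count_definitions(runs, include_labels=False))
    print(count_definitions(runs, include_labels=True))
```

With per-run label values, the labelled variant needs one definition per run instead of one per container, which is the proliferation described above.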

rbartelme-pivot commented 1 week ago

Fair assessment of common use cases.

Couldn't the tags simply be updated, provided the AWS IAM role allows it? Also, does Nextflow create new template registrations with each new Nextflow release, or does it update the existing registrations?