nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Migrate to AWS Java SDK v2 #4741

Open bentsherman opened 5 months ago

bentsherman commented 5 months ago

Nextflow currently uses the AWS Java SDK v1, which is reaching end of life.

Additionally, new features are only being added to SDK v2, which will make it difficult to adopt new AWS features in the future. We found a way to support SSO authentication with an adapter class, but other changes might not be so feasible.

The main components we use are AWS Batch, S3, and of course credentials. I don't believe the AWS Batch piece has changed much, but according to Paolo, the file transfer API is very different, and our S3 filesystem is easily the most complex piece of our AWS integration.

Another major change is that a v2 client can only work with a single region, whereas the v1 client is cross-region. Supposedly, cross-region support should no longer be challenging to implement.
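
To illustrate one way the single-region constraint could be handled, here is a minimal sketch of a per-region client cache. It uses only the Java standard library. `RegionalClientCache` and its methods are hypothetical names, not part of Nextflow or the AWS SDK, and the type parameter `C` stands in for a region-bound client such as the v2 `S3Client`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical helper: keeps one client per region and creates each lazily.
 * C stands in for a region-bound client (for example the SDK v2 S3Client).
 */
final class RegionalClientCache<C> {
    private final Map<String, C> clients = new ConcurrentHashMap<>();
    private final Function<String, C> factory;

    RegionalClientCache(Function<String, C> factory) {
        this.factory = factory;
    }

    /** Returns the cached client for the region, building it on first use. */
    C forRegion(String region) {
        return clients.computeIfAbsent(region, factory);
    }

    /** Number of clients created so far. */
    int size() {
        return clients.size();
    }
}
```

With SDK v2 the factory could be something like `region -> S3Client.builder().region(Region.of(region)).build()`. The caller would still need to resolve the bucket's region before choosing a client. Recent v2 releases also appear to offer a `crossRegionAccessEnabled` option on the S3 client builder, which might make a cache like this unnecessary. I have not verified that against our use case.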

bentsherman commented 5 months ago

Since the S3 filesystem will need to be rewritten, it will also be a good opportunity to improve the performance (i.e. throughput).

It looks like AWS has developed their own S3 filesystem: https://github.com/awslabs/aws-java-nio-spi-for-s3

So we might be able to just use it. We would likely still need to wrap it in our own "delegating" filesystem so that we can add custom behavior (see the S3Path class for details). I have done a similar thing in #4729 for the GCS filesystem.
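
As a rough sketch of the "delegating" idea, the provider below forwards every call to a wrapped `FileSystemProvider` and adds a hook in one place. `DelegatingProvider` and the `onDelete` hook are hypothetical names chosen for illustration, not the approach taken in #4729 or in the S3Path class. A real wrapper would also need matching `FileSystem` and `Path` wrappers so that `Files.*` calls are routed through it.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileAttribute;
import java.nio.file.attribute.FileAttributeView;
import java.nio.file.spi.FileSystemProvider;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

/** Forwards all calls to a wrapped provider; onDelete is a hypothetical hook for custom behavior. */
class DelegatingProvider extends FileSystemProvider {
    private final FileSystemProvider delegate;
    private final Consumer<Path> onDelete;

    DelegatingProvider(FileSystemProvider delegate, Consumer<Path> onDelete) {
        this.delegate = delegate;
        this.onDelete = onDelete;
    }

    // Custom behavior goes here: run the hook, then forward to the wrapped provider.
    @Override public void delete(Path path) throws IOException {
        onDelete.accept(path);
        delegate.delete(path);
    }

    // Everything below is plain forwarding.
    @Override public String getScheme() { return delegate.getScheme(); }
    @Override public FileSystem newFileSystem(URI uri, Map<String, ?> env) throws IOException {
        return delegate.newFileSystem(uri, env);
    }
    @Override public FileSystem getFileSystem(URI uri) { return delegate.getFileSystem(uri); }
    @Override public Path getPath(URI uri) { return delegate.getPath(uri); }
    @Override public SeekableByteChannel newByteChannel(Path path, Set<? extends OpenOption> options,
            FileAttribute<?>... attrs) throws IOException {
        return delegate.newByteChannel(path, options, attrs);
    }
    @Override public DirectoryStream<Path> newDirectoryStream(Path dir,
            DirectoryStream.Filter<? super Path> filter) throws IOException {
        return delegate.newDirectoryStream(dir, filter);
    }
    @Override public void createDirectory(Path dir, FileAttribute<?>... attrs) throws IOException {
        delegate.createDirectory(dir, attrs);
    }
    @Override public void copy(Path source, Path target, CopyOption... options) throws IOException {
        delegate.copy(source, target, options);
    }
    @Override public void move(Path source, Path target, CopyOption... options) throws IOException {
        delegate.move(source, target, options);
    }
    @Override public boolean isSameFile(Path a, Path b) throws IOException { return delegate.isSameFile(a, b); }
    @Override public boolean isHidden(Path path) throws IOException { return delegate.isHidden(path); }
    @Override public FileStore getFileStore(Path path) throws IOException { return delegate.getFileStore(path); }
    @Override public void checkAccess(Path path, AccessMode... modes) throws IOException {
        delegate.checkAccess(path, modes);
    }
    @Override public <V extends FileAttributeView> V getFileAttributeView(Path path, Class<V> type,
            LinkOption... options) {
        return delegate.getFileAttributeView(path, type, options);
    }
    @Override public <A extends BasicFileAttributes> A readAttributes(Path path, Class<A> type,
            LinkOption... options) throws IOException {
        return delegate.readAttributes(path, type, options);
    }
    @Override public Map<String, Object> readAttributes(Path path, String attributes,
            LinkOption... options) throws IOException {
        return delegate.readAttributes(path, attributes, options);
    }
    @Override public void setAttribute(Path path, String attribute, Object value,
            LinkOption... options) throws IOException {
        delegate.setAttribute(path, attribute, value, options);
    }
}
```

The sketch can be tried against the default local provider, e.g. `new DelegatingProvider(FileSystems.getDefault().provider(), p -> System.out.println("deleting " + p))`. For S3 the delegate would presumably be the provider from the AWS library linked above.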

bentsherman commented 3 months ago

Another ticket came up: SES v1 limits emails to 10 MB, whereas the SES v2 limit is 40 MB.

tamuanand commented 3 months ago

It would be great to switch over to SES v2 as early as feasible from the NF development side.

tamuanand commented 2 months ago

Hi @bentsherman - wondering if there are any updates here. Thanks in advance.