nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Support for more than 1 Azure Storage account #4445

Open adamrtalbot opened 1 year ago

adamrtalbot commented 1 year ago

New feature

Nextflow's Azure integration currently supports only a single storage account. In a few instances, there is a need to work across multiple storage accounts. I've had to do this once myself, and I have been contacted a few times by other users who wish to do the same.

Usage scenario

Typically this is for one of a few reasons:

Suggest implementation

Implementing this is less difficult than deciding how to expose it to the pipeline developer and user. Somehow, we have to distinguish between storage accounts that contain the same container and path, e.g. az://data/R1.fastq could exist in two accounts. On the other hand, restricting a pipeline to a single storage account has its own benefits (e.g. no cross-region data transfer, simpler access control). Perhaps extending authentication methods such as #3314 will make workarounds easier (e.g. using azcopy to transfer data between accounts within a process).

Consider this more of a discussion on practicality rather than a serious feature.

adamrtalbot commented 1 month ago

See also https://github.com/nextflow-io/nextflow/issues/4683 and https://github.com/nextflow-io/nextflow/pull/4692, which use this path format:

az://storage-account-name.blob-container-name/path

but then remove the account prefix. Instead, we could use the storage account name where it is provided and fall back to azure.storage.accountName where it is not, effectively treating that value as a default, e.g.:

azure.storage.accountName = "my-azure-account-1"

az://blob-container would be automatically inferred to be az://my-azure-account-1.blob-container.

az://my-azure-account-2.blob-container would be left as az://my-azure-account-2.blob-container.

i.e. these are two different paths, even though the container name is the same.

The current solution would only have az://blob-container, which is automatically assumed to be in my-azure-account-1.
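
The fallback rule described above can be sketched in a few lines of Python. This is only an illustration of the proposed behaviour, not Nextflow's implementation: the function name `resolve_az_uri` and its `default_account` parameter are hypothetical, with `default_account` standing in for the value of azure.storage.accountName. It assumes the account is separated from the container by the first dot in the first path segment, which should be unambiguous if container names cannot contain dots.

```python
AZ_PREFIX = "az://"


def resolve_az_uri(uri: str, default_account: str) -> str:
    """Return `uri` with an explicit storage account (hypothetical helper).

    `default_account` plays the role of azure.storage.accountName.
    """
    if not uri.startswith(AZ_PREFIX):
        raise ValueError(f"not an az:// URI: {uri!r}")

    # Split "account.container/path" into the first segment and the rest.
    authority, slash, path = uri[len(AZ_PREFIX):].partition("/")
    if not authority:
        raise ValueError(f"missing container name: {uri!r}")

    if "." in authority:
        # An account was given explicitly: keep it.
        account, container = authority.split(".", 1)
    else:
        # No account given: fall back to the configured default.
        account, container = default_account, authority

    return f"{AZ_PREFIX}{account}.{container}{slash}{path}"


if __name__ == "__main__":
    default = "my-azure-account-1"
    print(resolve_az_uri("az://blob-container", default))
    print(resolve_az_uri("az://my-azure-account-2.blob-container", default))
```

With this rule, the two example paths above resolve to different fully qualified locations even though the container name is the same, while existing pipelines that only use az://blob-container keep working against the default account.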

pditommaso commented 1 month ago

Another problem is how to provide the credentials for each account. Could relying on Entra be an option for this?

adamrtalbot commented 1 month ago

Another problem is how to provide the credentials for each account. Could relying on Entra be an option for this?

Yes, this would require Entra to access two accounts at once.

pditommaso commented 1 month ago

This would avoid having to manage separate credentials for each account. That's good.