Allows input_path, for the import workflow, to be an S3 URI.
Adds some checks on the URIs to make sure the correct files are available.
Adds some logic to make sure Nextflow will normally be able to access the appropriate credentials to pull from S3.
The result is that the Nextflow procedure to import a curated dataset into the database pretty much works as-is when the source files come from a folder in an S3 bucket, rather than from a local directory.
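As a rough illustration of the kind of URI checks mentioned above, here is a minimal sketch. It is not the code in this change: the function names and the required file names are hypothetical, and the listing of keys is assumed to come from whatever S3 client is in use.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlparse

# Hypothetical file names, for illustration only; the real workflow's
# required files may differ.
REQUIRED_FILES = ("study.json", "samples.tsv")


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an S3 URI into (bucket, key prefix); raise ValueError otherwise."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # Drop leading/trailing slashes so the prefix can be joined uniformly.
    return parsed.netloc, parsed.path.strip("/")


def missing_files(
    prefix: str,
    listed_keys: Iterable[str],
    required: Iterable[str] = REQUIRED_FILES,
) -> List[str]:
    """Return the required file names that are absent under the prefix."""
    base = f"{prefix}/" if prefix else ""
    present = set(listed_keys)
    return [name for name in required if base + name not in present]
```

For example, `parse_s3_uri("s3://my-bucket/datasets/study1/")` gives the bucket `my-bucket` and the prefix `datasets/study1`, and `missing_files` can then be applied to the keys listed under that prefix before the workflow starts.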
Note that Nextflow does support S3 "out of the box", but a subtlety is that it does not support the session-specific tokens that we have been using in the shell environment. Instead, in the session-specific case one must use the credentials from a user profile, and one must not allow Nextflow to try guessing credentials from environment variables, because it will fail with a misleading error.
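One way to keep Nextflow from picking up credentials from the environment is to launch it with the AWS key variables removed. The sketch below assumes that approach; the helper name is hypothetical, it may not match what this change actually does, and how the user profile itself is selected is not shown.

```python
import os
from typing import Dict, Mapping, Optional

# Standard AWS credential environment variables.
AWS_KEY_VARIABLES = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
)


def environment_without_aws_keys(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Copy an environment mapping, omitting the AWS key variables.

    Defaults to the current process environment when none is given.
    """
    source = os.environ if env is None else env
    return {k: v for k, v in source.items() if k not in AWS_KEY_VARIABLES}
```

The returned mapping could be passed as the `env` argument of `subprocess.run` when starting the Nextflow process, leaving the parent shell's variables untouched.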
There is also a slight adjustment to the metadata field inference, to allow some channels to be indicated as "computationally generated" without affecting the primary channel annotation.