Closed drpatelh closed 1 year ago
I mid-term solution might be to improve copyTo
to detect bucket regions and in cases where regions don't match, download and then re-upload files.
The copyTo
method calls down to FileHelper::copyPath
and then (on S3, at least) to S3FileSystemProvider::copy
.
To the copy method linked above, I was hoping that we could add to something like
AmazonS3Client client = s3Source.getFileSystem().getClient();
String sourceBucketLocation = client.getBucketLocation(s3Source.getBucket());
String targetBucketLocation = client.getBucketLocation(s3Target.getBucket());
if(sourceBucketLocation != targetBucketLocation) {
Path staged = <some temporary destination in work/stage>
download(source, staged)
upload(staged, target)
} else {
<the other tests already in the copy method>
}
My only hesitation is that the AWS documentation about the GetBucketLocation method suggests that to use the API, "you must be the bucket owner". I'm unsure why knowing the bucket location should be a secrete available only to the bucket owner though.
We can verify this restriction with some testing of iGenomes data though.
Not sure it is worth all of this hassle to get a single file from an AWS bucket? Just means we can keep things simple and less prone to breakage in the pipeline trying to accommodate downloading the file. Docs is definitely the way ahead and we can extend on the commands I posted in the first comment and add directly to the website in the section below: https://github.com/nf-core/nf-co.re/blob/master/markdown/usage/reference_genomes.md#illumina-igenomes
This would allow us to have a central location that we can point to without needing to add to every pipeline.
Docs added to main website here and available below: https://nf-co.re/docs/usage/reference_genomes
Description of the bug
We need to remove / refactor this call because it raises an error when accessing buckets across regions with AWS https://github.com/nf-core/rnaseq/blob/e049f51f0214b2aef7624b9dd496a404a7c34d14/workflows/rnaseq.nf#L59-L64
Maybe we document this instead and provide a command that can be used to fetch the README if required via the
aws cli
Command used and terminal output
No response
Relevant files
No response
System information
No response