Description of the bug
Currently, when testing the pipeline on AWS, only the reference building step is executed. The issue is that during execution the PREPARE_PIPELINE module can't access the files in the S3 bucket: the files are not staged (i.e. downloaded as FASTQ files), and I'm not sure why. Here is a link to the Tower execution.
You can see that, as the docs state, the S3 link is converted to /nf-core/test-datasets/marsseq/testdata/SB26/; however, the files are not staged and therefore the pipeline fails.
I've checked that the files are present in the bucket.
I can't test this on our internal DanGPU cluster because I don't have AWS credentials there, and whenever I try to execute the pipeline it times out, probably because of the missing credentials. I think this could be related to whether the bucket is public or not, see nextflow/issues/3281.
Possible solution
Check whether the files come from an S3 bucket; if so, convert the S3 path to a regular HTTP URL and download the files with the WGET module.
http://test-bucket.s3.amazonaws.com/test-folder/test-file.txt
http://ngi-igenomes.s3.amazonaws.com/test-data/marsseq/wells_cells.xlsx
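The conversion step could look roughly like the sketch below. It is written in Python only for illustration, not as pipeline code. The helper name `s3_to_http` and the `s3://` input paths are my assumptions, reverse-engineered from the HTTP URLs above. It also assumes the objects are publicly readable, which is exactly what is in doubt here.

```python
from urllib.parse import quote, urlsplit


def s3_to_http(uri: str) -> str:
    """Turn s3://<bucket>/<key> into http://<bucket>.s3.amazonaws.com/<key>.

    Hypothetical helper: anything that is not an s3:// URI is returned
    unchanged, so local paths and existing http(s) URLs pass through.
    """
    parts = urlsplit(uri)
    if parts.scheme != "s3":
        return uri

    bucket = parts.netloc
    key = parts.path.lstrip("/")
    if not bucket or not key:
        raise ValueError(f"Expected s3://<bucket>/<key>, got: {uri!r}")

    # quote() percent-encodes unsafe characters but keeps "/" separators.
    return f"http://{bucket}.s3.amazonaws.com/{quote(key)}"


if __name__ == "__main__":
    # Assumed s3:// form of the example file above.
    print(s3_to_http("s3://ngi-igenomes/test-data/marsseq/wells_cells.xlsx"))
```

The resulting URL could then be handed to the WGET module. This only helps if the bucket allows anonymous reads, since a private object would still be rejected over plain HTTP.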
The question remains: will the reference folder be staged correctly, or will it fail as well?
Command used and terminal output
System information
DanGPU server