Listing the parent folder shows two 2018-06-28
entries, which looks suspicious:
/tick-genome/dna/2018-06-28
/tick-genome/dna/2018-06-28
/tick-genome/dna/2018-07-26-pacbio
/tick-genome/dna/2018-08-03-pacbio-raw
/tick-genome/dna/2018-08-16-sanger
/tick-genome/dna/2018-10-10-dovetail
/tick-genome/dna/2018-10-11_ise6_asm2.2
/tick-genome/dna/2018-12-03_IscaW1
/tick-genome/dna/2018-12-03_quast
/tick-genome/dna/tick_pacbio_20180813
Ah I think I had accidentally saved an object as /tick-genome/dna/2018-06-28
(no final slash) instead of /tick-genome/dna/2018-06-28/
(with a final slash). Maybe that's causing the error?
Yes, removing the offending object worked!
(base)
Wed 24 Apr - 13:03 ~/code/nf-large-assembly origin ☊ olgabot/wtf-aws ✔ 1☀
aws s3 ls s3://tick-genome/dna/
PRE 2018-06-28/
PRE 2018-07-26-pacbio/
PRE 2018-08-03-pacbio-raw/
PRE 2018-08-16-sanger/
PRE 2018-10-10-dovetail/
PRE 2018-10-11_ise6_asm2.2/
PRE 2018-12-03_IscaW1/
PRE 2018-12-03_quast/
PRE tick_pacbio_20180813/
2018-07-11 13:24:40 79376 2018-06-28
(base)
Wed 24 Apr - 15:07 ~/code/nf-large-assembly origin ☊ olgabot/wtf-aws ✔ 1☀
aws s3 rm --dryrun s3://tick-genome/dna/2018-06-28
(dryrun) delete: s3://tick-genome/dna/2018-06-28
(base)
Wed 24 Apr - 15:07 ~/code/nf-large-assembly origin ☊ olgabot/wtf-aws ✔ 1☀
aws s3 rm s3://tick-genome/dna/2018-06-28
delete: s3://tick-genome/dna/2018-06-28
Now this workflow:
Channel
.fromPath("s3://tick-genome/dna/2018-06-28/*.fastq.gz", type: 'any')
.println()
Produces this output:
Wed 24 Apr - 15:14 ~/code/nf-large-assembly origin ☊ olgabot/wtf-aws ✔ 1☀
make scratch
nextflow run scratch.nf -e.process.executor=local \
-dump-channels \
-profile none \
-e.aws.region=us-west-2
N E X T F L O W ~ version 19.04.0
Launching `scratch.nf` [irreverent_varahamihira] - revision: 883bf431da
/tick-genome/dna/2018-06-28/tick_1_S1_R1_001_first1Mreads.fastq.gz
/tick-genome/dna/2018-06-28/tick_1_S1_R1_post-trimming_first1Mreads.fastq.gz
/tick-genome/dna/2018-06-28/tick_1_S1_R2_001_first1Mreads.fastq.gz
/tick-genome/dna/2018-06-28/tick_1_S1_R2_post-trimming_first1Mreads.fastq.gz
OK. Basically, you had a file with the name of a directory path, right?
Yep, that's correct. It wasn't exactly the same name, though, since it did not end in a slash ("/").
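The collision described above (an object whose key is identical to a "folder" prefix, minus the trailing slash) can be checked for in a plain list of keys. The following is only a minimal sketch in Python, not anything Nextflow or the awscli does internally: the function name `find_shadowing_keys` and the `2018-07-26-pacbio/reads.bam` key are hypothetical, and the key list is hard-coded rather than fetched from S3.

```python
def find_shadowing_keys(keys):
    """Return keys that are also used as a 'folder' prefix by other keys."""
    key_set = set(keys)
    shadowing = set()
    for key in key_set:
        # Walk up every parent prefix of this key, e.g. "a/b/c" -> "a", "a/b".
        parts = key.split("/")
        for i in range(1, len(parts)):
            parent = "/".join(parts[:i])
            # A parent prefix that also exists as an object key is a collision.
            if parent in key_set:
                shadowing.add(parent)
    return sorted(shadowing)


keys = [
    "dna/2018-06-28",  # stray object saved without a trailing slash
    "dna/2018-06-28/tick_1_S1_R1_001_first1Mreads.fastq.gz",
    "dna/2018-06-28/tick_1_S1_R2_001_first1Mreads.fastq.gz",
    "dna/2018-07-26-pacbio/reads.bam",  # hypothetical key, for illustration only
]
print(find_shadowing_keys(keys))  # prints ['dna/2018-06-28']
```

Any key reported this way is a candidate for the kind of `aws s3 rm` cleanup shown earlier in the thread, after confirming that the object is not needed.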
OK, closing this and opening a related issue in the S3 library project: https://github.com/nextflow-io/nextflow-s3fs/issues/16.
Bug report
Expected behavior and actual behavior
Hello, I'm trying to list the files in a particular S3 directory, but Nextflow and the awscli show different results. Here is what I see using the awscli:
But running this workflow to recursively list all files in the directory lists only the parent directory:
Produces only this output, showing only the parent folder /tick-genome/dna/2018-06-28 when it should be showing all files recursively.
Originally, I was trying to get the *{1,2}_001_first1Mreads.fastq.gz files but that channel was completely empty, i.e. adding these lines:
Produces nearly the same output, though for some reason now complains about fastqc:
Steps to reproduce the problem
Here is a self-contained nextflow script to reproduce the problem
Program output
Environment
Additional context
There seems to be something wrong with this bucket or folder, as I'm able to list objects in other buckets. However, I've made this folder publicly viewable and the bucket policy is quite permissive:
This seems related to this: https://github.com/nextflow-io/nextflow/issues/1121