Open adriannavarrobetrian opened 5 months ago
I don't understand what the prefix in your example is supposed to be.
I've used this process definition:
process example {
    publishDir "s3://nextflow-ci/buy-why-nextflow"

    input:
    val sample

    output:
    path "*fastq.gz"

    script:
    """
    touch ${sample}.fastq.gz
    """
}
I'm getting this result, which looks perfectly fine:
2024-06-18 18:16:52 0
2024-06-18 18:16:52 0 SAMP1.fastq.gz
2024-06-18 18:16:52 0 SAMP2.fastq.gz
Sorry, I copied the example wrong. It's a folder; I updated it to test.
It's essentially the same, so I don't see why it should not work.
@pditommaso I think the problem reported is the top line of your output:
2024-06-18 18:16:52 0
That's a zero-sized object. OP is asking whether Nextflow can avoid creating it.
Fascinating, I see it now
I think it happens because Nextflow proactively creates the base publish directory before publishing files. I assumed that the S3 filesystem would map mkdir to a no-op, but apparently it is creating an empty "prefix" object.
Yep, I realised the same. Something similar happens on Amazon, where a dot (hidden) file gets created.
I think it happens because Nextflow proactively creates the base publish directory before publishing files.
Is it possible / advisable to simply skip this step for AWS s3?
We could do that, or we could make createDirectory() a no-op for the S3 filesystem: https://github.com/nextflow-io/nextflow/blob/12b027ee7e70d65bdee912856478894af4602170/plugins/nf-amazon/src/main/nextflow/cloud/aws/nio/S3FileSystemProvider.java#L468-L489
I'm not sure why you would ever need it... but something tells me that someone's pipeline will break if we remove it 😅
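To make the two options concrete, here is a toy in-memory model in Python. It is not Nextflow's or the nf-amazon plugin's actual code: `ToyBucket`, `create_directory`, `publish`, and the `dir_markers` flag are hypothetical names used only for illustration. It assumes, as discussed above, that creating a directory on S3 writes a zero-byte object whose key ends in a slash.

```python
class ToyBucket:
    """Hypothetical in-memory stand-in for an S3 bucket: a flat key -> bytes map."""

    def __init__(self, dir_markers=True):
        self.objects = {}
        self.dir_markers = dir_markers

    def create_directory(self, prefix):
        # With markers on, "mkdir" writes a zero-byte object whose key ends in "/".
        # With markers off, it is a no-op, since a flat key space has no directories.
        if self.dir_markers:
            self.objects[prefix.rstrip("/") + "/"] = b""

    def put(self, key, data=b""):
        self.objects[key] = data

    def list_keys(self, prefix):
        return sorted(k for k in self.objects if k.startswith(prefix))


def publish(bucket, publish_dir, files):
    # Mirrors the behaviour described in this thread:
    # create the base publish directory first, then write the files.
    bucket.create_directory(publish_dir)
    for name in files:
        bucket.put(f"{publish_dir}/{name}")


files = ["SAMP1.fastq.gz", "SAMP2.fastq.gz"]

with_markers = ToyBucket(dir_markers=True)
publish(with_markers, "buy-why-nextflow", files)

without_markers = ToyBucket(dir_markers=False)
publish(without_markers, "buy-why-nextflow", files)

print(with_markers.list_keys("buy-why-nextflow/"))
print(without_markers.list_keys("buy-why-nextflow/"))
```

In this model the first listing contains the extra `buy-why-nextflow/` key and the second does not, while both contain the two published files. Whether a real pipeline depends on the marker being there is exactly the open question above.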
Bug report
When using the publishDir directive to send outputs to an s3 location, it looks like Nextflow creates a zero-sized object with a key ending in a slash at the publishDir location. While technically allowed by S3, this creates issues when performing operations on the resulting publishDir location (like recursing over objects or counting the number of objects under a prefix). It will also keep empty prefixes around; the objects themselves cannot be seen in the console, and if you try to copy the prefix using the AWS CLI naively, it fails.
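Until this is changed, a consumer-side workaround is to skip the marker when walking a listing. The sketch below is only an illustration in Python: `is_dir_marker` and `real_objects` are hypothetical helpers, and the sample entries are made up to mirror the `Key` and `Size` fields that S3 object listings report. Because the files in the example above are themselves zero bytes, size alone is not enough, so the check also requires the trailing slash.

```python
def is_dir_marker(entry):
    # A "directory marker" here means a zero-byte object whose key ends in "/".
    return entry["Key"].endswith("/") and entry["Size"] == 0


def real_objects(entries):
    # Keep everything except directory markers.
    return [e for e in entries if not is_dir_marker(e)]


# Made-up listing shaped like the output shown in this thread.
listing = [
    {"Key": "buy-why-nextflow/", "Size": 0},
    {"Key": "buy-why-nextflow/SAMP1.fastq.gz", "Size": 0},
    {"Key": "buy-why-nextflow/SAMP2.fastq.gz", "Size": 0},
]

kept = real_objects(listing)
print(len(listing), len(kept))
print([e["Key"] for e in kept])
```

This only hides the marker from downstream counting or recursion; it does not stop the object from being created.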
Steps to reproduce the problem
Minimal example to replicate:
Program output
Environment