Closed — cjw85 closed this issue 1 year ago
Unfortunately, this cannot be done, because this setting is defined at the level of the transfer manager for all uploads.
Not aware of a way to define it independently for each transfer.
However, raising an early error by checking this ahead of the transfer looks like a good suggestion.
😬 Wow, that's bad!
So we simply have to set the chunk size to the maximum filesize we might expect divided by 10,000? The new default of 100 MB means we can transfer almost 1 TB, but it's at the expense of no multipart uploads for smaller files.
That's how AWS SDK works
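The arithmetic above can be sketched as follows. This is an illustrative standalone class, not code from the AWS SDK or the nf-amazon plugin: with at most 10,000 parts per upload, the chunk size must be at least the object size divided by 10,000 (rounded up), and conversely a fixed 100 MB chunk caps the object at roughly 1 TB.

```java
// Sketch of the 10,000-part constraint (illustrative names, not plugin code).
public class ChunkSizeMath {
    static final int MAX_PARTS = 10_000;  // S3 multipart upload part limit

    // Smallest chunk size (bytes) that keeps the part count <= MAX_PARTS
    static long minChunkSize(long objectSize) {
        return (objectSize + MAX_PARTS - 1) / MAX_PARTS;  // ceiling division
    }

    public static void main(String[] args) {
        long oneTb = 1_000_000_000_000L;
        // A 1 TB object needs chunks of at least 100 MB to fit in 10,000 parts
        System.out.println(minChunkSize(oneTb));  // 100000000
    }
}
```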
I see there are two versions of S3FileSystemProvider, one in the nf-amazon plugin and another one in the nextflow-s3fs repo. But as far as I can tell, the second one is the one being used. Paolo, can you explain how this works? I want to make sure I understand correctly before I go make a PR on the other repo.
Also, I'm wondering if this warning will also catch directory uploads, i.e. does the client also use this method to upload each file in a directory.
The https://github.com/nextflow-io/nextflow-s3fs repo has been incorporated into the nf-amazon plugin as of the 22.x series. Therefore the latter is the one to be modified.
Directory upload is essentially a directory listing, followed by each file upload.
Okay, I understand. It looks like S3MultipartOptions already performs this check while computing the chunk size:
Seems kind of pointless to make this adjustment if it's going to fail anyway. I think we should move this check to the code that actually performs the multipart copy.
Not sure this is still relevant. The upload in recent versions is managed by this, and the chunk size is declared here.
Don't you still have the issue that the TransferManager is constructed before the files are examined, so it can't know what an appropriate chunk size should be?
I think I was looking at the code for multipart copy, which actually doesn't seem to have any problem because it just increases the chunk size if it's too small. I wonder if the transfer manager is doing the same thing under the hood for multipart upload.
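The "just increases the chunk size if it's too small" behaviour described above can be sketched like this. The class and method names are hypothetical, not the actual TransferManager internals:

```java
// Illustrative sketch of growing the chunk size instead of failing
// (hypothetical names, not the AWS SDK's or the plugin's real API).
public class ChunkSizeAdjust {
    static final int MAX_PARTS = 10_000;  // S3 multipart part limit

    // Return the configured chunk size, enlarged if necessary so that
    // the resulting part count stays within the 10,000-part limit.
    static long adjustedChunkSize(long objectSize, long configuredChunk) {
        long minChunk = (objectSize + MAX_PARTS - 1) / MAX_PARTS;  // ceil
        return Math.max(configuredChunk, minChunk);
    }
}
```

With this strategy an undersized configured chunk is silently bumped up, so the copy proceeds without error, which is consistent with the multipart-copy behaviour described in the comment above.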
@cjw85 I think if you try the latest edge release you won't see this problem anymore.
Ooh, thanks. Yeah, I was using an older version originally. It was just that Paolo made a comment that seemed to suggest the chunk size was fixed. When I scanned the current repo I did see code which seemed to calculate the chunk size based on the filesize, so I was a little confused.
This has been improved in version 22.10.x
Multipart uploads may have up to 10000 parts: https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html
The code here: S3FileSystemProvider.java#L549-L576 does not appear to check whether the number of parts will exceed this limit, given the known object size and chunk size.
Either the method should immediately throw an exception, or choose a suitable chunk size such that the copy can proceed without error.
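The "fail fast" option proposed above could look roughly like the following. This is a minimal sketch with illustrative names, not the plugin's actual API: reject an impossible object-size/chunk-size combination before any bytes are transferred.

```java
// Hypothetical pre-flight check for the 10,000-part limit
// (illustrative names, not S3FileSystemProvider's real methods).
public class PartLimitCheck {
    static final int MAX_PARTS = 10_000;  // S3 multipart upload part limit

    // Throws immediately if the copy cannot possibly succeed,
    // instead of erroring partway through the transfer.
    static void ensureWithinPartLimit(long objectSize, long chunkSize) {
        long parts = (objectSize + chunkSize - 1) / chunkSize;  // ceil
        if (parts > MAX_PARTS)
            throw new IllegalArgumentException(
                "chunk size " + chunkSize + " would require " + parts +
                " parts for a " + objectSize + "-byte object; " +
                "S3 allows at most " + MAX_PARTS);
    }
}
```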