Open thoutenbos opened 7 years ago
What S3 software are you using? rename currently uses 5GB parts, but it really should use a part size better suited for concurrency. I would rather not have a fixed MPU part size in general.
That would be even better indeed :)
We are using Cloudian S3; it turns out the default max MPU part size there is 5GB instead of the 5GiB limit at AWS. I was indeed running into very strange OS errors on rename which seem to be caused by this difference.
is this some setting you can change on the cloudian side?
Yeah turns out it's configurable so we will change it there 👍
Would still be nice to have some optimisation / configuration for this in here as well. Anyway, thanks for the nice software and this feature request can be closed if you want.
agreed that Cloudian's max MPU size should be the same as AWS S3 and that goofys shouldn't use 5GB as part size unless it's necessary. Will leave this open until the latter is fixed.
It seems that for new PUTs, the partSize used scales from 25MB up to 125MB (it grows based on how many parts have been uploaded so far), seen here: https://github.com/kahing/goofys/blob/master/internal/handles.go#L495-L503
However, a maximum partSize of 125MB means that the largest file that can be put is ~1.25TB, due to S3's (and goofys') multipart limit of 10,000 parts.
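The arithmetic behind that ceiling can be sketched in Go. To be clear, the function names and the tier boundary below are illustrative assumptions, not goofys internals; only the 25MB/125MB figures and the 10,000-part limit come from this thread:

```go
package main

import "fmt"

// partSizeFor mimics (not copies) the scaling described above: smaller
// parts early, larger parts as the upload grows. The 1,000-part tier
// boundary is an assumption for illustration.
func partSizeFor(partNum int) uint64 {
	const mb = 1024 * 1024
	if partNum < 1000 {
		return 25 * mb
	}
	return 125 * mb
}

// maxObjectSize is the hard ceiling implied by a fixed maximum part
// size and a maximum part count (10,000 for AWS S3).
func maxObjectSize(maxPartSize, maxParts uint64) uint64 {
	return maxPartSize * maxParts
}

func main() {
	const mb = 1024 * 1024
	fmt.Println(partSizeFor(0), partSizeFor(5000)) // 26214400 131072000
	// 1310720000000 bytes, i.e. the ~1.25TB ceiling mentioned above
	// (exactly where it lands depends on counting in TB vs TiB).
	fmt.Println(maxObjectSize(125*mb, 10000))
}
```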
Being able to configure a larger partSize when you know that the files will be >1TB would be a use case. I know of some backup/archive applications which write extremely large files.
Definitely true about the ~1TB limit, and that some backup applications use extremely large archives (although the ones I've interacted with, e.g. BackupExec, NetBackup, etc., all have configurable limits). Do you, or does anyone you know, use goofys with applications that require >1TB files?
Another issue is that some non-Amazon S3 hosts have different part-count limits (1,000 in the case of mine), meaning that in practice I can't store files larger than 5GB, which is definitely not a lot.
Which s3 is that? Is it possible to identify it through headers?
In this case, it is Scaleway (cf https://www.scaleway.com/en/docs/object-storage-feature/#-Object-Storage-Limitation )
Not sure about headers, but maybe the "max MPU count" could be made configurable, and the part size adjusted when the object would otherwise need more parts than that limit?
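The adjustment suggested here is just a ceiling division. A minimal Go sketch (the function name is illustrative, not a goofys internal):

```go
package main

import "fmt"

// minPartSize returns the smallest part size that fits objectSize into
// at most maxParts parts (ceiling division). A mount-time option could
// feed maxParts from a host-specific limit such as Scaleway's 1,000.
func minPartSize(objectSize, maxParts uint64) uint64 {
	return (objectSize + maxParts - 1) / maxParts
}

func main() {
	// A 10GB (decimal) object on a host capped at 1,000 parts would
	// need parts of at least 10MB.
	fmt.Println(minPartSize(10_000_000_000, 1000)) // 10000000
}
```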
I think something like an /etc/goofys.conf (plus a per-user one stored in $HOME) that supports passing alternate S3 endpoints, cache settings, etc. would make this simpler overall, because otherwise we'll probably end up with very complex fstab configurations.
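For concreteness, such a file might look like the sketch below. Every key here is invented purely for illustration; goofys does not currently read any such file, and an actual design would pick its own names:

```ini
# /etc/goofys.conf (hypothetical; all keys are illustrative only)
endpoint = https://s3.fr-par.scw.cloud
max-mpu-parts = 1000             # host part-count limit (e.g. Scaleway)
max-mpu-part-size = 5368709120   # 5GiB, the AWS per-part maximum
```

A per-user override in $HOME could carry the same keys.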
Hi, I have the same problem on Scaleway's Object Storage service, which indeed has a limit of 1,000 parts and therefore blocks all files over 5GB via goofys, as mentioned in the previous comment. Is there a workaround, for example via a configuration file? There seem to be no headers from which the limit could be detected dynamically.
Would it be possible to make the MPU part size configurable? Currently it seems to be hard-coded at exactly 5GB in multiple places in the code. Our S3 software has issues when the part size sits exactly at the spec limit.