kahing / goofys

a high-performance, POSIX-ish Amazon S3 file system written in Go
Apache License 2.0

Make MPU part size configurable #139

Open thoutenbos opened 7 years ago

thoutenbos commented 7 years ago

Would it be possible to make the MPU part size configurable? It currently seems to be hard-coded at exactly 5GB in multiple places in the code. Our S3 software seems to have issues with this being exactly at the spec limit.

kahing commented 7 years ago

What S3 software are you using? rename currently uses 5GB parts, but it really should use a more optimal part size for concurrency. I would rather not have a fixed MPU part size in general.
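
For context on the rename path: in S3 a rename is a copy plus delete, and a large copy can itself be done as a multipart upload whose parts are server-side range copies, so smaller parts allow the copy to run concurrently. The sketch below is illustrative only, not goofys' actual code; it assumes the AWS SDK for Go v1 and an already created multipart upload, and the helper names are made up for this example.

```go
package mpucopy

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

// copyPartRanges splits an object of totalSize bytes into byte ranges of at
// most partSize bytes each, formatted for the CopySourceRange header.
func copyPartRanges(totalSize, partSize uint64) []string {
	var ranges []string
	for off := uint64(0); off < totalSize; off += partSize {
		end := off + partSize - 1
		if end >= totalSize {
			end = totalSize - 1
		}
		ranges = append(ranges, fmt.Sprintf("bytes=%d-%d", off, end))
	}
	return ranges
}

// copyObjectInParts issues one UploadPartCopy per range against an existing
// multipart upload (uploadId). Here the calls run sequentially; in practice
// they could be dispatched to a worker pool so the parts copy concurrently.
func copyObjectInParts(svc *s3.S3, bucket, src, dst, uploadId string, totalSize, partSize uint64) error {
	for i, r := range copyPartRanges(totalSize, partSize) {
		_, err := svc.UploadPartCopy(&s3.UploadPartCopyInput{
			Bucket:          aws.String(bucket),
			Key:             aws.String(dst),
			CopySource:      aws.String(bucket + "/" + src),
			CopySourceRange: aws.String(r),
			UploadId:        aws.String(uploadId),
			PartNumber:      aws.Int64(int64(i + 1)),
		})
		if err != nil {
			return err
		}
	}
	return nil
}
```

The 5GB-per-part behaviour mentioned above corresponds to using a single maximal range per part instead of splitting the copy this way.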

thoutenbos commented 7 years ago

That would be even better indeed :)

We are using Cloudian S3; it turns out the default max MPU part size there is 5GB instead of the 5GiB limit at AWS. I was indeed running into very strange OS errors on rename which seem to be caused by this difference.

kahing commented 7 years ago

Is this some setting you can change on the Cloudian side?

thoutenbos commented 7 years ago

Yeah, turns out it's configurable, so we will change it there 👍

It would still be nice to have some optimisation/configuration for this here as well. Anyway, thanks for the nice software; this feature request can be closed if you want.

kahing commented 7 years ago

Agreed that Cloudian's max MPU part size should be the same as AWS S3's, and that goofys shouldn't use 5GB as the part size unless necessary. Will leave this open until the latter is fixed.

andypern commented 7 years ago

It seems that for new puts, the partSize being used varies from 25MB to 125MB (it scales up based on how many parts have been uploaded so far), as seen here: https://github.com/kahing/goofys/blob/master/internal/handles.go#L495-L503

However, a maximum partSize of 125MB means that the largest file which can be put is ~1.25TB, due to S3's (and goofys') multipart limit of 10,000 parts.

Being able to configure a larger partSize when you know the files will be >1TB would be a use case. I know of some backup/archive applications which write extremely large files.
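
To make the arithmetic concrete, here is an illustrative sketch of a staged part-size scheme and the ceiling it implies under the 10,000-part limit. The thresholds below are assumptions for illustration, not the actual values in internal/handles.go linked above.

```go
package main

import "fmt"

const maxParts = 10000 // S3's limit on parts per multipart upload

// partSize is an illustrative staged scheme: small parts early (less memory
// buffered for small files), larger parts later (so big files stay under the
// part-count limit). The real thresholds live in internal/handles.go.
func partSize(partsSoFar int) uint64 {
	const MB = 1024 * 1024
	if partsSoFar < 1000 {
		return 25 * MB
	}
	return 125 * MB
}

// maxObjectSize sums the staged part sizes over the full 10,000-part budget.
func maxObjectSize() uint64 {
	var total uint64
	for i := 0; i < maxParts; i++ {
		total += partSize(i)
	}
	return total
}

func main() {
	// 1,000 * 25MB + 9,000 * 125MB ≈ 1.1 TiB with these illustrative steps;
	// the exact ceiling (~1.25TB quoted above) depends on the real scaling.
	fmt.Printf("max object size ≈ %.2f TiB\n", float64(maxObjectSize())/(1<<40))
}
```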

kahing commented 7 years ago

Definitely true about the 1TB or so limit, and that some backup applications use extremely large archives (although the ones I've interacted with, e.g. BackupExec, NetBackup, etc., all have configurable limits). Do you, or does anyone you know who's using goofys, have applications that require >1TB files?

Tristan971 commented 4 years ago

Another issue is that some non-Amazon S3 hosts have different part-count limits (1,000 in the case of mine), meaning that in practice I can't store files larger than 5GB, which is definitely not a lot.

kahing commented 4 years ago

Which S3 is that? Is it possible to identify it through headers?

Tristan971 commented 4 years ago

In this case, it is Scaleway (cf. https://www.scaleway.com/en/docs/object-storage-feature/#-Object-Storage-Limitation )

Not sure about headers, but maybe the "max MPU count" could be made configurable, with the part size adjusted upward if the object would otherwise need more parts than that?
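
As a rough sketch of that suggestion: given a configurable maximum part count and an expected object size, pick the smallest part size (rounded up to some granularity) that keeps the upload under the limit. The function name, signature, and the 5MiB granularity below are illustrative assumptions, not goofys' API.

```go
package main

import "fmt"

// partSizeFor picks the smallest part size, rounded up to a multiple of
// granularity, that fits objectSize into at most maxParts parts.
func partSizeFor(objectSize, maxParts, granularity uint64) uint64 {
	minPart := (objectSize + maxParts - 1) / maxParts // ceiling division
	return ((minPart + granularity - 1) / granularity) * granularity
}

func main() {
	const MiB = uint64(1024 * 1024)
	// A 100GiB object on a backend allowing only 1,000 parts needs at least
	// ~102.4MiB per part; with 5MiB granularity that rounds up to 105MiB.
	size := partSizeFor(100*1024*MiB, 1000, 5*MiB)
	fmt.Printf("part size: %d MiB\n", size/MiB)
}
```

The catch for a file system is that the final object size usually isn't known when the write starts, which is why a configurable part size or max part count, rather than one computed per file, may be the more practical option.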

Tristan971 commented 4 years ago

I think something like an /etc/goofys.conf (and a per-user one stored in $HOME) that supports passing alternate S3 endpoints, cache settings, etc. would make this simpler overall, because otherwise we'll probably end up with very complex fstab configurations.
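
Purely to illustrate that idea (none of these keys or this format exist in goofys today), such a per-mount file could carry the endpoint and MPU tuning so fstab entries stay short. The sketch below assumes a hypothetical JSON layout and hypothetical field names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// MountConfig is a hypothetical per-mount configuration; the fields are
// illustrative and not part of goofys' actual options.
type MountConfig struct {
	Endpoint    string `json:"endpoint"`      // alternate S3 endpoint
	MaxMPUParts int    `json:"max_mpu_parts"` // backend's part-count limit
	PartSizeMiB int    `json:"part_size_mib"` // preferred MPU part size
	CacheDir    string `json:"cache_dir"`     // local cache location
}

func loadConfig(path string) (*MountConfig, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var cfg MountConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

func main() {
	cfg, err := loadConfig("/etc/goofys.conf")
	if err != nil {
		fmt.Fprintln(os.Stderr, "config:", err)
		os.Exit(1)
	}
	fmt.Printf("endpoint=%s maxParts=%d\n", cfg.Endpoint, cfg.MaxMPUParts)
}
```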

ThibaudDemay commented 2 years ago

Hi, I have the same problem on Scaleway's Object Storage service, which in fact has a limit of 1,000 parts and therefore blocks all files over 5GB via goofys, as mentioned in the previous comment. Is there a workaround, for example via a configuration file? It seems that there are no headers from which to discover this limit dynamically.