kahing / goofys

a high-performance, POSIX-ish Amazon S3 file system written in Go
Apache License 2.0

Mount 'all' s3 buckets #261

Open tam203 opened 6 years ago

tam203 commented 6 years ago

Help wanted:

I've been experimenting with developing a FUSE file system that mounts 'all' of S3 (that the mounting user has access to) and determines which bucket and item you want from the path name, i.e. "///". A number of compromises have to be made, such as not being able to ls the mount point, since in theory you can access every single public bucket.
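A minimal Go sketch of the path-based dispatch described above. It assumes paths arrive relative to the mount root (for example `/my-bucket/dir/file.txt`), with the first component as the bucket and the rest as the object key. `splitBucketKey` is a hypothetical helper, not part of goofys. The mount root itself maps to no bucket, which is why it cannot be listed in this design.

```go
package main

import (
	"fmt"
	"strings"
)

// splitBucketKey is a hypothetical helper that maps a path relative to the
// mount root, such as "/my-bucket/dir/file.txt", to an S3 bucket name and an
// object key. ok is false for the mount root itself, which has no bucket.
func splitBucketKey(p string) (bucket, key string, ok bool) {
	p = strings.TrimPrefix(p, "/")
	// Everything before the first slash is the bucket; the rest is the key.
	bucket, key, _ = strings.Cut(p, "/")
	if bucket == "" {
		return "", "", false // the root (or a malformed path): no bucket
	}
	return bucket, key, true
}

func main() {
	for _, p := range []string{"/", "/my-bucket", "/my-bucket/dir/file.txt"} {
		b, k, ok := splitBucketKey(p)
		fmt.Printf("%q -> bucket=%q key=%q ok=%v\n", p, b, k, ok)
	}
}
```

A path naming only a bucket (`/my-bucket`) yields an empty key, i.e. the bucket's own root directory.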

The performance is about 25% of goofys' or less, so I'm interested in either forking goofys or getting this feature added to goofys.

Are there any fundamental reasons goofys couldn't do this, and where would I start? Thanks for the help.

gaul commented 6 years ago

This seems confusing, since operations between buckets will have different semantics than operations within a bucket, e.g., client-side vs. server-side copies. However, I can see users with many buckets wanting to share a goofys instance to control its memory and connection use.

tam203 commented 6 years ago

That's a good point. I guess it's not been so relevant to us because we deliberately want to mount as read-only. So a lot of that complexity goes away.

tam203 commented 6 years ago

To explain further, we are prototyping platforms for scientific research. We want to make an arbitrary range of data sets available through S3 to this platform. We are running on Kubernetes with a DaemonSet (so it is available on all nodes) that runs a FUSE service that can access (read-only) any S3 resource that we allow (including public ones). This is then volume-mounted at '/s3' into individual users' containers, where they can access S3 as POSIX (the only method supported by many tools).

I'm impressed by the speed of goofys and would like to see if I can emulate that but in this different use case.

kahing commented 6 years ago

I am open to such a change, but only if it's implemented for both read and write. We could probably make a special bucket name * trigger this mode; then, in the FUSE operations in goofys.go, check for this mode and whether the parent inode is the root, and do your special ops there.
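One way the suggested mode switch might look, as a Go sketch. The names `allBucketsName`, `fs`, `newFS`, `lookupKind` and `classify` are hypothetical and are not goofys's actual types or functions; `rootInodeID = 1` follows the FUSE convention that the mount root is inode 1. Only the check proposed above is modeled: special handling applies when the mode is on and the parent is the root.

```go
package main

import "fmt"

// rootInodeID follows the FUSE convention that the mount root is inode 1.
const rootInodeID uint64 = 1

// allBucketsName is the hypothetical bucket argument that would enable
// "mount every bucket" mode, as proposed in this thread.
const allBucketsName = "*"

// lookupKind says how a directory entry name should be resolved.
type lookupKind int

const (
	lookupObject lookupKind = iota // ordinary lookup of a key inside a bucket
	lookupBucket                   // root-level lookup: the name is a bucket
)

// fs is a stand-in for the file system state; only the mode flag is modeled.
type fs struct {
	allBuckets bool
}

// newFS enables all-buckets mode when the bucket argument is "*".
func newFS(bucketArg string) *fs {
	return &fs{allBuckets: bucketArg == allBucketsName}
}

// classify applies the suggested check: special handling happens only when
// the mode is on AND the parent directory is the mount root.
func (f *fs) classify(parent uint64) lookupKind {
	if f.allBuckets && parent == rootInodeID {
		return lookupBucket
	}
	return lookupObject
}

func main() {
	multi := newFS("*")
	single := newFS("my-bucket")
	fmt.Println(multi.classify(rootInodeID) == lookupBucket)
	fmt.Println(multi.classify(42) == lookupObject) // 42: any non-root inode
	fmt.Println(single.classify(rootInodeID) == lookupObject)
}
```

If the mode were selected by a `*` argument on the command line, it would need quoting so the shell does not glob-expand it.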

On rename, check that both paths are in the same bucket; otherwise, just fail them with ENOTSUPP. Make mkdir create a new bucket (taking the region from the AWS config), and fail file creates at the top level.

Public buckets that you don't own will be a little weird, since they won't show up in ls but will appear when you access them. I don't think that's a big deal, though.

tam203 commented 6 years ago

That sounds good and would suit my needs nicely. I also agree that the odd behaviour with ls, and maybe other tools, is not a big deal. In our case, at least, we don't want to pretend this isn't S3; we just need to provide a compatible API to it for some of our tools.

kahing commented 6 years ago

I take it that you are volunteering to add this?

tam203 commented 6 years ago

Umm, so I've never worked in Go before, and whilst I'd love to take this as an opportunity to learn, I'm not likely to do this in the immediate future. I might well get a chance later in the year, but until then it will sit around. Happy for you to close it or work on it yourself. If it's not implemented by the time I have an opportunity, I'll take a look and raise a pull request.

Ark-kun commented 1 year ago

gcsfuse seems to have partial support for this. It can virtually mount all buckets to the /gcs/ directory such that cat /gcs/my_bucket/path/file.txt works. AFAIK, they have no support for creating/deleting/listing buckets that way though.