versity / versitygw

versity s3 gateway
https://www.versity.com/products/versitygw/
Apache License 2.0
90 stars 16 forks source link

ListObjectsV2/ListObjects should ignore hidden dot files #139

Open jonaustin09 opened 12 months ago

jonaustin09 commented 12 months ago

Describe the bug ListObjectsV2/ListObjects actions return .DsStore files as s3 objects on MacOs, which is a file created in every directory, containing some metadata about the folder.

To Reproduce aws --endpoint-url http://localhost:7070 s3api list-objects-v2 --bucket my-bucket run in the bucket which contains directory objects.

Expected behavior It has to skip these files as s3 objects.

benmcclelland commented 12 months ago

I think this should be a option for the posix backend to skip "hidden files". We could skip anything that starts with "." like ls would. Or maybe this is default, and the option would be to enable hidden files. There might be cases where we really do want access to all files, even hidden ones.

jonaustin09 commented 11 months ago

I think skipping the hidden files by default or even optionally is not a good idea, as it's a very possible scenario that user will upload some "hidden files" like: .gitignore, .eslintrc, .npmrc ...

scaleoutsean commented 2 months ago

Having the option to ignore files (maybe using a regex setting) is a good idea, IMO. If it's off by default, there's no harm.

it's a very possible scenario that user will upload some "hidden files" like: .gitignore, .eslintrc, .npmrc ...

That's almost like that feature on FTP severs where chmod would be applied on upload, hiding the files (local_umask on vsftpd, for example).

A valid use case is when one mistakenly uploads such files, others still wouldn't able to access them.

If there's no way to disable such files, inadvertent leakage of .env becomes more likely, especially in large buckets where there are many users - some of them may keep such files around thinking they're safe while forgetting S3 gateway allows them to be enumerated (which bad enough).

And here they say ./ has limited support in some applications, so that's one special case where leading . creates problems. I guess many users simply avoid such key names, because that's not the only key pattern that creates problems (the page mentions others) in various applications.

Objects with a prefix of "./" must be uploaded or downloaded with the AWS Command Line Interface (AWS CLI), AWS SDKs, or REST API. You cannot use the Amazon S3 console.

There are valid use cases for leading . (e.g. Git) for sure, but we're talking about a per-bucket option that would be off by default, right?

benmcclelland commented 2 months ago

@scaleoutsean would you suggest that the ignored files are only hidden from listings, or would you expect GETs to fail for these as well?

scaleoutsean commented 2 months ago

I don't have a strong opinion on that, that's why I didn't mention it - I only wanted to say as an option it may be useful to some.

Since you ask: I'd prefer GETs to fail as well, because merely hiding them would make legit users unaware of the risk (as far as not seeing what they share when using ListObjects* to check). And attackers could still GET those objects even without knowing in advance if they really exist, by simply brute force-trying all the lines from .gitignore files obtained from Github (and especially from the repos owned by orgs that use this S3 GW ;-)).

But then such buckets wouldn't work for Git and other apps that depend on successful ListObjects enumeration, so it's a double-edged sword.

If buckets are granular enough, one could selectively enable/disable these as necessary (e.g. disallow both Get & List in one, and disallow List & allow Get in another) but that's probably not necessary right away.

Personally, I'd expect logical & secure defaults and for this optional setting I'd move slowly and disallow everything at first, and if good use cases for selective (Get only, List & Get, or List only) access emerge, consider those separately later.

My main use case is if I share my project directory where POSIX users run jobs (interactive or other), I'd like to allow them to keep existing workflows and rest assured their dot-files won't be accessible over S3. Very simple, very safe, doesn't require anyone to change anything or even be aware of this S3 GW feature. If you have a FS that's used by many users and apps, it's likely someone has a sensitive dot-file somewhere. If they must allow S3 access to such files, create proper policies and procedures and remove the non-default setting.

By the way, related to the reason this issue was created: I don't use OS X so this may be wrong here but if List can't show .DsStore, I wonder what happens when an S3 sync app tries to List to see if .DsStore exists in the sync target - it may try and PUT it every time because it'd always see it as missing in the bucket. If the app could be configured to ignore .DsStore then ignoring it for both List and Get operations may be a good solution for that. Maybe some S3 apps would have problems even if the feature existed - obviously "sync to client" wouldn't work if List was allowed and Get not for dot-files (if the app couldn't be set to ignore Get errors). It appears to me the right place to fix such challenges would be the (client) app, not the server. Another place could be ACL policy on the bucket. I expect Versity S3 GW behaves the same as AWS S3 in this case, so there's nothing that Versity S3 GW should do. In the world of S3 gateways, API consistency is a feature. Having an option that makes it possible to deviate may also be a feature, but there are downsides to that as well. Anyway, this is just an example of how many different problems could happen and I may be wrong in this particular case. But what I am reasonably sure is it's a deep rabbit hole, so I'd prefer an option to simply close it (a security feature to completely ignore all dot-files, because it may not be obvious to think of that when creating ACLs, and what many would like to enable dot-file blocking on generic non- DevOps/GitOps buckets) rather than going down the hole and creating a bunch of different options that may be easily mis-configured, especially if ACLs and many clients already offer similar functionality.

benmcclelland commented 2 months ago

Good points. I like the flexibility of glob or regex here for configurability, but sometimes this can get confusing. For example the object name: dir1/.dir/object would be mapped to that path. The filename itself isn't a dot file, but is contained within a dot directory. So you might need a regex like (^\.)|(/\.) to match any object that starts with a . or has any path component that starts with a .. It might be clearer if there was an option --skip-hidden that would not return any files beginning with . and not traverse any sub-directory beginning with .. We could still have --skip-regex for expert users as well.

I'm assuming --skip-regex would be matched against the full object name (dir1/.dir/object in the above example), and not individual components of the path.