peak / s5cmd

Parallel S3 and local filesystem execution tool.
MIT License

Feature request for --include and --exclude filters #266

Closed: echan5 closed this issue 3 years ago

echan5 commented 3 years ago

Hi, this is a feature request. awscli has --include and --exclude filters and it would be great if s5cmd also had those options!

seruman commented 3 years ago

Hi @echan5. It would be great if you and the other requesters could share what kind of use cases you have in mind. That would help us understand the request better and decide how to fit it in with the current feature set (e.g., wildcard support).

echan5 commented 3 years ago

@seruman - essentially, feature parity with awscli would be great! (See https://docs.aws.amazon.com/cli/latest/reference/s3/#use-of-exclude-and-include-filters) Examples include using --include "*.txt" to select certain file types, or --exclude "abc*" to skip any file that starts with the prefix "abc". Thank you for looking into this request!
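For readers unfamiliar with the awscli behavior referenced above: as the linked documentation describes it, everything is included by default and filters that appear later in the command take precedence over earlier ones. The following is a minimal Python sketch of that rule, not awscli's or s5cmd's actual implementation. The function name `is_selected`, the `(kind, pattern)` representation, and the sample keys are all assumptions made for illustration, and Python's `fnmatch` is used as a stand-in for the real glob matcher.

```python
from fnmatch import fnmatchcase


def is_selected(key, filters):
    """Return True if `key` survives an ordered list of filters.

    `filters` is a list of (kind, pattern) pairs, where kind is
    "include" or "exclude" (a hypothetical representation).
    Everything is included by default; the last matching filter wins.
    """
    selected = True
    for kind, pattern in filters:
        if fnmatchcase(key, pattern):
            selected = (kind == "include")
    return selected


# Hypothetical object names, for illustration only.
keys = ["notes.txt", "abc_report.txt", "abc.bin", "image.png"]

# In the spirit of: --exclude "*" --include "*.txt"
only_txt = [k for k in keys
            if is_selected(k, [("exclude", "*"), ("include", "*.txt")])]

# In the spirit of: --exclude "abc*"
no_abc = [k for k in keys if is_selected(k, [("exclude", "abc*")])]
```

Because the default is "include everything", an `--include` on its own changes nothing; it only becomes useful after an earlier `--exclude` has removed something, which is why the first example starts by excluding `*`.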

nelhage commented 3 years ago

I'd also have a use case for this; I tried to work around using cp commands in s5cmd run mode, but have been blocked by #301 from actually deploying it in our environment.

In our particular case, we're storing ML models, and we store parameters, and some optimizer states (ADAM moments) alongside the parameters. When we're restoring a model to do inference, we would like to be able to s5cmd cp the relevant parameters to local disk, but ignore the optimizer moments, which are stored with consistent filenames. In our case, that looks something like --exclude=*_moment.dat.
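The workaround described above (listing the wanted objects and feeding explicit cp commands to s5cmd's run mode) might look roughly like the Python sketch below. It is only an illustration under stated assumptions: the bucket name, destination directory, object keys, and the helper `build_run_commands` are all hypothetical, and the generated lines assume the "one command per line, without the s5cmd prefix" form that run mode reads.

```python
from fnmatch import fnmatchcase


def build_run_commands(keys, bucket, dest_dir, exclude_pattern):
    """Build one `cp` line per object whose file name does not match
    `exclude_pattern` (a glob). Hypothetical helper, not part of s5cmd."""
    lines = []
    for key in keys:
        # Match against the file name only, not the full key.
        name = key.rsplit("/", 1)[-1]
        if fnmatchcase(name, exclude_pattern):
            continue  # skip optimizer moments
        lines.append(f"cp s3://{bucket}/{key} {dest_dir}/{key}")
    return lines


# Hypothetical keys; in practice these would come from an object listing.
keys = [
    "model/layer0.dat",
    "model/layer0_moment.dat",
    "model/config.json",
]

commands = build_run_commands(keys, "example-bucket", "restore", "*_moment.dat")
command_file = "\n".join(commands) + "\n"
```

The resulting text could be written to a file and passed to run mode. A built-in --exclude flag would make this extra listing and generation step unnecessary, which is the point of the request.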

seruman commented 3 years ago

Thank you both, @echan5 and @nelhage. What are your thoughts on adding include/exclude options that support regex patterns?

nelhage commented 3 years ago

I'd be fine with regexes for my use case. I would find it a bit surprising -- include/exclude in every other command I can think of offhand uses some form of glob syntax -- but I could live with it and it would potentially be a nice bit of flexibility.
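To make the glob-versus-regex trade-off concrete, here is a small Python sketch showing that the glob from the use case above can be expressed as a regular expression, at the cost of extra escaping. It uses Python's standard `fnmatch.translate` purely for illustration; nothing here reflects how s5cmd parses patterns, and the file names are hypothetical.

```python
import fnmatch
import re

glob_pattern = "*_moment.dat"

# Regex generated from the glob by the standard library.
translated = re.compile(fnmatch.translate(glob_pattern))


def matches_glob(name):
    """Glob match, as a user of --exclude would expect."""
    return fnmatch.fnmatchcase(name, glob_pattern)


def matches_translated(name):
    """Same test using the regex produced by fnmatch.translate."""
    return translated.match(name) is not None


def matches_handwritten(name):
    """Hand-written regex equivalent: the dot must be escaped."""
    return re.fullmatch(r".*_moment\.dat", name) is not None


# Hypothetical file names.
names = ["layer0_moment.dat", "layer0.dat", "adam_moment.dat", "moment.dat"]

all_agree = all(
    matches_glob(n) == matches_translated(n) == matches_handwritten(n)
    for n in names
)
excluded = [n for n in names if matches_glob(n)]
```

The regex form is strictly more expressive, but the glob form is shorter and matches what users already know from the wildcard syntax in source paths.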

echan5 commented 3 years ago

It would be the same for me - I don't think I'd need regex patterns for my current use case, but it could be nice to have that flexibility

seruman commented 3 years ago

"I would find it a bit surprising -- include/exclude in every other command I can think of offhand uses some form of glob syntax --"

Given that s5cmd already supports glob syntax, regex would have been odd, as you mentioned. The --exclude option has now been implemented, thanks to @ocakhasan. Could you give it a try from the master branch?