phiresky / ripgrep-all

rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

Feature Request: `--glob` and `--iglob` for filenames within an archive #57

Open nathanpjones opened 4 years ago

nathanpjones commented 4 years ago

Unless I missed something, there doesn't seem to be a way to filter by filename inside an archive. It would be helpful to be able to do so, to avoid decompressing data that will be ignored anyway.

Right now I'm filtering the output to achieve the same goal but it's inefficient. Here's my workaround.

rga "what to find" --iglob "*.zip" --no-heading --color always | rga -i "^[^:]*:[^:]*\.txt:" --color never -
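
To make the second stage of that pipeline concrete, here is a small Python sketch of the same regular expression applied to a few made-up lines. The sample lines are an assumption about the `archive:inner path: text` shape of the output, not captured rga output, and `keep_line` is a hypothetical helper name.

```python
import re

# Same pattern as the second rga stage; the -i flag becomes re.IGNORECASE.
INNER_TXT = re.compile(r"^[^:]*:[^:]*\.txt:", re.IGNORECASE)


def keep_line(line: str) -> bool:
    """Return True when the second colon-separated field ends in .txt."""
    return INNER_TXT.search(line) is not None


# Hypothetical lines shaped like "<archive>:<inner path>: <matched text>".
sample = [
    "backup.zip:notes/todo.txt: what to find",
    "backup.zip:data/table.csv: what to find",
    "backup.zip:README.TXT: what to find",
]

# Only lines whose inner path ends in .txt (any case) survive the filter.
kept = [line for line in sample if keep_line(line)]
```

The filter only runs after every archive member has already been extracted and searched, which is why it costs more than a glob applied before extraction would.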

phiresky commented 4 years ago

This definitely makes sense, though the semantics aren't trivial to define. Would using the same flag both inside and outside of archives be enough, basically --glob "*.{zip,txt}"? Or would it require specifying different globs within and outside of archives? And then what about archives within archives?

Since rg globs use gitignore syntax, I guess the path to match against inside archives could be defined as e.g. dir/rootarchive.zip:inarchivepath/archive.zip:hello/foo.txt, so that globs apply to the whole path. But then it's still hard to say "search txt files inside the archive but not outside of it".
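
A rough Python sketch of that composite-path idea, to show where the ambiguity comes from. It is an illustration only: `composite_path` and `matches` are hypothetical helpers, and `fnmatchcase` is a stand-in for rg's gitignore-style globs (its `*` also crosses `/` and `:`), so real matching rules would differ.

```python
from fnmatch import fnmatchcase

SEP = ":"  # separator between nesting levels, as in the example path above


def composite_path(outer: str, *inner: str) -> str:
    """Join the on-disk path and each in-archive path into one string."""
    return SEP.join((outer, *inner))


def matches(glob: str, path: str) -> bool:
    """Stand-in matcher; NOT gitignore semantics, just stdlib fnmatch."""
    return fnmatchcase(path, glob)


nested = composite_path(
    "dir/rootarchive.zip", "inarchivepath/archive.zip", "hello/foo.txt"
)
plain = "dir/notes.txt"  # an ordinary file outside any archive

# A plain "*.txt" glob accepts both, so it cannot express
# "txt files inside archives only".
both_match = matches("*.txt", nested) and matches("*.txt", plain)
```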

nathanpjones commented 4 years ago

That's a good point. Maybe you could relate it to levels, like you do for targeting nested archives. The syntax could be something like --iglob 1="*.txt" or -g 2="*.xml". Whatever globs are defined would then apply at that nesting level or deeper.
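
A minimal Python sketch of how such level-prefixed globs might be parsed and applied. Everything here is an assumption for illustration: the names `LevelGlob`, `parse_level_glob` and `allowed` are hypothetical, depth 0 is taken to mean a file on disk and depth 1 an entry in a top-level archive, multiple active globs are ORed together, and stdlib `fnmatchcase` stands in for rg's glob matching.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple


class LevelGlob(NamedTuple):
    level: int         # minimum nesting depth at which the glob applies
    pattern: str       # glob pattern, e.g. "*.txt"
    ignore_case: bool  # True for an --iglob style rule


def parse_level_glob(value: str, ignore_case: bool = False) -> LevelGlob:
    """Parse a value such as '1=*.txt' (quotes already removed by the shell)."""
    level_text, sep, pattern = value.partition("=")
    if not sep or not level_text.isdecimal() or not pattern:
        raise ValueError(f"expected LEVEL=GLOB, got {value!r}")
    return LevelGlob(int(level_text), pattern, ignore_case)


def _match(name: str, rule: LevelGlob) -> bool:
    if rule.ignore_case:
        return fnmatchcase(name.lower(), rule.pattern.lower())
    return fnmatchcase(name, rule.pattern)


def allowed(name: str, depth: int, rules: list[LevelGlob]) -> bool:
    """Apply only the rules whose level is at or above the current depth."""
    active = [rule for rule in rules if depth >= rule.level]
    if not active:
        return True  # no rule targets this depth, so nothing is filtered
    return any(_match(name, rule) for rule in active)


rules = [parse_level_glob("1=*.txt", ignore_case=True)]
```

With these rules, `allowed("hello/FOO.TXT", 1, rules)` accepts the entry, while `allowed("report.xml", 0, rules)` also passes because no rule targets files outside archives. An archive entry that does not match would be skipped before extraction, which is the saving the original request is after.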