phiresky / ripgrep-all

rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

Searching binary files #70

Open crappycrypto opened 4 years ago

crappycrypto commented 4 years ago

With ripgrep it is easy to search binary files (e.g. with rg --binary). However, this breaks when using ripgrep-all, as can be demonstrated by viewing the output of rga-preproc:

dd if=/dev/zero of=test_file bs=1K count=16
echo "Search string" >> test_file
gzip test_file
tar cf test_file_gz.tar test_file.gz

# Neither file produces any greppable output
rga-preproc test_file.gz
rga-preproc test_file_gz.tar

# rga finds the match in the gzip file, but not in the tar file
rga -z --binary 'Search string' test_file.gz
rga -z --binary 'Search string' test_file_gz.tar

Is there any option to search binary files using ripgrep-all? (Even the output of strings would be useful for binary data)

phiresky commented 4 years ago

The issue is that ripgrep can only make its binary/non-binary decision based on the input stream from a single file, but ripgrep-all can produce multiple file streams from a single file (e.g. for zip archives), so it has to do its own binary detection. But I guess this could be solved by also handling the -a / --text and --binary flags in ripgrep-all to turn off its binary detection.
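
To illustrate the kind of per-stream decision described above, here is a minimal sketch of a NUL-byte check, which is the heuristic ripgrep itself uses to classify data as binary. The helper name is_binary and the file names are made up for this example, and the thread does not show how ripgrep-all's own detection is actually implemented, so treat this as an approximation rather than the real code.

```shell
#!/bin/sh
# Hypothetical helper (not part of rga): succeed if the file contains a NUL byte.
# It compares the byte count before and after deleting all NUL bytes.
is_binary() {
    total=$(wc -c < "$1" | tr -d ' ')
    stripped=$(tr -d '\000' < "$1" | wc -c | tr -d ' ')
    [ "$total" -ne "$stripped" ]
}

workdir=$(mktemp -d)

# A plain text file and a zero-padded file like the one in the report above.
printf 'Search string\n' > "$workdir/plain.txt"
{ dd if=/dev/zero bs=1024 count=16 2>/dev/null; echo "Search string"; } > "$workdir/padded.bin"

for f in "$workdir/plain.txt" "$workdir/padded.bin"; do
    if is_binary "$f"; then
        echo "$(basename "$f"): binary"
    else
        echo "$(basename "$f"): text"
    fi
done
```

A tool that splits one archive into several inner streams would have to run a check like this once per stream, which is why a single decision made on the outer file is not enough.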

crappycrypto commented 4 years ago

For my use case (grepping strings) it would be okay to add a new adapter for binary files that returns the output of strings. That way you can keep the path prefix of the file inside the archive. This does, however, limit ripgrep's general features, since strings will determine which encodings are searched and whether non-string output can be grepped.
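
Until something like that exists, the idea can be approximated outside of rga with standard tools. The following is only a sketch of the approach, not an rga adapter: search_tar_strings is a hypothetical helper, it assumes tar, gzip, strings and grep are installed, and it only handles one level of gzip inside a tar archive.

```shell
#!/bin/sh
# Recreate the test archive from the report in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
dd if=/dev/zero of=test_file bs=1024 count=16 2>/dev/null
echo "Search string" >> test_file
gzip test_file
tar cf test_file_gz.tar test_file.gz

# Hypothetical helper: search_tar_strings ARCHIVE PATTERN
# Runs `strings` over every member of the tar archive (decompressing *.gz
# members first) and prefixes each match with the path inside the archive.
search_tar_strings() {
    archive=$1
    pattern=$2
    tar -tf "$archive" | while IFS= read -r member; do
        case "$member" in
            */) continue ;;   # skip directory entries
        esac
        tar -xOf "$archive" "$member" \
            | { case "$member" in *.gz) gzip -dc ;; *) cat ;; esac; } \
            | strings \
            | grep -F -- "$pattern" \
            | awk -v prefix="$archive/$member: " '{ print prefix $0 }'
    done
}

search_tar_strings test_file_gz.tar 'Search string'
```

As noted above, anything that strings does not consider a printable sequence is lost before grep sees it, so this trades ripgrep's encoding and regex features for simplicity.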