simsong / bulk_extractor

This is the development tree. Production downloads are at:
https://github.com/simsong/bulk_extractor/releases

Feature request: add regex filter for feature files. #473

Open Bb4fit opened 4 months ago

Bb4fit commented 4 months ago

The project's output could be improved slightly. It would be better for the program to save only files that contain successfully extracted content, rather than writing empty text files when nothing was extracted. Among the positives of this:

simsong commented 4 months ago

Thanks for your comments.

Your suggestion of adding a regex filter on each feature file to further prune the output is a curious one. This program has been in use for 14 years and no one has ever suggested this before. It is straightforward to run grep on a feature file; it is not straightforward to re-run bulk_extractor if there is a typo in the filter.
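To illustrate the post-processing approach, here is a minimal Python sketch of filtering a feature file after the run instead of during it. It assumes the usual feature-file layout of tab-separated `offset`, `feature`, `context` columns with comment lines starting with `#`; the function name `filter_features` is hypothetical and not part of bulk_extractor.

```python
import re
from typing import Iterable, Iterator


def filter_features(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield comment lines plus feature lines whose feature column matches."""
    rx = re.compile(pattern)
    for line in lines:
        # Assumption: header/comment lines start with '#'; pass them through.
        if line.startswith("#"):
            yield line
            continue
        # Assumption: columns are offset <TAB> feature <TAB> context.
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2 and rx.search(parts[1]):
            yield line


if __name__ == "__main__":
    sample = [
        "# example header line\n",
        "100\talice@example.com\tsome context\n",
        "200\tbob@test.org\tother context\n",
    ]
    for kept in filter_features(sample, r"@example\.com$"):
        print(kept, end="")
```

Because the filter runs on the finished feature file, a typo in the pattern only costs a re-run of the filter, not of the whole scan.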

Do you have an actual use case for which the output size is problematic and a filter is required, or is this a request based on what a hypothetical user would like? If you are indeed in need of this feature, you are welcome to submit it as a pull request. I'm happy to design it with you. Adding more command-line switches is problematic at this point, so you might also want to add the ability to have a YAML or JSON configuration file.
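As a starting point for that design discussion, here is a rough Python sketch of what a JSON configuration with per-feature-file filters might look like. Everything in it is an assumption for illustration: the `feature_filters` key, the file names, and the `load_filters` helper do not exist in bulk_extractor. The one design point it shows is compiling every pattern up front, so a typo is reported before a long scan starts.

```python
import json
import re
from typing import Dict, Pattern


def load_filters(config_text: str) -> Dict[str, Pattern[str]]:
    """Parse a hypothetical JSON config and compile each regex eagerly."""
    config = json.loads(config_text)
    # Hypothetical schema: {"feature_filters": {"<feature file>": "<regex>"}}
    filters = config.get("feature_filters", {})
    compiled: Dict[str, Pattern[str]] = {}
    for name, pattern in filters.items():
        try:
            compiled[name] = re.compile(pattern)
        except re.error as exc:
            # Fail fast so a bad pattern is caught before any scanning begins.
            raise ValueError(f"bad regex for {name!r}: {exc}") from exc
    return compiled


if __name__ == "__main__":
    text = r'{"feature_filters": {"email.txt": "@example\\.com$"}}'
    filters = load_filters(text)
    print(sorted(filters))
```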

If you aren't able to implement this yourself but are willing to pay for this feature to be created, I can hook you up with a consultant.