simsong / bulk_extractor

This is the development tree. Production downloads are at:
https://github.com/simsong/bulk_extractor/releases

Bulk-extractor generates SegFault midway through processing complex directory hierarchy with -R #396

Open zdavatz opened 1 year ago

zdavatz commented 1 year ago

Test case:

./bulk_extractor -o out -R <directory>

output:

zsh: segmentation fault  bulk_extractor

simsong commented 1 year ago

Hi, and thank you for your bug report.

First, you should know that bulk_extractor has a restart system. You can simply re-run the command (hit up-arrow and return) and the program will continue where it left off, avoiding the data that made it crash.

Second, we are thrilled to get your data. If you don't want to post a link here, please email me at simsong@acm.org.

Third, if you want to try to debug this yourself, we have instructions for how to do that:

https://github.com/simsong/bulk_extractor/tree/main/doc/Diagnostic_Notes

Thanks again!

zdavatz commented 1 year ago

Here are all the report files that were generated after rerunning the command: report.xml.tar.gz

simsong commented 1 year ago

I've obtained the .tar file of the directory and could process it without error, so the problem appears to be the directory iterator. If I can replicate this, I'll create a test case for the iterator that makes a large number of files in the directory tree and tries to process them.
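The iterator test described above needs a fixture: a directory tree with a large, known number of files. The sketch below builds one. It is a minimal sketch under stated assumptions. The helper names `make_tree` and `count_files`, the `file_N.bin` / `dir_N` naming, and the default depth and fanout are all hypothetical and are not part of bulk_extractor. The reporter's actual hierarchy is unknown, so the sizes will probably need scaling up to trigger the crash. The script also walks the tree independently with `os.walk`, so you know the expected file count before running bulk_extractor against it.

```python
import os
import tempfile


def make_tree(root, depth=4, fanout=3, files_per_dir=5):
    """Create a synthetic directory hierarchy under `root` (hypothetical helper).

    Each directory receives `files_per_dir` small files and, until `depth`
    reaches 0, `fanout` subdirectories. Returns the number of files created.
    """
    count = 0
    for i in range(files_per_dir):
        with open(os.path.join(root, f"file_{i}.bin"), "wb") as f:
            f.write(os.urandom(64))  # random payload; contents are irrelevant here
        count += 1
    if depth > 0:
        for j in range(fanout):
            sub = os.path.join(root, f"dir_{j}")
            os.mkdir(sub)
            count += make_tree(sub, depth - 1, fanout, files_per_dir)
    return count


def count_files(root):
    """Independently walk the tree and count the regular files found."""
    return sum(len(files) for _, _, files in os.walk(root))


if __name__ == "__main__":
    root = tempfile.mkdtemp(prefix="be_iter_test_")
    made = make_tree(root)
    found = count_files(root)
    # With the defaults: 1 + 3 + 9 + 27 + 81 = 121 directories x 5 files = 605 files.
    print(f"{root}: created {made} files, walk found {found}")
```

The printed directory can then be passed to the same command from the report, `./bulk_extractor -o out -R <directory>`. Raising `depth`, `fanout`, or `files_per_dir` grows the tree geometrically, which is a cheap way to probe whether the crash depends on the total number of files or on nesting depth.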

simsong commented 5 months ago

Okay. I'm turning my attention back to this, but I find that I cannot reproduce it. Do you have a dataset I can use?