tw4l / brunnhilde

Siegfried-based characterization tool for directories and disk images

Freeze on large collection (>3 million files) with many duplicates, special characters #42

Closed: jesswhyte closed this issue 2 years ago

jesswhyte commented 4 years ago

Ran as: ~/.local/bin/brunnhilde.py -n /utlarchive/staging/MC/Collection /utlarchive/staging/MC/ MC-BReports

File count in collection: 3,060,270

Started the job on Oct 04; siegfried.csv and siegfried.sqlite were created on Oct 06, and the process is still going. Memory usage is at 0, siegfried.sqlite is open for reading, and siegfried.csv is closed. No report CSVs have been created, and temp.html is at 0 B.

I'm sorry, I don't know what specifically is causing the issue (and I recognize this is an unwieldy collection). Please let me know if there's any info about the collection that would be helpful for you (e.g. copy of sf.csv, sf.sqlite, samples, number of dupes, etc.).
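[Editor's note: one plausible explanation for these symptoms, a guess not confirmed anywhere in this thread, is that the reporting stage queries the sqlite database built from the Siegfried CSV, and grouping ~3 million rows on an unindexed checksum column can look frozen for a very long time. A minimal sketch of that kind of query follows; the table name `siegfried` and the `hash` column are assumptions for illustration, not brunnhilde's confirmed schema.]

```python
import sqlite3

# Sketch only: the table name "siegfried" and the "hash" column are
# assumptions for illustration; brunnhilde's actual schema may differ.
conn = sqlite3.connect("siegfried.sqlite")
cur = conn.cursor()

# Grouping ~3M rows on an unindexed column forces a full scan plus a large
# sort/aggregate; an index lets SQLite walk checksums in order instead.
cur.execute("CREATE INDEX IF NOT EXISTS idx_hash ON siegfried (hash)")

# Checksums shared by more than one file, largest clusters first.
cur.execute(
    """
    SELECT hash, COUNT(*) AS copies
    FROM siegfried
    GROUP BY hash
    HAVING copies > 1
    ORDER BY copies DESC
    """
)
for checksum, copies in cur.fetchmany(20):
    print(checksum, copies)

conn.close()
```

With many duplicates in the collection, the result set of such a query can itself be very large, which would be consistent with a process that runs for days without writing any report CSVs.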

tw4l commented 4 years ago

Thanks for filing the issue, Jess! If you're able to share a copy of the Siegfried CSV and the sqlite file, that would be really helpful in diagnosing the issue.
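[Editor's note: for gathering the figures Jess offered above (row count, number of dupes) without sharing the full files, here is a hedged sketch; again, the `siegfried` table and `hash` column are assumptions based on Siegfried's CSV headers, not a confirmed schema.]

```python
import sqlite3

# Hypothetical helper: summarize siegfried.sqlite without exposing file paths.
# Table and column names are assumptions, not brunnhilde's confirmed schema.
conn = sqlite3.connect("siegfried.sqlite")
cur = conn.cursor()

total_rows = cur.execute("SELECT COUNT(*) FROM siegfried").fetchone()[0]

# Count files whose checksum appears more than once (i.e. duplicates).
duplicate_files = cur.execute(
    """
    SELECT COALESCE(SUM(copies), 0) FROM (
        SELECT COUNT(*) AS copies
        FROM siegfried
        GROUP BY hash
        HAVING copies > 1
    )
    """
).fetchone()[0]

print(f"rows: {total_rows}; files sharing a checksum: {duplicate_files}")
conn.close()
```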

tw4l commented 2 years ago

The change in #52 should help with collections of very large files. Closing this for now due to inactivity.