tw4l / brunnhilde

Siegfried-based characterization tool for directories and disk images

_csv.Error: field larger than field limit (131072) #35

Closed · MSavels closed this issue 6 years ago

MSavels commented 6 years ago

Hi,

[Screenshot attached: brunnhilde_error]

The attached screenshot shows an error I'm getting with one particular directory. I suspect the error is caused by one or more files with very long paths. When the CSV report is written, some field apparently exceeds the csv module's field_size_limit of 131072, which causes the csv library to raise "_csv.Error: field larger than field limit (131072)". One workaround would be to raise this limit in special cases (see https://stackoverflow.com/questions/15063936/csv-error-field-larger-than-field-limit-131072). It would be convenient if a larger field size limit could be passed as an argument to Brunnhilde, or maybe there's another workaround.
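For reference, a minimal sketch of the workaround from that Stack Overflow thread: raise the csv module's per-field limit before the report is read. The back-off loop is there because sys.maxsize can overflow the underlying C long on some platforms:

```python
import csv
import sys

# Raise the csv module's per-field size limit (default 131072) before
# reading the report. sys.maxsize can overflow csv.field_size_limit's
# C long on some platforms, so back off until a value is accepted.
limit = sys.maxsize
while True:
    try:
        csv.field_size_limit(limit)
        break
    except OverflowError:
        limit = int(limit / 10)
```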

Kind regards,

Maarten

tw4l commented 6 years ago

Hi Maarten. Thank you for the bug report and the helpful link! I'll investigate in the next few days and get back to you. I have one other outstanding bug related to Unicode handling of file names, so it seems like a good time for a bugfix update.

tw4l commented 6 years ago

Hi @MSavels - my apologies for the delay; I'm now looking into this issue. The error appears to occur when the Siegfried-written CSV file is read into the sqlite3 database. I think your field_size_limit suggestion should work, but I'm wondering what the cause of the long field is. Would you be willing to share the offending CSV file so I can investigate?
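For context, an illustrative sketch of that CSV-to-sqlite3 import step (the table name and columns here are hypothetical, not Brunnhilde's actual schema, and the input is assumed to have at least three columns per row); the csv.reader iteration is where _csv.Error surfaces when a field exceeds the limit:

```python
import csv
import sqlite3

# Illustrative only: load a Siegfried CSV report into a sqlite3 table.
# Table and column names are hypothetical, not Brunnhilde's real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE siegfried (filename TEXT, id TEXT, basis TEXT)")

with open("siegfried.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:  # oversized fields raise _csv.Error here
        conn.execute("INSERT INTO siegfried VALUES (?, ?, ?)", row[:3])

conn.commit()
```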

richardlehane commented 6 years ago

It'd have to be a really long path, surely :) My bet is that it's the basis field. Sometimes sf goes a bit berserk and records a huge amount of data in this field; it's been reported as an issue here: https://github.com/richardlehane/siegfried/issues/111. I've developed a fix for this, which will be in the 1.7.9 release that should land around the end of this month. I suspect that fix will resolve this issue too.
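To confirm which field is the culprit, a small diagnostic along these lines (assuming a Siegfried CSV report named siegfried.csv) could report the longest value seen per column:

```python
import csv

# Diagnostic sketch: report the longest value seen in each column of a
# Siegfried CSV report, to confirm which field (e.g. "basis") is oversized.
csv.field_size_limit(2**31 - 1)  # lift the limit so the scan itself succeeds

with open("siegfried.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    header = next(reader)
    longest = [0] * len(header)
    for row in reader:
        for i, value in enumerate(row[: len(header)]):
            longest[i] = max(longest[i], len(value))

for name, length in zip(header, longest):
    print(f"{name}: {length} characters")
```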

tw4l commented 6 years ago

Thanks, @richardlehane! I'll keep this issue open for now. @MSavels - would you mind testing this again with Siegfried 1.7.9 when it's released and letting me know if you're still experiencing the issue?

MSavels commented 6 years ago

Hi, sorry for not answering earlier; I was away on holidays. I can't share the offending CSV file, as it is never written: after the error, Brunnhilde just exits. What I can do, however, is run the test again with Siegfried alone and see if the same error pops up. Naturally, I'll also test again with Siegfried 1.7.9.

Regards,

Maarten
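For reference, a standalone Siegfried run of the sort described here might look like this (the path is hypothetical; -csv selects Siegfried's CSV output format):

```sh
# Run Siegfried directly over the directory and capture the CSV report.
sf -csv /path/to/offending/directory > siegfried.csv
```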

MSavels commented 6 years ago

Hi, I redid the analysis of the offending dataset with Siegfried 1.7.8 alone. This time the CSV file was written and opens without problems. I can send you the CSV file; it's about 80 MB.

Regards, Maarten

tw4l commented 6 years ago

Hi Maarten. Since the Siegfried update solved this problem, I'm marking this issue as closed. Thanks for reporting!