Closed — MSavels closed this issue 6 years ago
Hi Maarten. Thank you for the bug report and the helpful link! I will investigate in the next few days and get back to you. I have one other outstanding bug related to Unicode handling of file names, so this seems like a good time for a bugfix update.
Hi @MSavels - my apologies for the delay, I'm now looking into this issue. The error appears to occur when the Siegfried-written CSV file is read into the sqlite3 database. I think your field_size_limit workaround should work, but I'm wondering what's causing the oversized field in the first place. Would you be willing to share the offending CSV file so I can investigate?
It'd have to be a really long path surely :) My bet is that it will be the basis field. Sometimes sf goes a bit berserk and records a huge amount of data in this field. It's been reported as an issue here: https://github.com/richardlehane/siegfried/issues/111 I've developed a fix for this, which will be in the 1.7.9 release that should land about the end of this month. I suspect that fix will resolve this issue too.
Thanks, @richardlehane! I'll keep this issue open for now. @MSavels - would you mind testing this again with Siegfried 1.7.9 when it's released and letting me know if you're still experiencing the issue?
Hi, sorry for not answering earlier. I was away on holiday. I can't share the offending CSV file, as it is never written. After the error, Brunnhilde just exits. What I can do, however, is rerun the analysis with Siegfried alone and see if the same error pops up. Naturally, I'll also test again with Siegfried 1.7.9. Regards, Maarten
Hi, I redid the analysis of the offending dataset with Siegfried 1.7.8 alone. This time the CSV file is written and opens without problems. I can send you the CSV file; it's about 80 MB.
Regards, Maarten
Hi Maarten. Since the Siegfried update solved this problem, I'm marking this issue as closed. Thanks for reporting!
Hi,
The attached screenshot shows an error I'm getting with one particular directory. I suspect the error is caused by one or more files with very long paths. When the CSV report is written, some field is apparently longer than the default csv.field_size_limit of 131072 characters, which causes the csv library to raise "_csv.Error: field larger than field limit (131072)". One workaround would be to raise this limit in special cases (see: https://stackoverflow.com/questions/15063936/csv-error-field-larger-than-field-limit-131072). It would be convenient if a larger field size limit could be passed as an argument to Brunnhilde, or maybe there's another workaround.
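For reference, the workaround from the linked Stack Overflow answer can be sketched as follows. This is a minimal standalone sketch, not Brunnhilde's actual code: it raises the csv module's field size limit before reading, backing off from sys.maxsize because on some platforms that value overflows the underlying C long.

```python
import csv
import sys

# Raise the csv module's field size limit as high as the platform allows.
# csv.field_size_limit() both sets the new limit and returns the old one;
# on some builds sys.maxsize overflows a C long, hence the back-off loop.
max_int = sys.maxsize
while True:
    try:
        csv.field_size_limit(max_int)
        break
    except OverflowError:
        max_int = int(max_int / 10)

# Fields longer than the old default of 131072 characters now parse
# without raising _csv.Error.
print(csv.field_size_limit() > 131072)
```

With the limit raised, the same csv.reader call that previously raised "_csv.Error: field larger than field limit (131072)" would read the oversized field normally.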
Kind regards,
Maarten