richardlehane / siegfried

signature-based file format identification
http://www.itforarchivists.com/siegfried
Apache License 2.0

CSV export error when encountering empty files #235

Closed by AFalut 1 year ago

AFalut commented 1 year ago

When running Siegfried from the command line, empty files are normally flagged with warnings in the report printed to the terminal. However, when attempting to export the same report as a CSV, identification stops at the empty file and the following error message appears after the file's path and name: "[ERROR] empty source". A CSV is exported, but it is incomplete (unlike the terminal report).

richardlehane commented 1 year ago

Hi @AFalut thanks for reporting this. I may need a bit more information to work out what is going wrong for you here.

The screenshot you've shared looks like things are working normally. When you redirect STDOUT to a results file (the CSV written to test.csv in your screenshot), you'll only see the STDERR stream in the terminal. By default siegfried logs various error messages to STDERR while it is running, including messages about empty files, but these won't stop it running unless it hits a fatal error, which will be reported with [FATAL]. From the screenshot, it looks like siegfried was still running at that point (when it stops you'd get a fresh command prompt). That may be why your test.csv file was incomplete.

If siegfried is crashing without reporting a [FATAL] error, it is likely that some other file (not the empty one) is causing the problem. If that's the case, the "-log progress" option can help narrow down where the problem is.

Could you try running sf -csv -log p . > test.csv? (-log p is short for -log progress. For a list of all the logging options see this page.)
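The STDOUT/STDERR split described in this reply can be illustrated with a minimal shell sketch. The fake_scan function below is a hypothetical stand-in, not part of siegfried, and its output rows are made up for illustration. It only assumes what the reply states: results go to STDOUT, diagnostic messages go to STDERR, and a non-fatal message does not stop the run.

```shell
#!/bin/sh
# fake_scan is a hypothetical stand-in for a scanner such as sf:
# result rows are written to STDOUT, diagnostics to STDERR.
fake_scan() {
    echo "filename,filesize"
    echo "a.txt,12"
    # Diagnostic only: written to STDERR, and the scan carries on.
    echo "empty.txt [ERROR] empty source" >&2
    echo "empty.txt,0"
    echo "b.txt,34"
}

# Redirecting only STDOUT: rows land in results.csv, while the
# [ERROR] message still shows up in the terminal.
fake_scan > results.csv

# Redirecting each stream to its own file keeps the terminal quiet
# and preserves the diagnostics for later review.
fake_scan > results.csv 2> errors.log
```

The same shell redirection should apply to the command suggested above, for example sf -csv -log p . > test.csv 2> sf.log, where sf.log is just an illustrative file name. The results file is only complete once the command has returned to the prompt.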

AFalut commented 1 year ago

Hi @richardlehane, we were misled by an unusually long processing time (probably related to the file location), but siegfried was in fact still running, and we eventually obtained a complete results file. Thank you for the very fast answer.