richardlehane / siegfried

signature-based file format identification
http://www.itforarchivists.com/siegfried
Apache License 2.0

Filename/filepath output with extra single quotation mark? #236

PhillipAasvangTommerholt closed this issue 9 months ago

PhillipAasvangTommerholt commented 9 months ago

First of all - thank you for a great identification tool!

I found a little detail:

When I run Siegfried on a file called sdf'f.txt, the filename is printed as sdf''f.txt.


richardlehane commented 9 months ago

Hi Phillip - filenames are hard!

The behaviour here is deliberate. The default siegfried output is YAML. I use the single-quoted style for strings (https://yaml.org/spec/1.2.2/#732-single-quoted-style), which is super permissive: nearly any character is allowed without escaping, even new lines. The only exception is the single-quote character itself, which has to be escaped by doubling it, so ' is written as ''.
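To illustrate the quoting rule above, here is a minimal Python sketch. It assumes a single-line value: it does not implement YAML's line folding, so it is not a complete YAML emitter, and the function name `yaml_single_quote` is hypothetical, not part of siegfried.

```python
def yaml_single_quote(s: str) -> str:
    """Hypothetical helper: wrap s in YAML single quotes.

    Only applies the quote-doubling rule of the single-quoted style
    (YAML 1.2.2, section 7.3.2); line breaks are not folded.
    """
    # Every embedded ' becomes '', then the whole value is wrapped in '...'.
    return "'" + s.replace("'", "''") + "'"


print(yaml_single_quote("sdf'f.txt"))  # prints 'sdf''f.txt'
```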

If you're using the YAML output as part of a workflow I'd recommend running it through a YAML decoding step before using the output. This should give you back the correct filename.
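A decoding step can be as small as reversing the quote-doubling. The Python helper below is a hypothetical sketch that handles one single-line scalar only. For a real workflow, a full YAML parser is the more robust choice, for example PyYAML's `yaml.safe_load_all` (the "all" variant because the output may contain several YAML documents).

```python
def yaml_single_unquote(scalar: str) -> str:
    """Hypothetical helper: decode a single-line YAML single-quoted scalar.

    Example: "'sdf''f.txt'" -> "sdf'f.txt". Line folding is not handled,
    and unpaired quotes inside the value are not rejected.
    """
    if len(scalar) < 2 or scalar[0] != "'" or scalar[-1] != "'":
        raise ValueError(f"not a single-quoted scalar: {scalar!r}")
    # Drop the surrounding quotes, then collapse each '' back to a single '.
    return scalar[1:-1].replace("''", "'")


print(yaml_single_unquote("'sdf''f.txt'"))  # prints sdf'f.txt
```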

You could also try the CSV or JSON output. These do their own escaping of certain characters too (more escaping is necessary in these output modes), but single quotes might be fine?
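For a sense of why single quotes are usually harmless in those formats, the sketch below uses Python's standard `json` and `csv` modules. It only shows the general format rules; siegfried's own CSV and JSON writers are separate code and may escape things differently.

```python
import csv
import io
import json

name = "sdf'f.txt"

# JSON strings are delimited by double quotes, so a single quote needs no escaping.
print(json.dumps({"filename": name}))  # prints {"filename": "sdf'f.txt"}

# With the default QUOTE_MINIMAL and quotechar '"', the csv module only quotes
# fields containing the delimiter, the quotechar or a line break, so this
# field is written unchanged.
buf = io.StringIO()
csv.writer(buf).writerow([name, "x"])
print(buf.getvalue(), end="")  # prints sdf'f.txt,x
```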

PhillipAasvangTommerholt commented 9 months ago

Thank you for your quick reply, and for your recommendation. I will look into a YAML decoding step, and YAML in general, since I am new to YAML :-)