Closed by ledmirage 2 years ago
Hi @ledmirage: you're getting UNKNOWN here because there's an extension match from the PRONOM registry. In this case sf assumes the file could be a malformed instance of one of those types, so it exits with UNKNOWN and a warning. As you've found, you'll only get an x-fmt/111 (or text) match if the file has a ".txt" extension or if its extension matches nothing in the PRONOM database. The second case covers many other types of text files you see in the wild, such as "README" files.
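The rule described above can be sketched as a toy model. This is only an illustration of the behaviour as explained in this reply, not siegfried's actual implementation: the `identify` function, the `EXTENSION_CLAIMS` table and the `example-fmt/...` IDs are all hypothetical placeholders, and real matching involves byte signatures and much more.

```python
from pathlib import PurePath

TEXT_ID = "x-fmt/111"

# Hypothetical miniature registry: extension -> format IDs that claim it.
# The "example-fmt/..." IDs are placeholders, not real PRONOM identifiers.
EXTENSION_CLAIMS = {
    ".txt": [TEXT_ID],
    ".dat": ["example-fmt/1", "example-fmt/2", "example-fmt/3"],
}


def identify(filename, looks_like_text, byte_match=None):
    """Return a format ID or "UNKNOWN" under the simplified rules above."""
    # A positive byte-signature match always wins (assumption of this sketch).
    if byte_match is not None:
        return byte_match

    ext = PurePath(filename).suffix.lower()
    claims = EXTENSION_CLAIMS.get(ext, [])

    # The extension is claimed by other formats but none of them matched:
    # the file might be a malformed instance of one, so report UNKNOWN.
    if claims and TEXT_ID not in claims:
        return "UNKNOWN"

    # Either the extension is ".txt" or nothing in the registry claims it,
    # so a text match is allowed to stand.
    return TEXT_ID if looks_like_text else "UNKNOWN"
```

Under these assumptions, `identify("sample.dat", True)` gives `"UNKNOWN"`, while `sample.txt`, `sample.aaa` and an extensionless `README` all come back as `x-fmt/111`, which mirrors the behaviour reported in this thread.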
You can use the roy tool to modify your signature file to give more control over how results are reported. Instructions for this are here: https://github.com/richardlehane/siegfried/wiki/Building-a-signature-file-with-ROY
There are a few approaches you could take, depending on the result you want...
1) Do you want "x-fmt/111" reported for the .dat files you are matching? Or are these some more specific file type that isn't in PRONOM at all? If that's the case, you could try to get the file type registered with PRONOM and, in the meantime, add a custom format to your signature file. Command for this is:
roy build -extend custom-fmt1.xml
(Add custom signatures in DROID format, e.g. using this utility. Custom signature files should be placed in a "custom" directory within your siegfried home directory.)
2) If you never expect any of the three .dat signatures in PRONOM to match any of the files in your repository, you could just exclude those signatures to get the result you want. Command for this is:
roy build -exclude @.dat
3) Or you could use the "-multi" flag to try a more exhaustive mode of matching. This flag alters the rules that sf applies when deciding which formats to report. I'm not sure if this will help, but you could try:
roy build -multi exhaustive
Recently I noticed that if a text file has the extension .txt, sf detects it as text. If I change it to another extension, like .aaa, that seems fine too (also detected as text).
However, if the text file has the extension .dat, it is detected as unknown.
What is the logic behind this? Is there a way to make sure a text file with a .dat extension is still detected as text?
sample output for .dat:
same file but with a different extension (.aaa) is okay:
Samples attached; they are just simple text files containing the string "hello world".
sample.zip
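For anyone without the attachment, the samples can be regenerated with a short script. This is a minimal sketch: the helper names `make_samples` and `run_sf` are made up for illustration, and it assumes the `sf` binary is on your PATH and accepts a file path as its only argument. If `sf` is not installed, the script just creates the files and prints nothing.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def make_samples(directory):
    """Write the same "hello world" text under three different extensions."""
    paths = []
    for ext in (".txt", ".dat", ".aaa"):
        path = Path(directory) / f"sample{ext}"
        path.write_text("hello world", encoding="ascii")
        paths.append(path)
    return paths


def run_sf(paths):
    """Run sf on each file if it is installed; return {filename: output}."""
    sf = shutil.which("sf")
    if sf is None:
        return {}  # sf not found on PATH, nothing to report
    results = {}
    for path in paths:
        proc = subprocess.run([sf, str(path)], capture_output=True, text=True)
        # Keep stderr too, since the warning may be written there.
        results[path.name] = proc.stdout + proc.stderr
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name, output in run_sf(make_samples(tmp)).items():
            print(f"--- {name} ---")
            print(output)
```

Comparing the three outputs side by side should show whether only the .dat copy is reported as unknown.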