richardlehane / siegfried

signature-based file format identification
http://www.itforarchivists.com/siegfried
Apache License 2.0

Inconsistent output with roy inspect priorities #192

Closed Dclipsham closed 2 years ago

Dclipsham commented 2 years ago

Odd one this.

I'm using roy inspect to pull some data out of historic DROID signature files. For every XML signature file, I use a couple of batch one-liners: first to build a signature (.sig file) with roy build, then, for every signature, to run inspect priorities and pipe the output to a file. (Ignore that I'm outputting as CSV: the format is still dot, but the next step in my workflow will trim it down to CSV.)

The actual command for each run ends up looking like: roy inspect -home d:\siegfried\11\ p > d:\siegfried\output\priorities_11.csv, where the \11\ directory contains only the DROID binary XML file, the container XML file, and the compiled roy default.sig file. I'm not changing the .sig file between runs, just the output destination.
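For illustration, a minimal Python sketch of how the per-version command above could be assembled and run. The helper names (`inspect_priorities_cmd`, `output_path`, `run_inspect`) are hypothetical; only the `roy inspect -home <dir> p` invocation and the `priorities_<version>.csv` naming come from the command quoted above, and the sketch assumes `roy` is on the PATH.

```python
import subprocess
from pathlib import PureWindowsPath


def inspect_priorities_cmd(home):
    """Argument list mirroring: roy inspect -home <home> p"""
    return ["roy", "inspect", "-home", str(home), "p"]


def output_path(output_dir, version):
    """Destination file for one signature version, e.g. priorities_11.csv."""
    return PureWindowsPath(output_dir) / f"priorities_{version}.csv"


def run_inspect(home, destination):
    """Run roy and redirect stdout to the destination (assumes roy is on PATH)."""
    with open(destination, "w", encoding="utf-8") as out:
        subprocess.run(inspect_priorities_cmd(home), stdout=out, check=True)
```

Keeping the command as a list rather than a shell string avoids quoting problems with backslashes in the Windows paths.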

But I noticed that the file size for a given output varied between runs: on one run the output for v11 was ~16k, on the next ~17k, using the same .sig file (just different output destinations).

On closer inspection, some of the lines in the larger file are repeated. I've used an alphabetical line sort to show the issue. The attached 'weird_output1_priorities_11.csv' is from one run that appears to be fine; 'weird_output_2_priorities_11.csv' is from a second run against the same data. You'll see some lines are duplicated starting around line 130 (HTML -> SVG), then a few more a little further down (HTML -> DROID XML).

weird_output1_priorities_11.csv weird_output_2_priorities_11.csv
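As a rough alternative to sorting and eyeballing the two files, the repeated lines can be found by counting occurrences. This is only a sketch: the function names are hypothetical, and it assumes each priority relationship sits on its own line of the dot output.

```python
from collections import Counter


def line_counts(text):
    """Count each non-blank line, ignoring surrounding whitespace."""
    return Counter(line.strip() for line in text.splitlines() if line.strip())


def duplicated_lines(text):
    """Lines that occur more than once within a single output."""
    return {line: n for line, n in line_counts(text).items() if n > 1}


def extra_lines(first, second):
    """Lines (with counts) that the second output has beyond the first."""
    return line_counts(second) - line_counts(first)
```

Reading both attachments as text and passing them to `extra_lines` would list exactly the lines that make the second file larger, independent of line order.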

My guess is that it's just a processing concurrency quirk, and it's not getting in my way, but it has happened more than once, so I thought it best to flag it.

richardlehane commented 2 years ago

Thanks for reporting, this is weird. I'll see if I can work out what's happening.

richardlehane commented 2 years ago

Hi @Dclipsham, I think this is now fixed in sf 1.9.4, but please reopen if you see it again.