richardlehane / siegfried

signature-based file format identification
http://www.itforarchivists.com/siegfried
Apache License 2.0

make eof mirroring a bit more useful for multi-wild segment signatures like fmt/704 #91

Closed richardlehane closed 8 years ago

richardlehane commented 8 years ago

The mirror of fmt/704 is: [image]

It probably makes more sense to reverse only the last wild segment in this signature and not the first (which is more naturally found near the BOF).

Open question: when mirroring signatures that contain multiple wildcard segments, what is the desirable behaviour? Reverse all segments (the current behaviour), reverse only the last wildcard segment (as would suit fmt/704), or produce additional signatures covering every variant (possibly too verbose)?
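To make the three options concrete, here is a rough sketch in Python. It is a toy model and not siegfried's actual data structures: a signature is assumed to be just a list of wildcard segment names, each tagged with the end of the file it is searched from, and the function names `mirror_all`, `mirror_last` and `mirror_variants` are hypothetical.

```python
from itertools import product

# Toy model (an assumption for illustration): each wildcard segment is tagged
# with the end of the file from which it is searched.
BOF, EOF = "BOF", "EOF"

def mirror_all(segments):
    """Option 1: search every wildcard segment from the EOF."""
    return [(name, EOF) for name in segments]

def mirror_last(segments):
    """Option 2: search only the final wildcard segment from the EOF."""
    return [(name, BOF) for name in segments[:-1]] + [(segments[-1], EOF)]

def mirror_variants(segments):
    """Option 3: one signature per BOF/EOF combination (2**n for n segments)."""
    return [
        list(zip(segments, ends))
        for ends in product((BOF, EOF), repeat=len(segments))
    ]

segments = ["seg1", "seg2", "seg3"]
print(mirror_all(segments))
print(mirror_last(segments))
print(len(mirror_variants(segments)))
```

In this model the third option doubles the number of signatures with every extra wildcard segment (and its output includes the unmirrored original), which is the verbosity concern raised above.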

richardlehane commented 8 years ago

Complicating things: for PDF/A, you really want all the wildcard segments to be reversed: [image]

richardlehane commented 8 years ago

complicating things further:

If anchoring is changed from PREV to SUCC, how do we prevent false matches where those anchors are significant? E.g. in the case of fmt/704, how do we ensure that the SUCC segment doesn't come before its neighbouring PREV segment?
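A minimal illustration of the ordering problem, again as a toy Python sketch and not siegfried's matcher. It assumes just two plain byte patterns, one scanned from the BOF and one scanned from the EOF; `independent_match` and `ordered_match` are hypothetical names.

```python
def independent_match(data: bytes, first: bytes, last: bytes) -> bool:
    """Scan for `first` from the BOF and `last` from the EOF, ignoring order."""
    return data.find(first) != -1 and data.rfind(last) != -1

def ordered_match(data: bytes, first: bytes, last: bytes) -> bool:
    """Also require that `last` starts at or after the end of `first`."""
    start = data.find(first)
    if start == -1:
        return False
    # Earliest `first` plus latest `last` is the most permissive pairing,
    # so if this pair is out of order, every pair is.
    return data.rfind(last) >= start + len(first)

good = b"HEAD....TAIL"
bad = b"TAIL....HEAD"  # both patterns present, but in the wrong order

print(independent_match(good, b"HEAD", b"TAIL"), ordered_match(good, b"HEAD", b"TAIL"))
print(independent_match(bad, b"HEAD", b"TAIL"), ordered_match(bad, b"HEAD", b"TAIL"))
```

Once the two segments are searched independently from opposite ends, the out-of-order input still matches unless the relative positions are checked explicitly.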

richardlehane commented 8 years ago

Closing this without changes: after reflection, it seems impossible to improve this without running the risk of false identifications, because the previous/successor relations between segments would no longer be respected.