richardlehane / siegfried

signature-based file format identification
http://www.itforarchivists.com/siegfried
Apache License 2.0

Panic on Roy Sig Creation #149

Closed gleporeNARA closed 3 years ago

gleporeNARA commented 3 years ago

Pretty sure this is on me for trying to create an incorrect signature, but I can't figure it out. I think I'm using the EOF PRONOM attribute wrong.

I get the following panic on the attached XML signature file:

panic: runtime error: index out of range [-1]

goroutine 1 [running]:
github.com/richardlehane/siegfried/pkg/pronom.appendFragments(0xc0002562e0, 0x5, 0xbca900, 0x0, 0x0, 0xc000224620, 0x2, 0x2, 0x100, 0x0, ...)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/pkg/pronom/parse.go:309 +0x1d9a
github.com/richardlehane/siegfried/pkg/pronom.processSubSequence(0xc0002562e0, 0x5, 0x1, 0xb4cdf0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/pkg/pronom/parse.go:170 +0x683
github.com/richardlehane/siegfried/pkg/pronom.processDROID(0xc0002562e0, 0x5, 0xc000020140, 0x2, 0x2, 0x7c, 0x2, 0x8e8620, 0xc000b79fe0, 0x1)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/pkg/pronom/parse.go:148 +0x285
github.com/richardlehane/siegfried/pkg/pronom.(*droid).Signatures(0xc000a00ba0, 0xc000a48000, 0x618, 0xe1c, 0xc000a1a000, 0x618, 0xe1c, 0x0, 0x0)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/pkg/pronom/parseable.go:314 +0x340
github.com/richardlehane/siegfried/internal/identifier.joint.Signatures(0x8e9600, 0xc000300100, 0x8e9500, 0xc000a00ba0, 0xc000273818, 0x30, 0x82ae80, 0x0, 0x0, 0x30, ...)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/internal/identifier/parseable.go:258 +0xbb
github.com/richardlehane/siegfried/internal/identifier.filtered.Signatures(0xc000d00000, 0x671, 0x70f, 0x8e9880, 0xc0008dd580, 0x1, 0xc0002562e0, 0x5, 0xc001519aa0, 0x17, ...)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/internal/identifier/parseable.go:398 +0x49
github.com/richardlehane/siegfried/pkg/pronom.doublesFilter.Signatures(0xc000d00000, 0x671, 0x70f, 0x8e9880, 0xc0008dd580, 0xc000256270, 0xa, 0x1, 0xc0002562e0, 0x5, ...)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/pkg/pronom/parseable.go:54 +0xb4
github.com/richardlehane/siegfried/internal/identifier.sorted.Signatures(0x8e9580, 0xc00000e580, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/internal/identifier/parseable.go:547 +0x49
github.com/richardlehane/siegfried/internal/identifier.(*Base).Add(0xc000f32780, 0x0, 0x0, 0x3, 0x0, 0x0, 0x0, 0x0)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/internal/identifier/base.go:379 +0x9aa
github.com/richardlehane/siegfried.(*Siegfried).Add(0xc0001282c0, 0x8e8e80, 0xc00025f2f0, 0x8e8e80, 0xc00025f2f0)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/siegfried.go:136 +0x328
main.makegob(0xc0001282c0, 0xc0000100a0, 0x1, 0x1, 0x0, 0x0)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/cmd/roy/roy.go:236 +0xa3
main.main()
    /home/travis/gopath/src/github.com/richardlehane/siegfried/cmd/roy/roy.go:528 +0x65a

Packed-Font-File-Format-1.0-signature-file.txt

richardlehane commented 3 years ago

Hi Greg, I think I'll be able to fix the panic but I'm not sure what the intended behaviour for the signature is:

  1. The first EOF sub sequence has fragments but no sequence. I can see why but not sure if DROID supports this? I think I can probably get it to work but if it isn't legal for DROID would it be better to report an error?

  2. The second EOF sub sequence has offsets but nothing to match - what's the point of this bit of the pattern?

cheers Richard

gleporeNARA commented 3 years ago

Yeah, I think that's my fault; I don't understand the PRONOM syntax for EOF. I'm not sure what to put there if the string I'm looking for is 10 bytes from the end of the file. It seems to me that should be the Offset (counted from the end, working backwards into the file).

For that file it ends with an F5, then an optional couple of F6 values, then a few bytes of randomness, then the end of the file.

Maybe something Variable within the last 100 bytes, but I don't know how to express that in a PRONOM signature. I'm using Ross' new version of the Development Utility.

richardlehane commented 3 years ago

If the F6 values are optional, maybe just leave them out and match on the F5 byte? Is it always 10 bytes from the end of the file, or is it variable and might appear in the last 100 bytes or so?

If variable, you could do something like: image

perhaps?

gleporeNARA commented 3 years ago

The sample files end as follows:

F5F6EOF

F5F6F6EOF

F5F6F6F6EOF

F5EOF

I guess I could just do additional sequences for each ending, I was trying to roll them up into one signature.
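The four endings listed above amount to "F5 followed by zero to three F6 bytes, then EOF". A minimal Go sketch of that check follows, based only on the sample endings in this comment. The function name `hasSampleEnding` and the limit of three F6 bytes are assumptions for illustration; this is not PRONOM syntax and not code from siegfried.

```go
package main

import "fmt"

// hasSampleEnding is a hypothetical helper. It reports whether data ends
// with 0xF5 followed by zero to three 0xF6 bytes, which covers the four
// sample endings: F5, F5F6, F5F6F6 and F5F6F6F6.
func hasSampleEnding(data []byte) bool {
	const maxF6 = 3 // assumption: no sample shows more than three F6 bytes
	i := len(data)
	// Step back over at most maxF6 trailing F6 bytes.
	for n := 0; n < maxF6 && i > 0 && data[i-1] == 0xF6; n++ {
		i--
	}
	// The byte before the optional F6 run must be F5.
	return i > 0 && data[i-1] == 0xF5
}

func main() {
	samples := [][]byte{
		{0x00, 0xF5, 0xF6},
		{0x00, 0xF5, 0xF6, 0xF6},
		{0x00, 0xF5, 0xF6, 0xF6, 0xF6},
		{0x00, 0xF5},
	}
	for _, s := range samples {
		fmt.Printf("% X -> %v\n", s, hasSampleEnding(s))
	}
}
```

Expressed as separate signatures, each ending would be its own EOF sequence; the sketch only shows that one rule covers all four.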

richardlehane commented 3 years ago

Perhaps it is possible with something like: image

Which would give you something like (similar to your one - but without the empty second subsequence): image

I think I could probably get it to work with sf but not sure if this would work also in DROID?

gleporeNARA commented 3 years ago

That gives me the same panic, I think:

`panic: runtime error: index out of range [-1]

goroutine 1 [running]:
github.com/richardlehane/siegfried/pkg/pronom.appendFragments(0xc00031eec0, 0x5, 0xbca900, 0x0, 0x0, 0xc00080f0a0, 0x4, 0x4, 0x100, 0x0, ...)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/pkg/pronom/parse.go:309 +0x1d9a
github.com/richardlehane/siegfried/pkg/pronom.processSubSequence(0xc00031eec0, 0x5, 0x1, 0xb4cdf0, 0x1, 0xb4cdf0, 0x1, 0x0, 0x0, 0x0, ...)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/pkg/pronom/parse.go:170 +0x683
github.com/richardlehane/siegfried/pkg/pronom.processDROID(0xc00031eec0, 0x5, 0xc000312390, 0x1, 0x1, 0x7c, 0x2, 0x8e8620, 0xc000e7c260, 0x1)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/pkg/pronom/parse.go:148 +0x285
github.com/richardlehane/siegfried/pkg/pronom.(*droid).Signatures(0xc00019eb50, 0xc000938000, 0x618, 0xe1c, 0xc00094e000, 0x618, 0xe1c, 0x0, 0x0)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/pkg/pronom/parseable.go:314 +0x340
github.com/richardlehane/siegfried/internal/identifier.joint.Signatures(0x8e9600, 0xc0001d8000, 0x8e9500, 0xc00019eb50, 0xc000893818, 0x30, 0x82ae80, 0x0, 0x0, 0x30, ...)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/internal/identifier/parseable.go:258 +0xbb
github.com/richardlehane/siegfried/internal/identifier.filtered.Signatures(0xc000b24000, 0x671, 0x70f, 0x8e9880, 0xc0000f7f40, 0x1, 0xc00031eec0, 0x5, 0xc000d8c860, 0x17, ...)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/internal/identifier/parseable.go:398 +0x49
github.com/richardlehane/siegfried/pkg/pronom.doublesFilter.Signatures(0xc000b24000, 0x671, 0x70f, 0x8e9880, 0xc0000f7f40, 0xc00031ee50, 0xa, 0x1, 0xc00031eec0, 0x5, ...)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/pkg/pronom/parseable.go:54 +0xb4
github.com/richardlehane/siegfried/internal/identifier.sorted.Signatures(0x8e9580, 0xc00000e580, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/internal/identifier/parseable.go:547 +0x49
github.com/richardlehane/siegfried/internal/identifier.(*Base).Add(0xc0003ac880, 0x0, 0x0, 0x3, 0x0, 0x0, 0x0, 0x0)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/internal/identifier/base.go:379 +0x9aa
github.com/richardlehane/siegfried.(*Siegfried).Add(0xc0001282c0, 0x8e8e80, 0xc0002451e0, 0x8e8e80, 0xc0002451e0)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/siegfried.go:136 +0x328
main.makegob(0xc0001282c0, 0xc0000100a0, 0x1, 0x1, 0x0, 0x0)
    /home/travis/gopath/src/github.com/richardlehane/siegfried/cmd/roy/roy.go:236 +0xa3
main.main()
    /home/travis/gopath/src/github.com/richardlehane/siegfried/cmd/roy/roy.go:528 +0x65a`

richardlehane commented 3 years ago

Hi Greg, I've got rid of the panic by returning an error if an empty sequence element is given. I'm not sure whether DROID accepts empty sequence elements (with just fragments), but if it does then I'll do further work to allow it. I tested, though, and you can get the result you want by putting the pattern directly in the sequence element and getting rid of the right fragments (you'll need to do this by hand-editing the XML; the sig dev utility won't do this for you). I believe this will also work in recent versions of DROID (i.e. patterns are allowed in sequence elements). Your XML should look like this: image

all the best Richard
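As background on the fix described above: `index out of range [-1]` is what the Go runtime reports when code indexes a slice at `len(s)-1` and the slice is empty. The sketch below shows the general shape of such a guard. It is an assumption-laden illustration, not the actual change in `parse.go`; the type `subSequence`, the function `lastSequenceByte` and the error value are all hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// subSequence is a hypothetical stand-in for a parsed DROID sub sequence;
// it is not the type siegfried uses.
type subSequence struct {
	Sequence  []byte   // main pattern; empty in the failing signature
	Fragments [][]byte // left or right fragments attached to the sequence
}

// errEmptySequence is a hypothetical error value for this sketch.
var errEmptySequence = errors.New("sub sequence has fragments but an empty sequence element")

// lastSequenceByte returns the final byte of the main sequence. Without the
// length check, s.Sequence[len(s.Sequence)-1] would index at -1 and panic
// when the sequence is empty.
func lastSequenceByte(s subSequence) (byte, error) {
	if len(s.Sequence) == 0 {
		return 0, errEmptySequence
	}
	return s.Sequence[len(s.Sequence)-1], nil
}

func main() {
	// A sub sequence with fragments only, as in the first attached signature.
	bad := subSequence{Fragments: [][]byte{{0xF6}}}
	if _, err := lastSequenceByte(bad); err != nil {
		fmt.Println("error:", err)
	}

	// The same pattern moved into the sequence element itself.
	good := subSequence{Sequence: []byte{0xF5}}
	b, _ := lastSequenceByte(good)
	fmt.Printf("last byte: %X\n", b)
}
```

Returning an error here lets the tool report a bad signature file to the user instead of crashing partway through the build.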

gleporeNARA commented 3 years ago

That's working, thanks! I'll talk to Ross about possibly adding in some validation to ensure that the resulting file will work, but I can't really explain the issue on this. It seems to me I was following PRONOM rules when I created the first file.

ross-spencer commented 3 years ago

This comment is interesting, Richard: https://github.com/richardlehane/siegfried/issues/149#issuecomment-703796666. The main branch of the new utility was built around the potential to do this (if I read it correctly), but then I discovered its implementation was limited. I've recorded some of my notes here: https://github.com/exponential-decay/signature-development-utility/issues/3. As such, my workaround is to leverage the API of the original utility and route all original signature requests to the original PHP code. The simplified version would be lovely!