richardlehane / siegfried

signature-based file format identification
http://www.itforarchivists.com/siegfried
Apache License 2.0

Panic on a .doc file #126

Closed JSLair closed 5 years ago

JSLair commented 5 years ago

When analysing a specific valid .doc file, Siegfried exits with a panic:

panic: runtime error: index out of range

goroutine 19 [running]:
github.com/richardlehane/siegfried/internal/containermatcher.(*ContainerMatcher).processHits(0xc000392850, 0xc000569320, 0x1, 0x1, 0xc00055f6e0, 0xc0000427e0, 0xc00060c780, 0x13, 0xc00004a4e0, 0x1)
        c:/gopath/src/github.com/richardlehane/siegfried/internal/containermatcher/identify.go:228 +0x748
github.com/richardlehane/siegfried/internal/containermatcher.(*ContainerMatcher).identify(0xc000392850, 0xc0000180c0, 0xe, 0x862fe0, 0xc0005692f0, 0xc00004a4e0, 0xc00058c960, 0x1, 0x1)
        c:/gopath/src/github.com/richardlehane/siegfried/internal/containermatcher/identify.go:145 +0x26e
created by github.com/richardlehane/siegfried/internal/containermatcher.Matcher.Identify
        c:/gopath/src/github.com/richardlehane/siegfried/internal/containermatcher/identify.go:43 +0x264

I tried a correction in processHits (internal/containermatcher/identify.go) that removes the out-of-range indexes, and it works well. But I don't know the side effects, and I think that if an out-of-range index is returned, there is a problem earlier in the code, so I would be really grateful if you could look at this problem.
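A minimal sketch of the kind of bounds guard described above. It is not siegfried's actual code: the `hit` type, the `sigIndex` field, and the `validHits` function are hypothetical names used only for illustration. The sketch returns the rejected hits instead of dropping them silently, because an out-of-range index probably points to a bug earlier in the matching pipeline.

```go
package main

import "fmt"

// hit is a hypothetical stand-in for a matcher result; it is not siegfried's type.
type hit struct {
	sigIndex int // index into a signature table
}

// validHits splits hits into those whose index lies in [0, tableLen)
// and those that would cause an "index out of range" panic if used.
func validHits(hits []hit, tableLen int) (ok, rejected []hit) {
	for _, h := range hits {
		if h.sigIndex < 0 || h.sigIndex >= tableLen {
			rejected = append(rejected, h)
			continue
		}
		ok = append(ok, h)
	}
	return ok, rejected
}

func main() {
	table := []string{"sig-a", "sig-b", "sig-c"}
	hits := []hit{{0}, {5}, {2}, {-1}}

	ok, rejected := validHits(hits, len(table))
	for _, h := range ok {
		// Safe: every index in ok was checked against len(table).
		fmt.Println("matched", table[h.sigIndex])
	}
	// Report rejected hits so the upstream cause can be investigated.
	fmt.Println("rejected hits:", len(rejected))
}
```

A guard like this avoids the panic, but it only treats the symptom; the rejected hits still need to be traced back to where the bad index was produced.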

One more piece of information: the file is actually a docx, even though its extension is .doc. If I rename it with a .docx extension, Siegfried does not panic:

---
siegfried   : 1.7.11
scandate    : 2019-04-16T10:43:50+02:00
signature   : default.sig
created     : 2019-02-16T11:09:29+01:00
identifiers :
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V94.xml; container-signature-20180917.xml'
---
filename : '.\12-01_~1.docx'
filesize : 60719
modified : 2019-04-16T10:29:06+02:00
errors   :
matches  :
  - ns      : 'pronom'
    id      : 'fmt/412'
    format  : 'Microsoft Word for Windows'
    version : '2007 onwards'
    mime    : 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
    basis   : 'extension match docx; container name [Content_Types].xml with byte match at 606, 94 (signature 1/3)'
    warning :

As the file is a real archival record, even though it is not sensitive, I would prefer to send it to you by email.

richardlehane commented 5 years ago

Thanks for the report, Jean-Séverin. I'll look at it tonight. Please email the file if possible.

Richard


richardlehane commented 5 years ago

Hi Jean-Séverin, this is fixed now on the develop branch. I have a few more tests to run and hope to release to production by the end of this week. Sorry for the delay in fixing this.

Richard

JSLair commented 5 years ago

The bug is fixed and the new version is now in use in our archiving system. Thanks a lot!