richardlehane / siegfried

signature-based file format identification
http://www.itforarchivists.com/siegfried
Apache License 2.0
224 stars 30 forks source link

Crash with certain values of -multi flag [Win 7] #131

Closed harryjmoss closed 4 years ago

harryjmoss commented 5 years ago

I've noticed some strange (to me!) behaviour when running the command find $(cat InputFiles.txt) -type f | sf -f -nr -coe -multi 300 -hash sha256 -droid - > SFLog.csv

over a collection of ~120 GB of TIFF files. When passing values of <256 to -multi to test processing speed, Siegfried crashes followed by an error of the type:

Exception 0xc0000006 0x0 0x267e8000 0x6ae71e
PC=0x6ae71e

github.com/richardlehane/siegfried/internal/siegreader.(*Reader).ReadByte(0xc0056a3b40, 0xc00624df16, 0x0, 0x0)
        c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:79 +0xce
github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).match(0xc000704140, 0x8c69e0, 0xc0056a3b40, 0xc00005acc0)
        c:/gopath/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:448 +0x1a1
created by github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).Index
        c:/gopath/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:439 +0x86

goroutine 1 [chan send]:

When switching back to -multi >= 256, I no longer see the error and the sha256 hash generation runs successfully over all files.

I can't think of an explanation for this behaviour and was wondering if you might have any ideas!

richardlehane commented 5 years ago

a small -multi value e.g. -multi 32 crashes, but a big -multi value e.g. -multi 300 succeeds? That's really weird. Is there more in the crash report: if you've got a -multi larger than 1 set I would have expected more goroutines to be running? (you could save it to a file by doing > SFLog.csv 2> crash.log)

richardlehane commented 5 years ago

also might be worth trying the command without the -coe as that flag might be disguising the real bug

richardlehane commented 5 years ago

p.s. have you seen the -setconf flag, it's useful if you are running sf with a lot of flags: https://github.com/richardlehane/siegfried/wiki/Identifying-file-formats#saving-scan-settings

harryjmoss commented 5 years ago

Hi Richard, this is running with -multi 3 and the -coe flags:

Exception 0xc0000006 0x0 0x172d0000 0x6ae71e
PC=0x6ae71e

github.com/richardlehane/siegfried/internal/siegreader.(*Reader).ReadByte(0xc004ab8040, 0xc004837f17, 0x0, 0x0)
        c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:79 +0xce
github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).match(0xc0006e0980, 0x8c69e0, 0xc004ab8040, 0xc004ac81e0)
        c:/gopath/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:448 +0x1a1
created by github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).Index
        c:/gopath/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:439 +0x86

goroutine 1 [syscall, locked to thread]:
syscall.Syscall(0x76f5aaa0, 0x3, 0xc00497d6c0, 0x0, 0xc0000916c0, 0x0, 0x0, 0x0)
        C:/go/src/runtime/syscall_windows.go:184 +0xea
syscall.GetFileAttributesEx(0xc00497d6c0, 0x0, 0xc0000916c0, 0x6d, 0x6d)
        C:/go/src/syscall/zsyscall_windows.go:912 +0x80
os.stat(0x83e28b, 0x4, 0xc0049f6a10, 0x6c, 0x2000000, 0x0, 0x0, 0x0, 0x0)
        C:/go/src/os/stat_windows.go:79 +0x210
os.statNolog(...)
        C:/go/src/os/stat_windows.go:126
os.Stat(0xc0049f6a10, 0x6c, 0x6c, 0x1000, 0xc0049f6a10, 0x6c)
        C:/go/src/os/stat.go:13 +0x76
main.main()
        c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:451 +0xd58

goroutine 6 [chan receive]:
main.printer(0xc000053b00, 0xc00043fa90)
        c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:157 +0x84
created by main.main
        c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:402 +0xb98

goroutine 503 [chan receive]:
github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).identify(0xc0000c3a20, 0xc004955540, 0xc004ac8060, 0xc004ac80c0, 0xc0049556a0, 0x1, 0x1)
        c:/gopath/src/github.com/richardlehane/siegfried/internal/bytematcher/identify.go:125 +0xf18
created by github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).Identify
        c:/gopath/src/github.com/richardlehane/siegfried/internal/bytematcher/bytematcher.go:173 +0xd3

goroutine 502 [chan receive]:
github.com/richardlehane/siegfried.(*Siegfried).IdentifyBuffer(0xc0000c3ad0, 0xc004955540, 0x0, 0x0, 0xc0049e6f50, 0x6c, 0x0, 0x0, 0x3206b0, 0x7cbf40, ...)
        c:/gopath/src/github.com/richardlehane/siegfried/siegfried.go:405 +0xc0c
main.identifyRdr(0x8c6fe0, 0xc000520130, 0xc0000a4000, 0xc000053b00, 0x85c3a0)
        c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:207 +0x141
main.readFile(0xc0000a4000, 0xc000053b00, 0x85c3a0)
        c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:185 +0xbb
main.identifyFile.func1(0xc0000a4000, 0xc000053b00, 0x85c3a0)
        c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:198 +0x61
created by main.identifyFile
        c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:196 +0xee

goroutine 488 [syscall, locked to thread]:
syscall.Syscall9(0x76f608b0, 0x7, 0xc00497d7a0, 0x80000000, 0x3, 0x0, 0x3, 0x80, 0x0, 0x0, ...)
        C:/go/src/runtime/syscall_windows.go:210 +0xf3
syscall.CreateFile(0xc00497d7a0, 0x380000000, 0x0, 0x8000000003, 0x0, 0x0, 0x0, 0x0)
        C:/go/src/syscall/zsyscall_windows.go:296 +0xd8
syscall.Open(0xc004974e00, 0x6c, 0x80000, 0x0, 0x4, 0x0, 0x0)
        C:/go/src/syscall/syscall_windows.go:290 +0x123
os.openFile(0xc004974e00, 0x6c, 0x0, 0x0, 0x0, 0x4, 0x4)
        C:/go/src/os/file_windows.go:97 +0xa5
os.openFileNolog(0xc004974e00, 0x6c, 0x0, 0xc000000000, 0xc004a491c0, 0xc004a49200, 0xc002e653c0)
        C:/go/src/os/file_windows.go:158 +0x141
os.OpenFile(0xc004974e00, 0x6c, 0x0, 0xc000000000, 0xc004a52350, 0x8cb400, 0xc004931560)
        C:/go/src/os/file.go:284 +0x66
os.Open(...)
        C:/go/src/os/file.go:265
main.readFile(0xc000422680, 0xc000053b00, 0x85c3a0)
        c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:177 +0x66
main.identifyFile.func1(0xc000422680, 0xc000053b00, 0x85c3a0)
        c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:198 +0x61
created by main.identifyFile
        c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:196 +0xee

goroutine 504 [chan receive]:
github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).scorer.func6(0xc004ac8120, 0xc004aaa6d8, 0xc004aaa6d0, 0xc004aaa6c8, 0xc004a741e0, 0xc0000c3a20, 0xc0049fa6c0, 0xc0049fa720, 0xc004955700, 0xc0049fa6f0, ...)
        c:/gopath/src/github.com/richardlehane/siegfried/internal/bytematcher/scorer.go:406 +0x57
created by github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).scorer
        c:/gopath/src/github.com/richardlehane/siegfried/internal/bytematcher/scorer.go:405 +0x333

goroutine 464 [running]:
        goroutine running on other thread; stack unavailable
created by main.identifyFile
        c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:196 +0xee

goroutine 525 [running]:
        goroutine running on other thread; stack unavailable
created by main.identifyFile
        c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:196 +0xee
rax     0xc004ab8040
rbx     0x1000
rcx     0x172d0000
rdi     0xc004ab8040
rsi     0x172d0000
rbp     0xc004837f00
rsp     0xc004837ed0
r8      0x464
r9      0x359ffdb
r10     0x1d3
r11     0x0
r12     0x2
r13     0x2
r14     0x20000
r15     0x1
rip     0x6ae71e
rflags  0x10206
cs      0x33
fs      0x53
gs      0x2b

running the same command without -coe just gives

Exception 0xc0000006 0x0 0x132d8000 0x6ae71e
PC=0x6ae71e

github.com/richardlehane/siegfried/internal/siegreader.(*Reader).ReadByte(0xc004994300, 0xc0048b3f16, 0x0, 0x0)
        c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:79 +113

or in some cases just Exception before the process ends.

Thanks for the tip about the setconf flag seems like that will be useful!

richardlehane commented 5 years ago

thanks for this extra detail Harry.

sf uses memory mapping for IO and it looks like the issue you are seeing is caused by that memory mapping failing (a page of memory becomes unavailable during use) which causes a windows exception.

If there's an underlying fault in sf it is probably that I am trying to re-use a memory mapped file after I've previously closed it - but I'm not sure in what circumstance that would happen & I don't have a windows pc to try to reproduce, so may need to pepper you with more Qs sorry!

Could you see if the failure happens in the following cases... 1) if you don't do hashing (no -h flag) 2) if you don't do sf -f streaming but run a normal directory scan with all same settings

Also: 3) does the error always happen in the same place, a particular file? (you could run with -log progress to see this) 4) what kind of drive are you scanning: is it over the network, or a local HDD - spinning/SSD??

richardlehane commented 4 years ago

Closing this for now, as have not been able to reproduce & no further info provided. Please re-open if this issue recurs