richardlehane / siegfried

signature-based file format identification
http://www.itforarchivists.com/siegfried
Apache License 2.0

Meaningful error message required rather than core dump #109

Closed: tjolliffe closed this issue 6 years ago

tjolliffe commented 6 years ago

Hi Richard, I tried to use siegfried on a zip file that was too large to process (46 GB). Siegfried attempted to load the whole zip into memory and failed with the output below. An out-of-memory error message would be better than a core dump in this case.

```
Y:\XXXX\Converted videos, film\Consignment AV>sf -z -csv "Consignment AV.zip"

\XXXX\Working\XXXX_ConsignmentAV_sf.csv
Exception 0xc0000006 0x0 0x440629dfc 0x4581f5
PC=0x4581f5

github.com/richardlehane/siegfried/internal/siegreader.(*Reader).ReadAt(0xc0425368c0, 0xc0426345e6, 0x7010, 0x7a1a, 0x3c0632e6c, 0xa36330, 0x3c655e8, 0xd0)
	c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:123 +0xd6
io.(*SectionReader).Read(0xc045cfa2a0, 0xc0426345e6, 0x7010, 0x7a1a, 0xc045c9f260, 0xc04201b080, 0xc045c9f260)
	C:/go/src/io/io.go:465 +0x83
bufio.(*Reader).Read(0xc045b8d4a0, 0xc0426345e6, 0x7010, 0x7a1a, 0x5e6, 0x0, 0x0)
	C:/go/src/bufio/bufio.go:199 +0x1aa
io.ReadAtLeast(0x9c49a0, 0xc045b8d4a0, 0xc042634000, 0x75f6, 0x8000, 0x75f6, 0x7d3b80, 0xc045c9f300, 0x9c49a0)
	C:/go/src/io/io.go:309 +0x8d
io.ReadFull(0x9c49a0, 0xc045b8d4a0, 0xc042634000, 0x75f6, 0x8000, 0xc045c9f3b0, 0x411a6d, 0xc042046018)
	C:/go/src/io/io.go:327 +0x5f
compress/flate.(*decompressor).copyData(0xc04257d300)
	C:/go/src/compress/flate/inflate.go:663 +0xf5
compress/flate.(*decompressor).Read(0xc04257d300, 0xc1c5cd6000, 0x1000, 0x80000000, 0x0, 0x100000000, 0xc145cd6000)
	C:/go/src/compress/flate/inflate.go:347 +0x79
archive/zip.(*pooledFlateReader).Read(0xc045cf66a0, 0xc1c5cd6000, 0x1000, 0x80000000, 0x0, 0x0, 0x0)
	C:/go/src/archive/zip/register.go:90 +0x139
archive/zip.(*checksumReader).Read(0xc045bbee10, 0xc1c5cd6000, 0x1000, 0x80000000, 0x100000000, 0x100000000, 0x0)
	C:/go/src/archive/zip/reader.go:194 +0x7f
github.com/richardlehane/siegfried/internal/siegreader.(*stream).fill(0xc042231680, 0x80000000, 0x0, 0x0)
	c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:76 +0xd3
github.com/richardlehane/siegfried/internal/siegreader.(*stream).CanSeek(0xc042231680, 0x0, 0xc042221001, 0xc045d00400, 0x0, 0x0)
	c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:155 +0x229
github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).identify(0xc04207d8c0, 0xc045cf66c0, 0xc045cf2c00, 0xc045cf2c60, 0xc044701b48, 0x0, 0x1)
	c:/gopath/src/github.com/richardlehane/siegfried/internal/bytematcher/identify.go:96 +0x15d6
created by github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).Identify
	c:/gopath/src/github.com/richardlehane/siegfried/internal/bytematcher/bytematcher.go:173 +0xd3

goroutine 1 [chan receive, 3 minutes]:
github.com/richardlehane/siegfried.(*Siegfried).IdentifyBuffer(0xc04207d970, 0xc045cf66c0, 0x0, 0x0, 0xc045bc0ee0, 0x70, 0x0, 0x0, 0xc04202d150, 0x47d227, ...)
	c:/gopath/src/github.com/richardlehane/siegfried/siegfried.go:385 +0x1034
main.identifyRdr(0x3c30030, 0xc045bbee10, 0xc045be0000, 0xc042034fc0, 0x809918)
	c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:205 +0x127
main.identifyRdr(0x9c5960, 0xc0424386c8, 0xc04239e580, 0xc042034fc0, 0x809918)
	c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:266 +0x59f
main.readFile(0xc04239e580, 0xc042034fc0, 0x809918)
	c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:183 +0xa9
main.identifyFile(0xc04239e580, 0xc042034fc0, 0x809918)
	c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:191 +0x97
main.identify.func1(0xc0420480a0, 0x12, 0x9cc400, 0xc042035020, 0x0, 0x0, 0xc0425edbf0, 0xc0420e61a0)
	c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/longpath_windows.go:113 +0x4fb
path/filepath.walk(0xc0420480a0, 0x12, 0x9cc400, 0xc042035020, 0xc042231590, 0x0, 0x50)
	C:/go/src/path/filepath/path.go:356 +0x88
path/filepath.Walk(0xc0420480a0, 0x12, 0xc042231590, 0x7, 0x0)
	C:/go/src/path/filepath/path.go:403 +0x124
main.identify(0xc042034fc0, 0xc0420480a0, 0x12, 0x0, 0x0, 0x0, 0x809918, 0xed1946cf2, 0xa16ca0)
	c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/longpath_windows.go:116 +0xe3
main.main()
	c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:466 +0xad4

goroutine 4 [chan receive, 3 minutes]:
main.printer(0xc042034fc0, 0xc042231540)
	c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:153 +0xba
created by main.main
	c:/gopath/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:388 +0x8a1

goroutine 287 [chan receive, 3 minutes]:
github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).scorer.func6(0xc045cf2cc0, 0xc044701b60, 0xc044701b58, 0xc044701b50, 0xc045cfa300, 0xc04207d8c0, 0xc045cfa330, 0xc045cfa390, 0xc045cf6780, 0xc045cfa360, ...)
	c:/gopath/src/github.com/richardlehane/siegfried/internal/bytematcher/scorer.go:390 +0x57
created by github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).scorer
	c:/gopath/src/github.com/richardlehane/siegfried/internal/bytematcher/scorer.go:389 +0x3cf

goroutine 321 [semacquire, 3 minutes]:
sync.runtime_SemacquireMutex(0xc0422316bc, 0xc045b33d00)
	C:/go/src/runtime/sema.go:71 +0x44
sync.(*Mutex).Lock(0xc0422316b8)
	C:/go/src/sync/mutex.go:134 +0xf5
github.com/richardlehane/siegfried/internal/siegreader.(*stream).Slice(0xc042231680, 0x41000, 0x1000, 0x0, 0x0, 0x0, 0x0, 0x0)
	c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:100 +0x74
github.com/richardlehane/siegfried/internal/siegreader.(*Reader).setBuf(0xc045ce5480, 0x41000, 0x0, 0x0)
	c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:50 +0x56
github.com/richardlehane/siegfried/internal/siegreader.(*Reader).ReadByte(0xc045ce5480, 0xc045b33f05, 0x0, 0x0)
	c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:70 +0x86
github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).match(0xc04256bb20, 0x9c4e20, 0xc045ce5480, 0xc045cf2d80)
	c:/gopath/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:448 +0x2be
created by github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).Index
	c:/gopath/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:439 +0x86

goroutine 299 [select, 3 minutes]:
github.com/richardlehane/siegfried/internal/siegreader.(*stream).EofSlice(0xc042231680, 0x0, 0x1000, 0x0, 0x0, 0x0, 0x0, 0x0)
	c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:132 +0x12a
github.com/richardlehane/siegfried/internal/siegreader.(*ReverseReader).setBuf(0xc045d12140, 0x0, 0x0, 0x0)
	c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:169 +0x56
github.com/richardlehane/siegfried/internal/siegreader.(*ReverseReader).ReadByte(0xc045d12140, 0xc04256b920, 0x7706c0, 0xc045c7ee60)
	c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:212 +0x86
github.com/richardlehane/siegfried/internal/siegreader.(*LimitReverseReader).ReadByte(0xc042221040, 0xc045770a80, 0x65, 0x65)
	c:/gopath/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:274 +0x68
github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).match(0xc04256b940, 0x9c4de0, 0xc042221040, 0xc045d00480)
	c:/gopath/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:448 +0xd0
created by github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).Index
	c:/gopath/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:439 +0x86
rax     0x440622e6c
rbx     0x7010
rcx     0x440629e7c
rdi     0xc0426345e6
rsi     0x440622e6c
rbp     0xc045c9f1a0
rsp     0xc045c9f138
r8      0x1
r9      0x0
r10     0xc0426345e6
r11     0x20
r12     0x0
r13     0x0
r14     0x456320
r15     0x0
rip     0x4581f5
rflags  0x10283
cs      0x33
fs      0x53
gs      0x2b

Y:\XXXX\Converted videos, film\Consignment AV>
```

richardlehane commented 6 years ago

Hey Terry, I've reproduced this locally with a synthetic file (RDFC is a handy Windows tool for generating files of arbitrary size filled with random bytes).

My computer heroically survived unzipping a 2 GB file and a 5 GB file, but finally choked on a 15 GB file, giving this runtime error:

```
runtime: out of memory: cannot allocate 17179869184-byte block (17242652672 in use)
fatal error: out of memory

runtime stack:
runtime.throw(0x7f1b7a, 0xd)
	C:/tools/go/src/runtime/panic.go:605 +0x9c
runtime.largeAlloc(0x400000000, 0x1b0101, 0xc04e53e305)
	C:/tools/go/src/runtime/malloc.go:829 +0x11b
runtime.mallocgc.func1()
	C:/tools/go/src/runtime/malloc.go:722 +0x4d
runtime.systemstack(0xc042018600)
	C:/tools/go/src/runtime/asm_amd64.s:344 +0x7e
runtime.mstart()
	C:/tools/go/src/runtime/proc.go:1125

goroutine 11 [running]:
runtime.systemstack_switch()
	C:/tools/go/src/runtime/asm_amd64.s:298 fp=0xc0420293f8 sp=0xc0420293f0 pc=0x4549e0
runtime.mallocgc(0x400000000, 0x763dc0, 0x1, 0xc042539628)
	C:/tools/go/src/runtime/malloc.go:721 +0x7f7 fp=0xc0420294a0 sp=0xc0420293f8 pc=0x410df7
runtime.makeslice(0x763dc0, 0x400000000, 0x400000000, 0x1000, 0x1000, 0x0)
	C:/tools/go/src/runtime/slice.go:54 +0x7e fp=0xc0420294d0 sp=0xc0420294a0 pc=0x44006e
github.com/richardlehane/siegfried/internal/siegreader.(*stream).grow(...)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:56
github.com/richardlehane/siegfried/internal/siegreader.(*stream).fill(0xc042539630, 0x200000000, 0x0, 0x0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:70 +0x259 fp=0xc042029558 sp=0xc0420294d0 pc=0x6556e9
github.com/richardlehane/siegfried/internal/siegreader.(*stream).CanSeek(0xc042539630, 0x0, 0xc04202f201, 0xc04586e000, 0x0, 0x0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:155 +0x229 fp=0xc0420295b0 sp=0xc042029558 pc=0x656179
github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).identify(0xc0420698c0, 0xc04255f7c0, 0xc04203aa80, 0xc04203aae0, 0xc042607920, 0x0, 0x1)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/bytematcher/identify.go:96 +0x15d6 fp=0xc042029fa8 sp=0xc0420295b0 pc=0x68e086
runtime.goexit()
	C:/tools/go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc042029fb0 sp=0xc042029fa8 pc=0x457681
created by github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).Identify
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/bytematcher/bytematcher.go:173 +0xd3

goroutine 1 [chan receive, 3 minutes]:
github.com/richardlehane/siegfried.(*Siegfried).IdentifyBuffer(0xc042069970, 0xc04255f7c0, 0x0, 0x0, 0xc0425e34c0, 0x1f, 0x0, 0x0, 0xc04202d150, 0x47d227, ...)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/siegfried.go:385 +0x1034
main.identifyRdr(0x1080820, 0xc0425395e0, 0xc042380600, 0xc04254b680, 0x809fc0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:205 +0x127
main.identifyRdr(0x9c7960, 0xc0420047b0, 0xc042380580, 0xc04254b680, 0x809fc0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:266 +0x59f
main.readFile(0xc042380580, 0xc04254b680, 0x809fc0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:183 +0xa9
main.identifyFile(0xc042380580, 0xc04254b680, 0x809fc0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:191 +0x97
main.identify.func1(0xc04200c100, 0xf, 0x9ce440, 0xc04254b6e0, 0x0, 0x0, 0xc0425e3380, 0xc0420ce1a0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/longpath_windows.go:113 +0x4fb
path/filepath.walk(0xc04200c100, 0xf, 0x9ce440, 0xc04254b6e0, 0xc042539590, 0x0, 0x50)
	C:/tools/go/src/path/filepath/path.go:356 +0x88
path/filepath.Walk(0xc04200c100, 0xf, 0xc042539590, 0x763c00, 0xc04202edf0)
	C:/tools/go/src/path/filepath/path.go:403 +0x124
main.identify(0xc04254b680, 0xc04200c100, 0xf, 0x0, 0x0, 0x0, 0x809fc0, 0xed171eefe, 0xa18d00)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/longpath_windows.go:116 +0xe3
main.main()
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:466 +0xad4

goroutine 9 [chan receive, 3 minutes]:
main.printer(0xc04254b680, 0xc0425394f0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:153 +0xba
created by main.main
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:388 +0x8a1

goroutine 12 [chan receive, 3 minutes]:
github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).scorer.func6(0xc04203ab40, 0xc042607938, 0xc042607930, 0xc042607928, 0xc0421e2900, 0xc0420698c0, 0xc0421e2930, 0xc0421e2990, 0xc04255f860, 0xc0421e2960, ...)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/bytematcher/scorer.go:390 +0x57
created by github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).scorer
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/bytematcher/scorer.go:389 +0x3cf

goroutine 14 [semacquire, 3 minutes]:
sync.runtime_SemacquireMutex(0xc04253966c, 0x411900)
	C:/tools/go/src/runtime/sema.go:71 +0x44
sync.(*Mutex).Lock(0xc042539668)
	C:/tools/go/src/sync/mutex.go:134 +0xf5
github.com/richardlehane/siegfried/internal/siegreader.(*stream).Slice(0xc042539630, 0x59000, 0x1000, 0x0, 0x0, 0x0, 0x0, 0x0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:100 +0x74
github.com/richardlehane/siegfried/internal/siegreader.(*Reader).setBuf(0xc042511780, 0x59000, 0x0, 0x0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:50 +0x56
github.com/richardlehane/siegfried/internal/siegreader.(*Reader).ReadByte(0xc042511780, 0xc04260bf00, 0x0, 0x0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:70 +0x86
github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).match(0xc04255fb80, 0x9c6e20, 0xc042511780, 0xc04203a960)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:448 +0x2be
created by github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).Index
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:439 +0x86

goroutine 18 [select, 3 minutes]:
github.com/richardlehane/siegfried/internal/siegreader.(*stream).EofSlice(0xc042539630, 0x0, 0x1000, 0x0, 0x0, 0x0, 0x0, 0x0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:132 +0x12a
github.com/richardlehane/siegfried/internal/siegreader.(*ReverseReader).setBuf(0xc0421fd4c0, 0x0, 0x0, 0x0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:169 +0x56
github.com/richardlehane/siegfried/internal/siegreader.(*ReverseReader).ReadByte(0xc0421fd4c0, 0xc045ac21e0, 0x770c00, 0xc04255fc20)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:212 +0x86
github.com/richardlehane/siegfried/internal/siegreader.(*LimitReverseReader).ReadByte(0xc04202f2d0, 0xc045758a80, 0x65, 0x65)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:274 +0x68
github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).match(0xc045ac2200, 0x9c6de0, 0xc04202f2d0, 0xc04586e060)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:448 +0xd0
created by github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).Index
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:439 +0x86
```
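For reference, the sizes in this error decode as follows. The failed block is exactly twice the value passed to siegreader's stream fill method (0x200000000), which is consistent with a buffer that grows by doubling; that reading is an inference from the trace, not something checked against the siegreader source. The process already had about 16 GiB in use when it asked for another 16 GiB block:

$$
\begin{aligned}
17179869184 &= \texttt{0x400000000} = 2^{34}\ \text{bytes} = 16\ \text{GiB} = 2 \times \texttt{0x200000000} \\
17242652672 &= 2^{34} + 62783488 \approx 16.06\ \text{GiB already in use}
\end{aligned}
$$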

It may be a little hard to guard against this error, given that computers all differ in RAM capacity, so I can't just set an arbitrary limit of, say, 3 GB of zipped content. But I'll see what can be done.

richardlehane commented 6 years ago

P.S. Interesting that I got a Go runtime panic where you got a Windows OS exception... maybe because you were accessing the file over the network?

richardlehane commented 6 years ago

Hmmm, I tried it on the same zip that broke for you, TJ, but I didn't get your error; I got one that looks a lot like my synthetic-file error:

```
runtime: out of memory: cannot allocate 17179869184-byte block (17244651520 in use)
fatal error: out of memory

runtime stack:
runtime.throw(0x7f1b7a, 0xd)
	C:/tools/go/src/runtime/panic.go:605 +0x9c
runtime.largeAlloc(0x400000000, 0x1b0101, 0x434006)
	C:/tools/go/src/runtime/malloc.go:829 +0x11b
runtime.mallocgc.func1()
	C:/tools/go/src/runtime/malloc.go:722 +0x4d
runtime.systemstack(0xa193d8)
	C:/tools/go/src/runtime/asm_amd64.s:344 +0x7e
runtime.mstart()
	C:/tools/go/src/runtime/proc.go:1125

goroutine 277 [running]:
runtime.systemstack_switch()
	C:/tools/go/src/runtime/asm_amd64.s:298 fp=0xc04202d3f8 sp=0xc04202d3f0 pc=0x4549e0
runtime.mallocgc(0x400000000, 0x763dc0, 0x1, 0xc042075e48)
	C:/tools/go/src/runtime/malloc.go:721 +0x7f7 fp=0xc04202d4a0 sp=0xc04202d3f8 pc=0x410df7
runtime.makeslice(0x763dc0, 0x400000000, 0x400000000, 0x1000, 0x1000, 0x0)
	C:/tools/go/src/runtime/slice.go:54 +0x7e fp=0xc04202d4d0 sp=0xc04202d4a0 pc=0x44006e
github.com/richardlehane/siegfried/internal/siegreader.(*stream).grow(...)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:56
github.com/richardlehane/siegfried/internal/siegreader.(*stream).fill(0xc042447bd0, 0x200000000, 0x0, 0x0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:70 +0x259 fp=0xc04202d558 sp=0xc04202d4d0 pc=0x6556e9
github.com/richardlehane/siegfried/internal/siegreader.(*stream).CanSeek(0xc042447bd0, 0x0, 0xc045bf0b01, 0xc045dbc600, 0x0, 0x0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:155 +0x229 fp=0xc04202d5b0 sp=0xc04202d558 pc=0x656179
github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).identify(0xc0420638c0, 0xc045da0280, 0xc045dca0c0, 0xc045dca120, 0xc045d62b78, 0x0, 0x1)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/bytematcher/identify.go:96 +0x15d6 fp=0xc04202dfa8 sp=0xc04202d5b0 pc=0x68e086
runtime.goexit()
	C:/tools/go/src/runtime/asm_amd64.s:2337 +0x1 fp=0xc04202dfb0 sp=0xc04202dfa8 pc=0x457681
created by github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).Identify
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/bytematcher/bytematcher.go:173 +0xd3

goroutine 1 [chan receive, 9 minutes]:
github.com/richardlehane/siegfried.(*Siegfried).IdentifyBuffer(0xc042063970, 0xc045da0280, 0x0, 0x0, 0xc042015e30, 0x70, 0x0, 0x0, 0xc042029150, 0x47d227, ...)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/siegfried.go:385 +0x1034
main.identifyRdr(0x16c0130, 0xc042075e00, 0xc04237a600, 0xc04254f020, 0x809fc0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:205 +0x127
main.identifyRdr(0x9c7960, 0xc04243a140, 0xc04237a580, 0xc04254f020, 0x809fc0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:266 +0x59f
main.readFile(0xc04237a580, 0xc04254f020, 0x809fc0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:183 +0xa9
main.identifyFile(0xc04237a580, 0xc04254f020, 0x809fc0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:191 +0x97
main.identify.func1(0xc04200a300, 0x12, 0x9ce440, 0xc04254f080, 0x0, 0x0, 0xc042009c20, 0xc0420c81a0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/longpath_windows.go:113 +0x4fb
path/filepath.walk(0xc04200a300, 0x12, 0x9ce440, 0xc04254f080, 0xc042447ae0, 0x0, 0x50)
	C:/tools/go/src/path/filepath/path.go:356 +0x88
path/filepath.Walk(0xc04200a300, 0x12, 0xc042447ae0, 0x763c00, 0xc04246c480)
	C:/tools/go/src/path/filepath/path.go:403 +0x124
main.identify(0xc04254f020, 0xc04200a300, 0x12, 0x0, 0x0, 0x0, 0x809fc0, 0xed171eefe, 0xa18d00)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/longpath_windows.go:116 +0xe3
main.main()
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:466 +0xad4

goroutine 33 [chan receive, 9 minutes]:
main.printer(0xc04254f020, 0xc042447a40)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:153 +0xba
created by main.main
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/cmd/sf/sf.go:388 +0x8a1

goroutine 278 [chan receive, 9 minutes]:
github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).scorer.func6(0xc045dca180, 0xc045d62bb0, 0xc045d62ba8, 0xc045d62ba0, 0xc0457bfec0, 0xc0420638c0, 0xc0457bfef0, 0xc0457bff50, 0xc045da0340, 0xc0457bff20, ...)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/bytematcher/scorer.go:390 +0x57
created by github.com/richardlehane/siegfried/internal/bytematcher.(*Matcher).scorer
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/bytematcher/scorer.go:389 +0x3cf

goroutine 280 [semacquire, 9 minutes]:
sync.runtime_SemacquireMutex(0xc042447c0c, 0x80a700)
	C:/tools/go/src/runtime/sema.go:71 +0x44
sync.(*Mutex).Lock(0xc042447c08)
	C:/tools/go/src/sync/mutex.go:134 +0xf5
github.com/richardlehane/siegfried/internal/siegreader.(*stream).Slice(0xc042447bd0, 0x42000, 0x1000, 0x0, 0x0, 0x0, 0x0, 0x0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:100 +0x74
github.com/richardlehane/siegfried/internal/siegreader.(*Reader).setBuf(0xc045dbe280, 0x42000, 0x0, 0x0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:50 +0x56
github.com/richardlehane/siegfried/internal/siegreader.(*Reader).ReadByte(0xc045dbe280, 0xc045d1ff38, 0x0, 0x0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:70 +0x86
github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).match(0xc0457ae040, 0x9c6e20, 0xc045dbe280, 0xc045dca240)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:448 +0x2be
created by github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).Index
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:439 +0x86

goroutine 327 [select, 9 minutes]:
github.com/richardlehane/siegfried/internal/siegreader.(*stream).EofSlice(0xc042447bd0, 0x0, 0x1000, 0x0, 0x0, 0x0, 0x0, 0x0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/stream.go:132 +0x12a
github.com/richardlehane/siegfried/internal/siegreader.(*ReverseReader).setBuf(0xc045d8f840, 0x0, 0x0, 0x0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:169 +0x56
github.com/richardlehane/siegfried/internal/siegreader.(*ReverseReader).ReadByte(0xc045d8f840, 0xc0457c0200, 0x770c00, 0xc045cb5d40)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:212 +0x86
github.com/richardlehane/siegfried/internal/siegreader.(*LimitReverseReader).ReadByte(0xc045bf0bd0, 0xc045e8c000, 0x65, 0x65)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/internal/siegreader/reader.go:274 +0x68
github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).match(0xc0457c0220, 0x9c6de0, 0xc045bf0bd0, 0xc045dbc6c0)
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:448 +0xd0
created by github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac.(*fwac).Index
	c:/Users/richardl/Dropbox/programming/go/src/github.com/richardlehane/siegfried/vendor/github.com/richardlehane/match/fwac/fwac.go:439 +0x86
```

richardlehane commented 6 years ago

TJ: I did some research into this. Detecting and preventing out-of-memory errors is evidently a hard problem! But the next release of Go (1.10) has something promising: they are working on "APIs for memory and CPU resource control". This will hopefully allow me to detect available memory before attempting to allocate a big slice.

So any fix for this probably won't land before Go 1.10, which is due in early 2018.

In the meantime, if you are using the "-z" flag, be aware that you can hit these out-of-memory errors when a compressed file contains really big files. The temporary workaround is to unzip before scanning with siegfried.

tjolliffe commented 6 years ago

Thanks Richard. My default approach from now on will be to unzip before scanning with sf anyway.

richardlehane commented 6 years ago

A possible alternative approach is to back up stream contents to a temp file on disk. That way I won't need to reserve such a large chunk of memory. It is a little less tidy and may mean a significant slowdown in some scenarios, but it will at least avoid things blowing up like this.

richardlehane commented 6 years ago

I fixed it with the temp-file approach. See, no panic: [screenshot]

... but it took 41 mins :(

The behaviour now is: if sf is reading from a stream (which it does for the contents of compressed files, and when something is piped to "sf -"), it will use up to ARBITRARY_LIMIT of RAM to copy the stream for scanning. Once ARBITRARY_LIMIT is hit, the remainder of the stream is copied to a temp file on disk. The latter is of course a lot slower because of the heavy IO, but it caps memory use and avoids out-of-memory panics.

Picking the right ARBITRARY_LIMIT is a challenge: it really depends on how much RAM each user has to spare. Also consider that you can have streams within streams within streams (e.g. a zip file that contains another zip file that itself contains a zip file), so sf might need multiples of ARBITRARY_LIMIT. With the promised Go 1.10 features for assessing available memory, I may be able to make this smarter in future.

Currently ARBITRARY_LIMIT is set to ~65 MB. I'm open to suggestions for changing this setting. It could also be made configurable with a flag (e.g. -zlimit) if anyone would use that. For example, if you have a lot of warc.gz files that are 1 GB in size (a common size for web harvesting, I think), you'd probably want a 1 GB ARBITRARY_LIMIT so you could load these entirely into RAM.
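To make the proposed -zlimit idea concrete, here is a minimal Go sketch of a size flag built on the standard flag package's flag.Value interface. The flag name comes from the suggestion above, but the byteSize type, the accepted suffixes and the parsing rules are assumptions for illustration; sf has no such option in this thread. It also applies the nesting point from the previous paragraph: each level of stream nesting may buffer up to the limit, so the rough worst case is depth × limit.

```go
package main

import (
	"flag"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// byteSize is a hypothetical flag.Value for sizes such as "65MB" or "1GB"
// (decimal units). It is not an existing sf option.
type byteSize int64

func (b *byteSize) String() string { return strconv.FormatInt(int64(*b), 10) }

func (b *byteSize) Set(s string) error {
	units := []struct {
		suffix string
		mult   int64
	}{{"GB", 1e9}, {"MB", 1e6}, {"KB", 1e3}, {"B", 1}}
	num, mult := strings.ToUpper(strings.TrimSpace(s)), int64(1)
	for _, u := range units {
		if strings.HasSuffix(num, u.suffix) {
			num, mult = strings.TrimSuffix(num, u.suffix), u.mult
			break
		}
	}
	n, err := strconv.ParseInt(strings.TrimSpace(num), 10, 64)
	if err != nil || n < 0 || n > math.MaxInt64/mult {
		return fmt.Errorf("invalid size %q", s)
	}
	*b = byteSize(n * mult)
	return nil
}

func main() {
	limit := byteSize(65e6) // ~65MB, the default mentioned in this thread
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	fs.Var(&limit, "zlimit", "RAM per stream before spilling to a temp file (e.g. 65MB, 1GB)")
	if err := fs.Parse([]string{"-zlimit", "1GB"}); err != nil {
		panic(err)
	}
	// Each nested stream (a zip in a zip in a zip) may buffer up to the limit.
	depth := int64(3)
	fmt.Printf("limit=%d bytes, worst case at depth %d=%d bytes\n",
		int64(limit), depth, depth*int64(limit))
}
```

Decimal units are used here to match how sizes are quoted in this thread (65 MB, 1 GB); a real option might prefer binary units or accept both.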

tjolliffe commented 6 years ago

I think an adjustable limit would be a good idea, given the wide variety of user machine specs. Perhaps a short description on the help page would help users estimate their optimal ARBITRARY_LIMIT. Regarding the default limit size, it would be interesting to see how much faster the same "Consignment AV.zip" test file would process with ARBITRARY_LIMIT set to 10 times the size (~650 MB).

richardlehane commented 6 years ago


I did a couple of tests, and the difference between using full RAM and spilling to a temp file on disk is actually pretty marginal for me. For a zipped 3 GB file, it took ~1 min using full RAM versus ~1 min 3 s using the temp disk after 65 MB. More than anything, this shows how great SSDs are!