Closed gabrielmocan closed 4 months ago
I'll have a look. Thanks for the sample! This always helps!
It was indeed a boundary-check error in the Go decoding code. I fixed that in master; an updated version will follow. Nfdump already has a boundary check integrated, but I improved that in nfdump master as well. The boundary check skips bad records. For some reason, a single record in your file is corrupt.
@phaag I have yet another file that is not passing boundary check and panicking. Sample is attached.
Record 64357: decoding error: Record body boundary check error
Record 64406: decoding error: Record body boundary check error
panic: runtime error: slice bounds out of range [:1635282] with capacity 1635280
goroutine 6 [running]:
github.com/phaag/go-nfdump.(*NfFile).AllRecords.func1()
/Users/gabemocan-mw/go/pkg/mod/github.com/phaag/go-nfdump@v0.0.5/nffile.go:277 +0x14a0
created by github.com/phaag/go-nfdump.(*NfFile).AllRecords in goroutine 1
/Users/gabemocan-mw/go/pkg/mod/github.com/phaag/go-nfdump@v0.0.5/nffile.go:270 +0x80
exit status 2
That sample is really corrupt! However, I need to exit gracefully or skip the bad data.
A datablock is missing records. Do you have multiple processes writing to the same file?
% nfdump -v broken2.sample
Darwin 23.4.0
File : broken2.sample
Version : 2 - not compressed
Created : 2024-05-05 13:52:00
Created by : nfcapd
nfdump : f1070400
encryption : no
Appdx blks : 1
Data blks : 6
Checking data blocks
Block 5 num records 9255 != counted records: 9250
> A datablock is missing records. Do you have multiple processes writing to the same file?
It's a single nfcapd -n ... -n ... -n ... process with multiple directories, one for each exporter, so no, there are no multiple processes writing to the same file.
But I suspect something is wrong with the VM hosting this collector. I'm seeing segfaults I can't explain in my processing code, although there are no errors from the nfcapd process. Maybe a physical memory fault or faulty storage; I'm still not sure.
> That sample is really corrupt! However, I need to exit gracefully or skip the bad data.
For now that would do the trick, just to avoid the panic calls.
I added another data block boundary check! It prints an error but no longer crashes!
Have you checked the syslog file? Any specific error messages from the collector?
> I added another data block boundary check! It prints an error but no longer crashes!
Thanks! Will try right away.
> Have you checked the syslog file? Any specific error messages from the collector?
Apparently no errors on the collector side; I run it in a dedicated container. Logs are clean.
> I added another data block boundary check! It prints an error but no longer crashes!
I guess the output could be less verbose. The "Next block ..." lines aren't needed; the error log is the important part.
go run . nfdumpNative ../tests/samples/broken2.sample
Next block - type: 3, records: 11892, size: 2097072
Next block - type: 3, records: 11883, size: 2097060
Next block - type: 3, records: 11874, size: 2097048
Next block - type: 3, records: 11881, size: 2097100
Next block - type: 3, records: 11857, size: 2097004
Next block - type: 3, records: 9255, size: 1635280
Record 64357: decoding error: Record body boundary check error
Record 64406: decoding error: Record body boundary check error
DataBlock error: count: 9255, size: 1635280. Found: 9250, size: 1635280
Sorry - fixed.
@phaag just to give you some feedback: this VM had faulty memory. That's why the files were so messed up! Still, we made the code more resilient, which is good anyway.
Hi Pete,
I have an nffile that is properly decoded by classic nfdump 1.7.4-a16f86f but throws an error when using go-nfdump v0.0.4. Below are some logs, and I'm attaching the sample.
broken.sample.zip