phaag / nfdump

Netflow processing tools

Repeated "SequencerRun() ERROR" message in log #488

Closed by AdUser 8 months ago

AdUser commented 9 months ago

After upgrading to Debian 12 (nfdump version 1.7.1), I see multiple error messages in the log:

2023-12-07_04:33:51.17608 SequencerRun() ERROR - Attempt to read beyond input stream size
2023-12-07_04:33:51.17609 Process v9: Sequencer run error. Skip record processing
2023-12-07_04:33:51.17611 SequencerRun() ERROR - Attempt to read beyond input stream size
2023-12-07_04:33:51.17612 Process v9: Sequencer run error. Skip record processing

nfcapd runs with these arguments: /usr/bin/nfcapd -u nobody -e -y -b <collector ip> -p 9995 -B 200000 -m /run/nfexporter/sock -i 10 -M /mnt/netflow/data/netflow -S 1

Incoming NetFlow traffic runs at up to 700kb/1.2Mbps (med/peak) from about 30 sources. I tried increasing the input buffer size (-B option) up to 2MB and disabling metrics export, with no effect.

What exactly does this error mean, and how can I fix it?

phaag commented 9 months ago

This error means that the length of the input data does not match what the template announces. The sequencer is the programmable data processor that moves input data to the output stream. The traffic rate should not be the cause. How often do you see this message? Rarely, or are you flooded by them? Do the flow records that are stored otherwise look correct? Have you tried to determine whether it is one specific source, or several of those 30 sources?
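To illustrate the kind of mismatch described above, here is a minimal sketch of a template-driven record reader. It is not nfdump's sequencer; the names `read_record` and `StreamOverrun` and the example template are hypothetical. It only shows how a reader that trusts the field lengths announced by a template runs past the end of the input when the data is shorter than the template says.

```python
class StreamOverrun(Exception):
    """Raised when the template asks for more bytes than the input holds."""


def read_record(template, data, offset=0):
    """Decode one record. `template` is a list of (field name, length) pairs."""
    record = {}
    for name, length in template:
        end = offset + length
        # Bounds check: the template announces `length` bytes for this field,
        # but the input may be shorter than the template promises.
        if end > len(data):
            raise StreamOverrun(
                f"field {name!r} needs bytes {offset}..{end}, "
                f"input has only {len(data)}"
            )
        record[name] = int.from_bytes(data[offset:end], "big")
        offset = end
    return record, offset


# Hypothetical template: four 4-byte fields, 16 bytes per record.
template = [("src_addr", 4), ("dst_addr", 4), ("packets", 4), ("bytes", 4)]
good = bytes(range(16))
record, consumed = read_record(template, good)
```

Passing a shortened input such as `good[:10]` raises `StreamOverrun` at the third field, which is the same class of failure as "Attempt to read beyond input stream size".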

AdUser commented 9 months ago

How often do you see this message? Rarely, or are you flooded by them?

Often, several messages per second.

Do the flow records that are stored otherwise look correct?

I can't check the whole archive, and the error appeared only recently, but if I find something, I'll let you know.

Have you tried to determine whether it is one specific source, or several of those 30 sources?

That's a good question. After I used the firewall to allow NetFlow packets only from "known good" sources (same location), the error went away (only half a day has passed, but...). Now I'm inspecting the remaining (remote) sources.

AdUser commented 8 months ago

During the check, I discovered a router with an incorrectly configured NAT along the route of the NetFlow packets. After I corrected the NAT configuration, the nfcapd errors were gone as well (it looks like the packets were somehow damaged or modified). So I think this issue is resolved.
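For anyone chasing a similar problem, a coarse consistency check on captured datagrams can help narrow down which path damages them. The sketch below assumes the NetFlow v9 layout from RFC 3954 (a 20-byte packet header followed by FlowSets, each starting with a 2-byte ID and a 2-byte length); the function name `check_v9_datagram` is hypothetical. It does not decode templates, so it would miss a mismatch between a template and its data records, and it is not how nfcapd validates input.

```python
import struct

V9_HEADER_LEN = 20  # version, count, sysUptime, unix_secs, sequence, source_id


def check_v9_datagram(datagram):
    """Return None if the FlowSet lengths add up, else a reason string."""
    if len(datagram) < V9_HEADER_LEN:
        return "shorter than the 20-byte v9 header"
    (version,) = struct.unpack_from("!H", datagram, 0)
    if version != 9:
        return f"unexpected version {version}"
    offset = V9_HEADER_LEN
    while offset < len(datagram):
        if offset + 4 > len(datagram):
            return f"truncated FlowSet header at offset {offset}"
        flowset_id, length = struct.unpack_from("!HH", datagram, offset)
        if length < 4:
            return f"FlowSet {flowset_id} has invalid length {length}"
        if offset + length > len(datagram):
            remaining = len(datagram) - offset
            return (f"FlowSet {flowset_id} claims {length} bytes, "
                    f"only {remaining} remain")
        offset += length
    return None


# Synthetic example: a v9 header plus one 8-byte FlowSet with ID 256.
header = struct.pack("!HHIIII", 9, 1, 0, 0, 0, 0)
flowset = struct.pack("!HH", 256, 8) + b"\x00" * 4
ok_result = check_v9_datagram(header + flowset)
bad_result = check_v9_datagram(header + flowset[:6])
```

Running such a check against traffic captured before and after the suspect router would show whether the damage happens on that hop.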

It would be helpful if the log message contained the IP address of the associated source; diagnostics and debugging would be faster and easier.
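As a rough sketch of what is being asked for: a UDP collector already knows the sender of each datagram (in Python, `socket.recvfrom` returns the data together with the sender's address), so the address can be attached to the error and tallied per source. The function `report_error` and the log line format below are hypothetical, not nfdump's actual output.

```python
from collections import Counter

# Running count of processing errors per exporter IP address.
errors_by_source = Counter()


def report_error(source_addr, message):
    """Build a log line that names the exporter and count the error."""
    ip, _port = source_addr  # (ip, port) tuple, as returned by recvfrom()
    errors_by_source[ip] += 1
    return f"Process v9: [{ip}] {message}"


# 192.0.2.10 is a documentation address standing in for a real exporter.
line = report_error(
    ("192.0.2.10", 2055),
    "Sequencer run error. Skip record processing",
)
```

With a per-source tally like `errors_by_source`, a single misbehaving exporter among 30 stands out without any firewall experiments.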

phaag commented 8 months ago

You are certainly right. I will improve the diagnostics. Many thanks for the input.