phaag / nfdump

Netflow processing tools

missing nfcapd instance for device #443

Closed thezoggy closed 1 year ago

thezoggy commented 1 year ago

When I run a query in nfsen, the result shows:

stat() error '/data/nfsen/profiles-data/live/<device_removed>/2023/04/26/nfcapd.202304261745': File not found!

Looking on the netflow box, I do not see an nfcapd instance running for that device, which explains why there is no data for nfsen/nfdump. There is an entry for it in nfsen.conf (which is built out programmatically), and the routers before and after it do have nfcapd instances running.

This broke a few weeks ago: /data/nfsen/profiles-data/live/<router>/2023/04/17 was the last folder written and 2015 was the last bucket in it. That is when I updated nfdump to newer code (I was running from GitHub source and updated to the latest at that time). I also applied some system updates and rebooted the box at that time.
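One way to pin down when a device stopped writing is to look for its newest capture file. The sketch below assumes the `YYYY/MM/DD/nfcapd.YYYYMMDDhhmm` layout seen in the error message above; `latest_capture` is a hypothetical helper name, not part of nfdump or nfsen.

```shell
#!/bin/sh
# Print the newest nfcapd capture file below a device directory.
# Assumes the YYYY/MM/DD/nfcapd.YYYYMMDDhhmm layout, so a plain
# lexical sort of the full paths is also a chronological sort.
latest_capture() {
    # The [0-9] in the pattern skips in-progress files such as nfcapd.current.*
    find "$1" -type f -name 'nfcapd.[0-9]*' | sort | tail -n 1
}

# Usage (the path is a placeholder for the affected device directory):
#   latest_capture '/data/nfsen/profiles-data/live/<router>'
```

Running this for the broken device and for a healthy neighbour makes the gap obvious at a glance.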

My process on updating:

sudo systemctl stop nfsen

cd ~/nfdump/
git restore m4/
git pull
./autogen.sh
./configure --enable-nfprofile --enable-maxmind --enable-readpcap --enable-nfpcapd --enable-sflow
make
sudo make install
sudo ldconfig

sudo systemctl start nfsen

And my nfsen.service:

[Unit]
Description=NfSen Service
After=network.target

[Service]
Type=forking
ExecStart=/data/nfsen/bin/nfsen start
ExecStop=/data/nfsen/bin/nfsen stop
Restart=on-abort
TimeoutSec=900

[Install]
WantedBy=multi-user.target

I am not seeing anything in the logs about the device itself, but I do see some "Run nfdump failed: Exit: 1, Signal: 0, Coredump: 0" entries from around that time. I don't recall why it failed or what I did to fix it, unless it was just restarting the nfsen service.

Yesterday I updated nfdump to the latest git (to get to 1.7.2), and that did not resolve the issue. What is the best way to triage why this one device has stopped working, or to jump-start it?
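A first triage step could be to compare the devices that should have a collector against the nfcapd processes that are actually running. This is a minimal sketch, assuming nfsen launches each nfcapd with an `-I <ident>` argument; `running_idents` and `missing_idents` are hypothetical helper names, and the router names are placeholders.

```shell
#!/bin/sh
# Read process listings (one command line per row) on stdin and print
# the ident that follows each "-I" flag on nfcapd command lines.
running_idents() {
    awk '/nfcapd/ { for (i = 1; i < NF; i++) if ($i == "-I") print $(i + 1) }' | sort -u
}

# Read the running idents on stdin and print every expected ident
# (given as arguments) that is not among them.
missing_idents() {
    running=$(cat)
    for ident in "$@"; do
        printf '%s\n' "$running" | grep -qxF -- "$ident" || printf '%s\n' "$ident"
    done
}

# Usage on the netflow box (idents are placeholders):
#   pgrep -a nfcapd | running_idents | missing_idents router1 router2 router3
```

Any ident this prints has a config entry but no collector, and is a candidate for starting nfcapd by hand to see the error it reports.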

thezoggy commented 1 year ago

OK, figured it out. Manually trying to spawn the process gave:

Receive socket error: could not open the requested socket: Address already in use

It's because rsyslog had claimed the port that nfcapd would have used, which blocked it:

sudo lsof -i -P -n | grep 40127
rsyslogd 1991 syslog 7u IPv4 77183 0t0 UDP *:40127

I killed the rsyslogd process; it respawned and used another port that wasn't conflicting. Then I stopped and started the nfsen service, and I now see the nfcapd instance running again for that device. All is good again.
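The lsof check above can be wrapped into a small filter that names whoever holds a given UDP port. This is a sketch under the assumption that the input is standard `lsof -i -P -n` output (nine columns, with the protocol in column 8 and the address in column 9); `udp_port_holders` is a hypothetical helper name.

```shell
#!/bin/sh
# Read `lsof -i -P -n` output on stdin and print "COMMAND PID" for every
# UDP socket whose local port equals the port given as the first argument.
udp_port_holders() {
    awk -v port="$1" '
        $8 == "UDP" {
            name = $9
            sub(/->.*/, "", name)          # drop the remote half of connected sockets
            n = split(name, parts, ":")    # the local port is the last ":" field
            if (parts[n] == port) print $1, $2
        }'
}

# Usage:
#   sudo lsof -i -P -n | udp_port_holders 40127
```

Port 40127 falls inside the usual Linux ephemeral range, which is likely how rsyslog ended up on it. If that is the case here, listing the collector ports in the `net.ipv4.ip_local_reserved_ports` sysctl may keep other processes from being assigned them automatically.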