Open igorribeiroduarte opened 5 years ago
I forgot to say that I'm using n2disk1g
@igorribeiroduarte I haven't been able to reproduce this yet. It seems to happen only under certain conditions, so I will keep testing. Based on your symptoms, my assumption is that killing n2disk sometimes leaves the Napatech stream in an inconsistent state, which leads to loops in the Napatech service. Setting a valid n2disk license should avoid this situation.
@cardigliano I've been running some tests, and the loop seems to be in the n2disk service, not in Napatech: we have other applications reading from the nt buffer, and they keep working correctly after n2disk gets killed (after filling the nt buffer). I already added the n2disk and pfring licenses, but this is still a problem for us, since we sometimes need to restart our stack. In addition, I think the problem may be in n2disk initialization rather than in the way n2disk is killed, since n2disk seems to stop gracefully (I tested with SIGINT and SIGTERM) right before the problem happens.
@igorribeiroduarte could you provide the n2disk configuration file? Are you using a port or a stream as the interface in the configuration?
@cardigliano I didn't know there was a configuration file for n2disk. Where can I read about it? I wasn't able to find it in the documentation.
I'm using the following arguments to run:
n2disk1g -I -A index_dir -p 1024 -b 1024 -i nt:1 -n 1000 -m 1000 -t 15 -O /tmp -o /disco03 -o /disco04
As you can see, I'm reading directly from the port
@igorribeiroduarte please check this guide for the configuration file http://www.ntop.org/guides/n2disk/how_to_start.html
@cardigliano thanks, but it's just a way to preset the arguments, right? It doesn't affect the bug we're discussing, correct?
Correct
Any news on this bug? It's been happening to us quite often.
@lucasbaile what's happening in your case exactly? Do you have issues after restarting n2disk?
@cardigliano The situation is exactly as described by @igorribeiroduarte. When running n2disk1g, with or without a valid license, the n2disk1g binary seems to get stuck when it tries to tear down gracefully, holding the Napatech buffer and thus causing all the data on that stream to be dropped. The only action that seems to reliably fix the situation is sending a SIGKILL to the process, but this is quite annoying when trying to automate things. I'll leave some more info below to try to help. If any other info is needed, just let me know.
Napatech Model: NT20E3-2-PTP
n2disk1g Version: v.3.4.200731 (r5214)
n2disk1g Command: n2disk1g -I -P /var/run/n2disk/n2disk_4.pid -G 1 -A index_dir -p 1024 -b 1024 -i nt:stream4 --disk-limit 20% -t 15 -o /data/task_4
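For anyone who needs to automate the SIGKILL workaround in the meantime, here is a minimal sketch of "ask nicely, then force": send SIGTERM, wait for a bounded time, and only then fall back to SIGKILL. This is not part of n2disk or its tooling. The helper name `stop_with_escalation`, the timeout value, and the stand-in child process in `demo` are all assumptions made for illustration. In a real setup the process handle would belong to n2disk1g instead of the dummy child.

```python
import signal
import subprocess
import sys


def stop_with_escalation(proc, term_timeout=30.0):
    """Ask proc to stop with SIGTERM; fall back to SIGKILL if it hangs.

    Hypothetical helper: returns the name of the last signal sent.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=term_timeout)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        # Graceful teardown did not finish in time, so force the exit.
        proc.kill()
        proc.wait()
        return "SIGKILL"


def demo():
    # Stand-in child that ignores SIGTERM, imitating a process stuck in teardown.
    child_code = (
        "import signal, time\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code], stdout=subprocess.PIPE
    )
    proc.stdout.readline()  # wait until the child has installed its handler
    used = stop_with_escalation(proc, term_timeout=1.0)
    proc.stdout.close()
    return used, proc.returncode
```

Note that this only works around the hang. It does not explain why the graceful teardown gets stuck in the first place.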
I'm currently running n2disk without a license, just for testing, so the service goes down every 5 minutes. I have Napatech running inside one container and n2disk running inside another; both services are orchestrated by Docker Swarm. At first n2disk works very well, capturing all my network traffic without dropping any packets. After 5 minutes the n2disk service goes down (as expected) and Swarm brings it back; this cycle repeats indefinitely. After some time (it may take minutes or hours), without any special event or throughput peak, the Napatech buffer reaches 100% and n2disk not only stops recording packets but also stops restarting every 5 minutes (the service stays up until I manually kill the process). Restarting the n2disk service doesn't solve the problem: as soon as n2disk is up, the Napatech buffer reaches 100% again and the problem remains. The services only return to the expected behavior after killing both the Napatech AND n2disk services.
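One way to notice this stuck state automatically would be to sample a stats file twice and check whether a packet counter still advances. The sketch below assumes the stats text consists of `Key: value` lines and that one of the keys is a monotonically increasing integer counter. Both the layout and the key name `Packets` used in the usage note are assumptions, not something confirmed in this thread, and the function names are hypothetical.

```python
def parse_stats(text):
    """Parse 'Key: value' lines into a dict (assumed file layout)."""
    stats = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            stats[key.strip()] = value.strip()
    return stats


def counter_stalled(prev_text, curr_text, key):
    """Return True if the integer counter `key` did not grow between samples."""
    prev = parse_stats(prev_text).get(key)
    curr = parse_stats(curr_text).get(key)
    if prev is None or curr is None:
        raise KeyError(key)
    return int(curr) <= int(prev)
```

A watchdog could read the file a few seconds apart, call `counter_stalled(first, second, "Packets")`, and trigger a restart of the affected services when it returns True while traffic is expected.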
Below is the output of the /proc/net/pf_ring/stats/16004-none.383 file: