g1l3sp opened 2 years ago
Please let me know if I can clarify any of the information above, or gather more information that will be useful. This is a continuing issue for us that we're remediating with "bandage" scripts.
I updated the Fiberblaze software to the latest version (3.6.1.1) in the hope that it would help. I also confirmed we are on the latest firmware for the NIC. Restarting the host and starting the processes again did temporarily correct the situation, but after about 10 days of running, we started seeing it again: hundreds or sometimes thousands of tiny files dumped out by nprobe in the 59th second of the minute.
I should add that nprobe and related packages were also updated to the latest stable versions as part of this. As of now we are on nprobe v.9.6.211220.
It's more than two years later, and this issue still occurs regularly (maybe every couple of weeks). My mitigation is to kill nprobe, reload the fiberblaze driver, start nprobe back up, and run my script that concatenates those files together into larger files based on the second that they occurred. If I don't reload the fiberblaze driver and just restart nprobe, the issue persists, which seems to indicate some sort of state in the driver that doesn't play nicely with nprobe. However, nprobe is the one creating the many small files, the majority of which have between 1 and 5 flows in them.
If there is any will to get to the bottom of this, I'm glad to do what I can to assist.
Hello,
We're running nProbe to dump flow files like this:
/usr/bin/nprobe -i fbcard:0:a06 --verbose 1 --max-log-lines 100000 --dump-path /u01/flow/raw/fbcard-0-a06/ --collector none --dump-format t --dont-nest-dump-dirs --dont-drop-privileges --smart-udp-frags --hash-size 524288 --max-num-flows 1073741823 -V 10
As you can see, we have a fiberblaze card that we are collecting from. Usually, we get files that are nicely sized at around 14MB, but for some reason, nProbe eventually (usually after running for quite a while) gets into a state where it is dumping tons of tiny files, sometimes thousands per second. When this happens, it is creating big problems on the host that is running nProbe, as having millions of files in a single directory makes many things difficult.
Here's an example where nProbe has created over 3000 tiny files in a single second:
I notice that this seems to happen almost exclusively in the 58th and 59th second of the minute, so it may have to do with how a boundary condition is handled in the code.
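For anyone who wants to check this on their own directories, something like the following can tally files by the second-of-minute of their mtime. This is just a rough sketch, not the tooling I actually use; `second_histogram` is a hypothetical helper name, and it assumes GNU find's `-printf`:

```shell
# Rough sketch (hypothetical helper, assumes GNU find): count files in a
# directory by the second-of-minute of their modification time, so bursts
# at :58/:59 show up as large counts next to those seconds.
second_histogram() {
  find "$1" -maxdepth 1 -type f -printf '%TS\n' |  # mtime seconds, e.g. "59.1234"
    cut -d. -f1 |                                  # drop the fractional part
    sort | uniq -c |                               # count files per second value
    sort -k2,2n                                    # order by second of minute
}
```

Running this against one of the affected directories should make any :58/:59 clustering obvious at a glance.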
Here's some output from a script I wrote to try to clean up one of the directories where this happened. It looks for clusters of files created in a single second and cats them together while counting how many there were. The output shows the first filename that came up in the directory listing for that second's worth of data, followed by the number of files that were dumped out:
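For reference, the cleanup pass is roughly equivalent to this sketch. It is not my actual script; `merge_clusters` and the merged-file layout are placeholders, and it assumes GNU find plus filenames without whitespace (which nprobe's dump names satisfy):

```shell
# Sketch of the cleanup pass (placeholder names): group files by whole-second
# mtime, print the first filename and the cluster size for each second, and
# concatenate each cluster into a single merged file.
merge_clusters() {
  dir=$1; out=$2
  mkdir -p "$out"
  find "$dir" -maxdepth 1 -type f -printf '%T@ %p\n' |
    awk '{ split($1, t, "."); print t[1], $2 }' |   # keep whole-second epoch
    sort -k1,1n -k2,2 |                             # order by second, then name
    awk -v out="$out" '
      $1 != sec {                                   # start of a new one-second cluster
        if (sec) print first, count                 # report the previous cluster
        sec = $1; first = $2; count = 0
      }
      {
        count++
        system("cat " $2 " >> " out "/" $1 ".flows")  # append into the merged file
      }
      END { if (sec) print first, count }
    '
}
```

The real script does essentially the same grouping, which is how the per-second counts above were produced.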
So it's definitely looking like something happens when the end of the minute rolls around that is causing this issue.
Some basic version information:
nProbe version info:
Version: 9.6.211108
Build OS: Ubuntu 20.04.3 LTS
GIT rev: 9.6-stable:d057e2f48cfd28ed7c57b7db18a9228b8a4e6fa6:20211108
Edition: nProbe Pro
If there is more information that I can provide, please let me know.