Closed igorribeiroduarte closed 3 years ago
could you try setting PCAP_NS instead of PCAP as TimestampFormat and see if you are still able to reproduce the issue?
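For reference, on Napatech adapters this is configured in ntservice.ini. A sketch of the relevant line (the section name and exact placement are assumptions and may vary across driver versions; the TimestampFormat key and the PCAP/PCAP_NS values are the ones referenced above):

```ini
[System]
# PCAP_NS uses nanosecond-resolution PCAP timestamps instead of
# microsecond-resolution PCAP timestamps
TimestampFormat = PCAP_NS
```

A driver restart is typically required for ntservice.ini changes to take effect.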
I couldn't say:
Ok got it, let me investigate this more deeply then.
@igorribeiroduarte I was investigating and I realized you are using n2disk1g with Napatech, which is not a common configuration, as the 1g version does not support chunk-mode. I will investigate this.
It happened again, but this time I have some logs for you: logs_n2disk.txt
There are some gaps in the logs. I'm not sure about the reason, but it could be the container orchestrator (we use Docker Swarm) having problems restarting the n2disk service.
As you can see, the first disk (disco07) seems to have been erased after the "Unable to write into file" error, and the same happened to the other disks a few minutes later.
Also, n2disk seems to have erased those invalid index folders:
ls -hal index_folder
total 16K
drwxr-xr-x 4 user root 4.0K May 13 20:23 .
drwxr-xr-x 3 user docker 4.0K Dec 16 16:00 ..
drwxr-x--- 3 user docker 4.0K Dec 16 18:02 2019
drwxr-x--- 4 user docker 4.0K May 13 20:24 2020
I had a look at the log; it seems n2disk is failing to create files due to "ERROR: Unable to write into file /disco06/.. [No such file or directory]", which means the folder does not exist at the time it tries to dump the file. You said "the first disk (disco07) seems to be erased": was it completely empty? Are you sure it is n2disk erasing the whole disk?
A more verbose output could also help, please add -v.
As you can see below, the only folders n2disk didn't erase are 1585290220.750124 and 1586538427.963111, and this happened because both of these folders have a remaining .tmp file and n2disk only erases .pcap files, right?
ls -hal /disco05/
total 1.5M
drwxr-xr-x 4 user root 124K May 14 08:39 .
drwxr-xr-x 31 root root 4.0K Dec 26 15:06 ..
drwxr-x--- 2 user docker 1.2M May 14 08:39 1589412259.992738
drwxr-x--- 2 user docker 268K May 14 11:44 1589456325.900914
ls -hal /disco06/
total 2.4M
drwxr-xr-x 5 user root 124K May 14 08:39 .
drwxr-xr-x 31 root root 4.0K Dec 26 15:06 ..
drwxr-x--- 2 user docker 952K Apr 13 09:37 1585290220.750124
drwxr-x--- 2 user docker 1.1M May 14 08:39 1589412263.883094
drwxr-x--- 2 user docker 256K May 14 11:44 1589456326.330455
ls -hal /disco07/
total 166M
drwxr-xr-x 5 user root 112K May 14 08:39 .
drwxr-xr-x 31 root root 4.0K Dec 26 15:06 ..
drwxr-x--- 2 user docker 552K Apr 13 09:14 1586538427.963111
drwxr-x--- 2 user docker 1.1M May 14 08:39 1589412266.310233
drwxr-x--- 2 user docker 256K May 14 11:44 1589456328.376376
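The suspected behavior described above can be sketched as follows. This is a hypothetical illustration of the hypothesis, not n2disk's actual code: a timestamp-named dump folder is skipped by cleanup only while it still contains an in-progress .tmp file.

```python
import os

def folders_eligible_for_cleanup(storage_root):
    """Hypothetical sketch: return dump folders that contain no
    .tmp files and are therefore candidates for rotation/erasure,
    oldest first (folder names are epoch timestamps)."""
    eligible = []
    for name in sorted(os.listdir(storage_root)):
        folder = os.path.join(storage_root, name)
        if not os.path.isdir(folder):
            continue
        # A remaining .tmp file marks the folder as still in use,
        # so it would be skipped by the cleanup pass.
        if any(f.endswith(".tmp") for f in os.listdir(folder)):
            continue
        eligible.append(folder)
    return eligible
```

Under this model, 1585290220.750124 and 1586538427.963111 would be excluded from cleanup, matching the listings above.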
Also, what I said about disco07 being erased was based on n2disk's storage logs from before and after the "Unable to write into file" error. The logs below were shown after the restart caused by the "Unable to write into file" error on disco07:
13/May/2020 20:01:22 [n2disk.c:2680] Storage /disco07: 0.00 GB in use
13/May/2020 20:01:24 [n2disk.c:2680] Storage /disco06: 11839.78 GB in use
13/May/2020 20:01:25 [n2disk.c:2680] Storage /disco05: 13181.03 GB in use
And these were shown after the restart caused by the same error, but on disco05 and disco06 (at 20:23:39 and 20:23:59):
13/May/2020 20:24:28 [n2disk.c:2680] Storage /disco05: 0.00 GB in use
13/May/2020 20:24:28 [n2disk.c:2680] Storage /disco07: 0.00 GB in use
13/May/2020 20:24:28 [n2disk.c:2680] Storage /disco06: 0.00 GB in use
I can change the log level, but it will take a long time for this problem to happen again, because it only happens after the disks are full.
@cardigliano , It happened again. Now I have more verbose logs:
In this first log, you can see at line 17 that disco07 had 13197.50GB in use at 18:00, and after the cleanup only 7613.68GB remained. Almost half of the pcaps were deleted: disco07_from_13T_to_7T.txt
Now, in this second log, you can see at line 6 that disco05 had 13198GB in use at 22:09, disco06 had 13141GB (line 17), and disco07 had 8018GB (line 466). At the end of the log, disco05 and disco07 were completely erased and disco06 had only 4124GB of pcaps remaining: disco05_from13T_to_0T.txt
In this second log, I can also see that the n2disk container died during the cleanup, possibly due to some instability in our stack, but I don't think that should be making n2disk delete all these pcaps.
I pushed a fix that could address this, and added more info to the logs to check what is causing this in case it happens again. A new build will be available soon.
It happened again even after updating:
Update: a workaround has been provided for this to avoid affecting file rotation, however it seems this was due to bad timestamps and we are still investigating.
I'm facing the following problem with n2disk1g v.3.4.200207 (r5184): for some reason I still don't know, part of the traffic read by n2disk is arriving with invalid timestamps, for example "4102363817.1799190". I confirmed with Wireshark that this is the received timestamp. This leads to some n2disk indexes being created in a wrong way, with a future date:
At first this wouldn't be a problem and I could just ignore those invalid indexes, since they rarely happen. But sometimes all my pcaps (42TB of data) are deleted, and I believe this is done by n2disk during its automatic rotation. I'd like to know whether these deleted pcaps could be related to the timestamp problem. I imagine that, depending on how this rotation is done, if n2disk deletes all indexes previous to 2099, for example, this could cause this loss of data.
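To see why the example timestamp is bogus, note that 4102363817 seconds after the Unix epoch falls in the year 2099. A simple sanity check of the kind that could flag such packets (a hypothetical sketch, not part of n2disk; the two-day skew allowance is an arbitrary assumption):

```python
import datetime

def is_plausible_timestamp(ts_sec, max_skew_days=2):
    """Hypothetical sanity check: reject epoch timestamps that are
    non-positive or lie beyond now plus a small clock-skew margin."""
    now = datetime.datetime.now(datetime.timezone.utc).timestamp()
    return 0 < ts_sec <= now + max_skew_days * 86400

# The bogus timestamp from the report decodes to late 2099:
bad = 4102363817
print(datetime.datetime.fromtimestamp(bad, datetime.timezone.utc))
# → 2099-12-31 01:30:17+00:00
print(is_plausible_timestamp(bad))  # → False
```

If rotation sorts dump folders or index entries purely by timestamp, a single 2099-dated entry would make every legitimately dated entry look "older" and thus deletable first, which is consistent with the mass deletions described.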
I'm using the following command to run n2disk:
n2disk1g -I -P /var/run/n2disk/n2disk.pid -A index_folder -p 1024 -b 1024 -i nt:stream0 -n 5000 -m 5000 --disk-limit 93% -t 15 -o /disco05 -o /disco06 -o /disco07
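The --disk-limit 93% option above means rotation kicks in once a storage volume passes that usage threshold. A minimal sketch of such a check (a hypothetical illustration of the threshold logic, not n2disk's implementation):

```python
import shutil

def over_disk_limit(path, limit_pct=93.0):
    """Hypothetical sketch: return True when used space on the
    volume holding `path` meets or exceeds `limit_pct` percent,
    the point at which old dump files would start being rotated out."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * (usage.total - usage.free) / usage.total
    return used_pct >= limit_pct
```

This matches the reporter's observation that the problem only appears once the disks are full, i.e. once the cleanup path starts running.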
My ntservice configuration file: