verzulli closed this issue 2 years ago.
The timestamps delivered by libpcap are wrong. Which libpcap version are you using?
// EDIT: Seems like a uint64 underflow on my side. Will investigate.
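(For clarity, by underflow I mean the usual unsigned wrap-around when a larger value is subtracted from a smaller one; a toy illustration, not nDPId code:)

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Toy example only: subtracting a newer microsecond timestamp from an
     * older one does not go negative, it wraps around to a huge value. */
    uint64_t older_us = 100;
    uint64_t newer_us = 200;
    uint64_t diff = older_us - newer_us; /* wraps to 18446744073709551516 */
    printf("%llu\n", (unsigned long long)diff);
    return 0;
}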
Did you build nDPId with the OpenWrt Makefile provided in packages/openwrt?
> Did you build nDPId with the OpenWrt Makefile provided in packages/openwrt?
Yes. I successfully built the OpenWRT package following these instructions.
(Note: I used the build environment based on the openwrt-sdk-21.02.3-ipq40xx-generic_gcc-8.4.0_musl_eabi.Linux-x86_64.tar SDK. But I don't think this matters much...)
Strange. I am trying to reproduce this.
> Strange. I am trying to reproduce this.
I'm on OWRT 21.02.3. I'll try rebuilding the package "from scratch" once more. Should the behaviour persist, I'll try upgrading to OWRT 22.03.0 (released early this month).
Will keep you updated.
I just rebuilt FROM SCRATCH the OpenWRT package: set the STAGING_DIR and the PATH; git clone --depth 50 https://github.com/utoni/nDPId.git; added src-link ndpid_testing /usr/local/src/owrt_sdk/nDPId/packages/openwrt to feeds.conf.default; ./scripts/feeds update -a; installed the nDPId-testing package with ./scripts/feeds install nDPId-testing; selected it in the make menuconfig interface; make -j1 V=sc.
I got a nDPId-testing_1.0-5_arm_cortex-a7_neon-vfpv4.ipk that was slightly different from the previous one (1056936 bytes, while the previous one was 1055494).
I installed it on the OpenWRT box and... the behaviour was the same.
BUT....
I also noted that the problem seems to happen when I launch it without the -d (running it in foreground).
If I run it in background (with -d), it starts the flood but suddenly stops and starts working normally.
I also noted that, when running with -d, the PID assigned to /usr/sbin/nDPId-testing changes: it seems it forks and dies...
root@GW-Big-Brother:~# ps | grep -i ndpid
28538 root 41080 D /usr/sbin/nDPId-testing -i br-lan -c 192.168.0.128:9999 -d -o max-packets-per-flow-to-send=
28540 root 1100 R grep -i ndpid
root@GW-Big-Brother:~# ps | grep -i ndpid
28538 root 76744 R /usr/sbin/nDPId-testing -i br-lan -c 192.168.0.128:9999 -d -o max-packets-per-flow-to-send=
28542 root 1100 S grep -i ndpid
root@GW-Big-Brother:~# ps | grep -i ndpid
28538 root 110m R /usr/sbin/nDPId-testing -i br-lan -c 192.168.0.128:9999 -d -o max-packets-per-flow-to-send=
28544 root 1100 R grep -i ndpid
root@GW-Big-Brother:~# ps | grep -i ndpid
28546 nobody 125m S /usr/sbin/nDPId-testing -i br-lan -c 192.168.0.128:9999 -d -o max-packets-per-flow-to-send=
28564 root 1100 S grep -i ndpid
root@GW-Big-Brother:~# ps | grep -i ndpid
28546 nobody 125m S /usr/sbin/nDPId-testing -i br-lan -c 192.168.0.128:9999 -d -o max-packets-per-flow-to-send=
28566 root 1100 R grep -i ndpid
root@GW-Big-Brother:~# ps | grep -i ndpid
28546 nobody 125m S /usr/sbin/nDPId-testing -i br-lan -c 192.168.0.128:9999 -d -o max-packets-per-flow-to-send=
28568 root 1100 R grep -i ndpid
root@GW-Big-Brother:~# ps | grep -i ndpid
28546 nobody 125m S /usr/sbin/nDPId-testing -i br-lan -c 192.168.0.128:9999 -d -o max-packets-per-flow-to-send=
28570 root 1100 R grep -i ndpid
As you can see, it spawned with PID 28538 and, a few seconds later, it became 28546. Currently, after ~10 minutes, it's still 28546 and it's running fine.
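If nDPId daemonizes the classic way (fork, parent exits, child drops privileges), that would explain both the changing PID and the switch from root to nobody. Just to make explicit what I have in mind, here is a generic sketch of that pattern; this is my assumption, not code taken from nDPId:

#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Generic daemonize-and-drop-privileges sketch (my assumption about what
 * happens with -d, NOT taken from the nDPId sources). */
static void daemonize_and_drop(void)
{
    pid_t pid = fork();
    if (pid < 0)
        exit(EXIT_FAILURE);
    if (pid > 0)
        exit(EXIT_SUCCESS);       /* parent exits -> the PID seen in ps changes */

    setsid();                     /* child becomes session leader */

    struct passwd const * const pw = getpwnam("nobody");
    if (pw != NULL)
    {
        if (setgid(pw->pw_gid) != 0 || setuid(pw->pw_uid) != 0)
            exit(EXIT_FAILURE);   /* user column in ps changes to nobody */
    }
    /* ... keep capturing packets as the daemon ... */
}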
Could the problem be related to running without -d?
Will continue observing the whole thing and... I'll keep you updated.
> I also noted that the problem seems to happen when I launch it without the -d (running it in foreground). If I run it in background (with -d), it starts the flood but suddenly stops and starts working normally.
Can you confirm that you'll get those weird timestamps only while running in daemon mode?
> Could the problem be related to running without -d?
Unlikely, because the timestamps are IMHO not related to the daemon mode in any case.
> Will continue observing the whole thing and... I'll keep you updated.
Btw.: Did you test nDPId-test in a QEMU VM or on a real device?
I tried to build & use nDPId-testing with the current OpenWrt master branch.
I've built an x86 OpenWrt image with only nDPId-testing and a *.vdi target image enabled. I used that image together with VirtualBox, configured the network in the guest and ran: /usr/sbin/nDPId-testing -d -c 192.168.0.20:7000 -o max-packets-per-flow-to-send=0. It worked, the timestamps are valid and not broken. Can you try this and see if that works for you? That would probably help in finding the error.
Another thing that might help me: Could you upload the OpenWrt build logs of libpcap and nDPId-testing?
> Btw.: Did you test nDPId-test in a QEMU VM or on a real device?
I'm running nDPId-test on a REAL device: a GL-iNet GL-B1300 router, running OpenWrt 21.02.3, r16554-1d4dea6d4f.
> Can you confirm that you'll get those weird timestamps only while running in daemon mode?
As far as I can see: launching nDPId-testing without the -d, the problem starts and persists; launching nDPId-testing with the -d, the problem starts and, a few seconds later, ends, probably after it changes its PID. But I cannot be sure (I cannot do enough tests to be 100% sure) :-(
> Another thing that might help me: Could you upload the OpenWrt build logs of libpcap and nDPId-testing?
I just went through the WHOLE rebuild-from-scratch process. You can find, below, a terminal transcript of the preparation steps (up to make) and of the whole building process (from make on).
This just occurred to me: I ALWAYS uploaded to OWRT ONLY the nDPId-testing package... and NOT the libpcap1 nor the zlib ones, as they are already available on my box...
...but they are DIFFERENT versions, right?
I'm going to reinstall all three packages, and let you know...
I just reinstalled both libpcap1_1.9.1-3.1_arm_cortex-a7_neon-vfpv4.ipk and nDPId-testing_1.0-5_arm_cortex-a7_neon-vfpv4.ipk and now everything seems to be working.
I'll keep checking everything for a few days, with several restarts... and let you know the results.
Sorry for being so dumb as to not take care to update libpcap as well.... :-(
> I just reinstalled both libpcap1_1.9.1-3.1_arm_cortex-a7_neon-vfpv4.ipk and nDPId-testing_1.0-5_arm_cortex-a7_neon-vfpv4.ipk and now everything seems to be working.
Is that the same libpcap version that gets shipped with openwrt-sdk-21.02.3-ipq40xx-generic_gcc-8.4.0_musl_eabi.Linux-x86_64.tar? If so, it actually should work without reinstalling libpcap. This is strange. I will see if I can add an error or warning if the libpcap version nDPId was compiled/linked against differs from the runtime version.
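Something along these lines: pcap_lib_version() gives the runtime string, while the compile-time string would have to be recorded by the build (the BUILD_TIME_PCAP_VERSION macro below is hypothetical, only to illustrate the idea):

#include <pcap/pcap.h>
#include <stdio.h>
#include <string.h>

/* Sketch only: BUILD_TIME_PCAP_VERSION is a hypothetical macro that the
 * build system would have to define (e.g. -DBUILD_TIME_PCAP_VERSION="...")
 * with the libpcap version nDPId was compiled against. */
#ifndef BUILD_TIME_PCAP_VERSION
#define BUILD_TIME_PCAP_VERSION "libpcap version 1.9.1 (with TPACKET_V3)"
#endif

static void warn_on_pcap_version_mismatch(void)
{
    const char * const runtime = pcap_lib_version(); /* runtime libpcap version string */

    if (strcmp(runtime, BUILD_TIME_PCAP_VERSION) != 0)
        fprintf(stderr,
                "Warning: libpcap runtime version \"%s\" differs from "
                "compile-time version \"%s\"\n",
                runtime, BUILD_TIME_PCAP_VERSION);
}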
> I'll keep checking everything for a few days, with several restarts... and let you know the results.
> Sorry for being so dumb as to not take care to update libpcap as well.... :-(
You're not dumb. TBH I dunno what the issue here is or was either.
> Is that the same libpcap version that gets shipped with openwrt-sdk-21.02.3-ipq40xx-generic_gcc-8.4.0_musl_eabi.Linux-x86_64.tar? If so, it actually should work without reinstalling libpcap.
Of course, you are right. The libpcap package is built from OpenWRT sources so... it's exactly what's running inside my OpenWRT box. I was wrong: I was thinking of libnDPI, which is pulled in from a specific source/commit during the build. But as it's included inside nDPId-testing, it (libnDPI) doesn't produce an ad-hoc .ipk OWRT package.
Sorry.
Unfortunately, the flood problem persists.
Yesterday I ran the software this way (in foreground, and then stopped it with a ^C):
root@GW-Big-Brother:~# /usr/sbin/nDPId-testing -i br-lan -c 192.168.0.128:9999 -o max-packets-per-flow-to-send=0
nDPId: No instance alias given, using your hostname 'GW-Big-Brother'
----------------------------------
nDPI version: 4.5.0
API version: 48566
pcap version: 1.9.1 (with TPACKET_V3)
----------------------------------
gcrypt version: 1.8.6internal
----------------------------------
^C------------------------------------ Stopping reader threads
------------------------------------ Processing remaining flows
------------------------------------ Results
Stopping Thread 0, processed 266000 packets, 167033739 bytes
skipped flows.....: 0, processed flows: 168487, idle flows....: 168487
not detected flows: 167031, guessed flows..: 818, detected flows: 641
detection updates.: 238, updated flows..: 1663
Stopping Thread 1, processed 306928 packets, 262903923 bytes
skipped flows.....: 0, processed flows: 102889, idle flows....: 102889
not detected flows: 101397, guessed flows..: 718, detected flows: 775
detection updates.: 144, updated flows..: 979
Stopping Thread 2, processed 18683 packets, 7511917 bytes
skipped flows.....: 0, processed flows: 2431, idle flows....: 2431
not detected flows: 1120, guessed flows..: 627, detected flows: 686
detection updates.: 241, updated flows..: 1541
Stopping Thread 3, processed 23416 packets, 20625367 bytes
skipped flows.....: 0, processed flows: 2401, idle flows....: 2401
not detected flows: 254, guessed flows..: 1652, detected flows: 497
detection updates.: 161, updated flows..: 1229624
Stopping Thread 4, processed 158297 packets, 94580056 bytes
skipped flows.....: 0, processed flows: 99580, idle flows....: 99580
not detected flows: 98089, guessed flows..: 770, detected flows: 721
detection updates.: 275, updated flows..: 1642
Stopping Thread 5, processed 286529 packets, 172160835 bytes
skipped flows.....: 0, processed flows: 192556, idle flows....: 192556
not detected flows: 191591, guessed flows..: 572, detected flows: 394
detection updates.: 136, updated flows..: 899
Stopping Thread 6, processed 148943 packets, 91502090 bytes
skipped flows.....: 0, processed flows: 98657, idle flows....: 98657
not detected flows: 97291, guessed flows..: 627, detected flows: 740
detection updates.: 289, updated flows..: 1809
Stopping Thread 7, processed 156934 packets, 94238252 bytes
skipped flows.....: 0, processed flows: 109650, idle flows....: 109650
not detected flows: 107721, guessed flows..: 1527, detected flows: 402
detection updates.: 140, updated flows..: 732
Stopping Thread 8, processed 240138 packets, 192616304 bytes
skipped flows.....: 0, processed flows: 97885, idle flows....: 97885
not detected flows: 96102, guessed flows..: 750, detected flows: 1034
detection updates.: 252, updated flows..: 1704
Stopping Thread 9, processed 365571 packets, 230981799 bytes
skipped flows.....: 0, processed flows: 227795, idle flows....: 227795
not detected flows: 226409, guessed flows..: 816, detected flows: 574
detection updates.: 160, updated flows..: 1000
Stopping Thread 10, processed 153480 packets, 92991154 bytes
skipped flows.....: 0, processed flows: 101175, idle flows....: 101175
not detected flows: 99869, guessed flows..: 628, detected flows: 680
detection updates.: 284, updated flows..: 4799
Stopping Thread 11, processed 13485 packets, 7893745 bytes
skipped flows.....: 0, processed flows: 6978, idle flows....: 6978
not detected flows: 3949, guessed flows..: 2459, detected flows: 574
detection updates.: 170, updated flows..: 847
Stopping Thread 12, processed 232377 packets, 184852452 bytes
skipped flows.....: 0, processed flows: 96932, idle flows....: 96932
not detected flows: 95650, guessed flows..: 583, detected flows: 700
detection updates.: 253, updated flows..: 1788
Stopping Thread 13, processed 143480 packets, 102131690 bytes
skipped flows.....: 0, processed flows: 86951, idle flows....: 86951
not detected flows: 85477, guessed flows..: 728, detected flows: 747
detection updates.: 142, updated flows..: 732
Stopping Thread 14, processed 265783 packets, 217180620 bytes
skipped flows.....: 0, processed flows: 95751, idle flows....: 95751
not detected flows: 94098, guessed flows..: 912, detected flows: 743
detection updates.: 305, updated flows..: 1769
Stopping Thread 15, processed 8639 packets, 4419421 bytes
skipped flows.....: 0, processed flows: 3262, idle flows....: 3262
not detected flows: 1604, guessed flows..: 1063, detected flows: 602
detection updates.: 142, updated flows..: 878
Total packets captured.......: 2739976
Total packets processed......: 2788683
Total layer4 payload size....: 1943623364
Total flows ignopred.........: 0
Total flows processed........: 1493380
Total flows timed out........: 1493380
Total flows detected.........: 10510
Total flows guessed..........: 15250
Total flows not detected.....: 1467652
Total flow updates...........: 1252406
Total flow detections updates: 3332
root@GW-Big-Brother:~#
100% of the related JSONs received by my analyzer have been dumped (by the analyzer) to a raw-JSON file. This:
[verzulli@XPSGarr tmp]$ ls -l *json*
-rw-r--r-- 1 verzulli verzulli 1297698451 29 set 19.57 raw_json.stream
and inside there is a whole set of 2.035.351 JSONs, 1.721.317 of which are status-related:
[verzulli@XPSGarr tmp]$ wc -l raw_json.stream
2035351 raw_json.stream
[verzulli@XPSGarr tmp]$ grep '"daemon_event_name":"status"' raw_json.stream | wc -l
1721317
[verzulli@XPSGarr tmp]$
When I queried my analyzer, it responded this way:
nDPId-rt-analyzer> show counter
=> main `analyzer` counters:
----------
{
"in": 1968312,
"ok": 1968264,
"err": 48,
"evt": {
"flows": 272802,
"pkt": 313,
"others": 1695197
}
}
----------
=> specific `parser` counters:
----------
{
"flows": {
"new": 69266,
"detected": 4473,
"not-detected": 67244,
"detection-update": 3111,
"update": 61537,
"guessed": 1445,
"end": 1296,
"idle": 64430
},
"packets": {
"packet": 313
}
}
----------
=> FLOW TABLE SIZES:
=> Current Global table size...: [ 121626] flows
=> Current Completed table size: [ 65527] flows
----------
started on: Thu Sep 29 2022 18:15:29 GMT+0200 (Ora legale dell’Europa centrale) - uptime: 6095 seconds [102 minutes]
nDPId-rt-analyzer>
nDPId-rt-analyzer> exit
I bet the "others": 1695197 are mostly the status events we're discussing...
Due to size constraints (uploads here are limited to 25MB), I've created a file with the first 100.000 + the last 20.000 JSONs. Here it is.
[verzulli@XPSGarr tmp]$ wc -l raw_json.stream.head_100000_plus_tail_20000.stream
120000 raw_json.stream.head_100000_plus_tail_20000.stream
[verzulli@XPSGarr tmp]$
You can find it, zipped, below: raw_json.stream.head_100000_plus_tail_20000.stream.zip
As a side note: I'd appreciate it if you could expand a little on the reasoning behind such status daemon-events and, also, on the way they should be fired (you mentioned "timestamp", but... sorry!... I haven't understood :-( ) :-)
I am highly motivated to find the cause of the issue.
> As a side note: I'd appreciate it if you could expand a little on the reasoning behind such status daemon-events and, also, on the way they should be fired (you mentioned "timestamp", but... sorry!... I haven't understood :-( ) :-)
I'll try to explain what is going on here:
Those daemon status events are sent every interval (default: 600 seconds) to notify applications on a regular basis about nDPId usage, e.g. the amount of skipped/active flows and compression statistics; it is also planned to send details about CPU/memory.
Since the interval is calculated using timestamps with microsecond resolution, it is important that those are valid. But this is not the case here. We are getting timestamps that look like a uint64 underflow happened. All timestamps come from libpcap and are converted by libndpi. For now, I cannot tell if the issue is related to:
libpcap
libndpi
nDPId
So I need your help to figure out what's going wrong here. I will prepare and upload a small patchset ASAP. It would be nice if you could apply it, recompile nDPId and upload the resulting logfile.
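Roughly speaking, the status emission boils down to a check like the one below (a simplified sketch of the idea, not the actual nDPId code). With a corrupted timestamp the unsigned subtraction wraps around, the condition is true on virtually every packet, and you get exactly the event flood you are seeing:

#include <stdbool.h>
#include <stdint.h>

#define STATUS_INTERVAL_US (600ull * 1000ull * 1000ull) /* default: 600 seconds */

/* Simplified sketch (not the actual nDPId code): a status event is due
 * whenever at least STATUS_INTERVAL_US microseconds have passed. */
static bool status_event_due(uint64_t const now_us, uint64_t * const last_status_us)
{
    if (now_us - *last_status_us >= STATUS_INTERVAL_US)
    {
        /* If one of the two values is a corrupted (sign-extended) timestamp
         * like 18446744071563726224, the unsigned subtraction wraps around,
         * the condition becomes true on virtually every packet and a status
         * event is emitted each time -> event flood. */
        *last_status_us = now_us;
        return true;
    }
    return false;
}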
Hi @verzulli, can you please patch nDPId and start it with something like:
sudo env TXTLOG=/tmp/bla.log ./nDPId
and upload the logfile here. I guess you do not have to run nDPId for a long time, since from what I saw this issue happens right after startup.
Patchfile: nDPId.patch.txt
Here we are. I just built the patched nDPId-testing package and re-ran it on my OpenWRT box.
I needed to apply some tricks (namely: declaring a FIFO file as the local log target and attaching an ncat reader to it to transfer the content to my local PC, as I have only a few megabytes of local storage available on my box), but I'm sure that EVERYTHING WAS OK.
I ran the daemon (and the analyzer, dumping the JSONs) and, strangely, everything looked normal, without the flood arising within the first few minutes. I went away, got back after a bit more than two hours and... BINGO, the flood was running.
I stopped the daemon and retrieved the results.
In the meantime, I collected what you required, inside an ndpid.log file and the JSON-stream dump (should you need it). As they are both HUGE (1.4G and 4G respectively), I compressed them with bzip2 and uploaded the results to a public MEGA folder.
Probably what you're looking for is around line 9.596.563 of ndpid.log :-)
Can you please try the patch nDPId.patch.txt and give me feedback on whether it fixes your issue.
I took the chance to slightly edit your last patch to still fetch and print the ndpi_timeval_to_microseconds(header->ts) result in a dedicated variable, time_us_ndpi, so as to clearly check mismatches between your recalculated time_us and the previous time_us_ndpi value.
The patch is here: ndpid_dv.patch.txt
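Conceptually, the added logging does something like this (a rough reconstruction of the idea, not the literal diff):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <pcap/pcap.h>

/* Rough reconstruction of the debug logging (not the literal patch): compute
 * the microseconds directly from header->ts and print both values side by
 * side, so mismatches with ndpi_timeval_to_microseconds() become visible. */
static void log_timestamps(FILE * const log,
                           struct pcap_pkthdr const * const header,
                           uint64_t const time_us_ndpi /* = ndpi_timeval_to_microseconds(header->ts) */)
{
    uint64_t const time_us = (uint64_t)header->ts.tv_sec * 1000000ull
                             + (uint64_t)header->ts.tv_usec;

    fprintf(log, "header->ts..: %ld,%ld\n", (long)header->ts.tv_sec, (long)header->ts.tv_usec);
    fprintf(log, "time_us.....: %" PRIu64 "\n", time_us);
    fprintf(log, "time_us_ndpi: %" PRIu64 "\n", time_us_ndpi);
}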
With this patch in place, I re-ran the whole thing and generated a new log.
As you can see from the ndpid.log included in the attached zip, around line ~1.709.087 you have this:
header->ts..: 1664624094,964175
time_us.....: 1664624094964175
time_us_ndpi: 2145216975
globaltime..: 1664624094964175
header->ts..: 1664624094,964175
time_us.....: 1664624094964175
time_us_ndpi: 2145216975
globaltime..: 1664624094964175
header->ts..: 1664624094,964717
time_us.....: 1664624094964717
time_us_ndpi: 2145217517
globaltime..: 1664624094964717
header->ts..: 1664624098,889104
time_us.....: 1664624098889104
time_us_ndpi: 18446744071563726224
globaltime..: 1664624098889104
header->ts..: 1664624098,889243
time_us.....: 1664624098889243
time_us_ndpi: 18446744071563726363
globaltime..: 1664624098889243
header->ts..: 1664624098,352396
time_us.....: 1664624098352396
time_us_ndpi: 18446744071563189516
globaltime..: 1664624098352396
header->ts..: 1664624098,352538
time_us.....: 1664624098352538
time_us_ndpi: 18446744071563189658
globaltime..: 1664624098352538
header->ts..: 1664624098,352731
time_us.....: 1664624098352731
time_us_ndpi: 18446744071563189851
globaltime..: 1664624098352731
header->ts..: 1664624098,352894
time_us.....: 1664624098352894
time_us_ndpi: 18446744071563190014
globaltime..: 1664624098352894
clearly showing the "jump" of time_us_ndpi. So we can safely state that the problem arises inside the function ndpi_timeval_to_microseconds.
But we don't care anymore, as, thanks to your modification, calculating time_us directly solves the problem.
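For what it's worth, the values in the log are exactly what you would get if the seconds-to-microseconds multiplication were performed in 32-bit arithmetic and the wrapped 32-bit result were then sign-extended to 64 bits. A toy reconstruction of that failure mode (my guess about the cause, not the actual nDPI source); it happens to reproduce the 18446744071563726224 from the log:

#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

/* Guessed failure mode (NOT the real nDPI code): the multiplication wraps in
 * 32-bit arithmetic, the truncated result goes past INT32_MAX, and the signed
 * 32-bit value is then sign-extended when widened to 64 bits. The conversion
 * to int32_t is implementation-defined, but wraps on common compilers. */
static uint64_t to_microseconds_broken(struct timeval const tv)
{
    uint32_t const sec32 = (uint32_t)tv.tv_sec;
    int32_t const us32 = (int32_t)(sec32 * 1000000u + (uint32_t)tv.tv_usec);
    return (uint64_t)(int64_t)us32; /* sign extension -> 18446744071563726224 */
}

/* The direct calculation used in the patch: widen BEFORE multiplying. */
static uint64_t to_microseconds_fixed(struct timeval const tv)
{
    return (uint64_t)tv.tv_sec * 1000000ull + (uint64_t)tv.tv_usec;
}

int main(void)
{
    struct timeval const tv = { .tv_sec = 1664624098, .tv_usec = 889104 };
    printf("broken: %llu\n", (unsigned long long)to_microseconds_broken(tv));
    printf("fixed : %llu\n", (unsigned long long)to_microseconds_fixed(tv));
    return 0;
}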
Thank you for the support. Now I can finally get back to my "JSON-stream" analyzer :-)
I will open a PR on https://github.com/ntop/nDPI.git ASAP to fix this issue.
Hi @verzulli,
Can you try to rebuild nDPId-testing again to verify that the issue was fixed in the nDPI dev branch?
> Hi @verzulli, can you try to rebuild nDPId-testing again to verify that the issue was fixed in the nDPI dev branch?
Done!
I've rebuilt the OpenWRT package, uploaded and installed it on my OpenWRT box, and ran it, sending flows to my local Linux box.
Currently, after ~99 minutes, these are the results:
nDPId-rt-analyzer> show counter
=> main `analyzer` counters:
----------
{
"in": 46325,
"ok": 46315,
"err": 10,
"evt": {
"flows": 45497,
"pkt": 326,
"others": 502
}
}
----------
=> specific `parser` counters:
----------
{
"flows": {
"new": 7372,
"detected": 7238,
"detection-update": 7219,
"guessed": 154,
"update": 16427,
"not-detected": 18,
"end": 2606,
"idle": 4463
},
"packets": {
"packet": 326
}
}
----------
=> FLOW TABLE SIZES:
=> Current Global table size...: [ 333] flows
=> Current Completed table size: [ 7026] flows
----------
started on: Thu Oct 13 2022 10:51:50 GMT+0200 (Ora legale dell’Europa centrale) - uptime: 5965 seconds [99 minutes]
nDPId-rt-analyzer>
I think you solved it! Thanks!
<3
After upgrading nDPId on my OpenWRT box (built on commit 36f1786), I'm registering an unexpected behaviour: right after launching the daemon pointed at 192.168.0.128 (my local notebook, properly listening on UDP/9999), I receive a STORM of daemon-events. In 5 to 10 seconds, I got 16358 JSONs, and only 13 of them were related to "flows". The other 16345 were daemon-related, with ~99% of them "status". Is this expected behaviour?
I've attached the above 16358 JSONs, should you need to check them. Again: they were received in ~10 seconds (maybe less): raw_json.zip