ntop / nProbe

Open source components and extensions for nProbe
http://ntop.org
GNU General Public License v2.0

NProbe Proxy Mode Oddness #298

githubuser9999999 closed this issue 5 years ago

githubuser9999999 commented 5 years ago

I would like to use nProbe in proxy mode, but first I want to make sure that I can trust the data nProbe sends out. To test this, I configured a switch that already sends NetFlow directly to a collector to also export the exact same flows to nProbe, which then forwards them to that same collector. Comparing the two exports side by side on the collector, the reported traffic volumes differ widely. For example, over the past hour the switch reports 103 GB while nProbe reports 34 GB.
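To put the gap in proportion, here is a small worked calculation using the two hourly figures above. The helper name `reported_fraction` is made up for illustration; it just divides the proxied volume by the direct volume.

```python
def reported_fraction(reference: float, observed: float) -> float:
    """Share of the reference volume that showed up on the observed path."""
    if reference <= 0:
        raise ValueError("reference volume must be positive")
    return observed / reference


# Figures from the hour quoted above (GB); any unit works if both match.
switch_gb = 103.0  # exported directly by the switch
nprobe_gb = 34.0   # exported by the switch, relayed through nProbe
share = reported_fraction(switch_gb, nprobe_gb)
print(f"reported via nProbe: {share:.1%}, missing: {1 - share:.1%}")
```

This prints 33.0% reported and 67.0% missing, so roughly two thirds of the volume is lost on the proxied path.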

This is the nprobe config file:

-n=udp://10.10.3.2:9996
-i=none
-t=60
-d=60
-a=0
-e=1
-B=10
-w=128000
-z=0
-S=1:1:1
-E=0:0
-g=/var/run/nprobe-none.pid
-3=9996
--vlanid-as-iface-idx=none
-T=%FIRST_SWITCHED %INPUT_SNMP %IN_BYTES %IN_PKTS %IPV4_DST_ADDR %IPV4_SRC_ADDR %L4_DST_PORT %L4_SRC_PORT %LAST_SWITCHED %PROTOCOL %SRC_TOS %TCP_FLAGS
-V=9
--dump-stats=/var/log/nprobe/none-0_flows_stats.txt
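As a reading aid, the sketch below splits these options into a dictionary so the proxy-relevant ones stand out. It does not reproduce nProbe's own config parser. It assumes every option is a `-x=value` or `--long=value` token and that any token not starting with `-` (the extra `%FIELD` names of the `-T` template) continues the previous option's value. Because it splits on any whitespace, it handles the options whether they sit on one line or one per line. Later in the thread, `-3` is identified as the port the upstream exporter sends to and `-n` as the downstream collector.

```python
CONFIG = (
    "-n=udp://10.10.3.2:9996 -i=none -t=60 -d=60 -a=0 -e=1 -B=10 -w=128000 "
    "-z=0 -S=1:1:1 -E=0:0 -g=/var/run/nprobe-none.pid -3=9996 "
    "--vlanid-as-iface-idx=none -T=%FIRST_SWITCHED %INPUT_SNMP %IN_BYTES "
    "%IN_PKTS %IPV4_DST_ADDR %IPV4_SRC_ADDR %L4_DST_PORT %L4_SRC_PORT "
    "%LAST_SWITCHED %PROTOCOL %SRC_TOS %TCP_FLAGS -V=9 "
    "--dump-stats=/var/log/nprobe/none-0_flows_stats.txt"
)


def parse_nprobe_options(text: str) -> dict:
    """Split '-x=value' / '--long=value' tokens into a dict.

    Tokens that do not start with '-' (e.g. the extra %FIELD names of the
    -T template) are appended to the previous option's value.
    """
    opts = {}
    last = None
    for tok in text.split():
        if tok.startswith("-"):
            key, _, value = tok.partition("=")
            opts[key] = value
            last = key
        elif last is not None:
            opts[last] += " " + tok
    return opts


opts = parse_nprobe_options(CONFIG)
print("upstream NetFlow arrives on port", opts["-3"])  # 9996
print("downstream export goes to", opts["-n"])         # udp://10.10.3.2:9996
print("export version", opts["-V"], "with",
      len(opts["-T"].split()), "template fields")
```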

Also, it seems that in proxy mode, changes to the export format or export policy have no effect on the NetFlow output that nProbe generates. Is this correct? The only thing that seems to affect the output is the output flow type (V5, V9, IPFIX...).

simonemainardi commented 5 years ago

What is the nprobe version you are using? Are you using the latest stable/dev?

Do you have a valid license? Note that, in demo mode, nProbe only exports the first 25k flows and then stops.

Try adding the --disable-cache option.

> Also, it seems that in proxy mode, changes to the export format or export policy have no effect on the NetFlow output that nProbe generates. Is this correct? The only thing that seems to affect the output is the output flow type (V5, V9, IPFIX...).

The V5 record format is fixed and has no templates, so template changes (-T) only affect V9 and IPFIX.
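To make "fixed" concrete, here is the standard NetFlow v5 wire layout expressed with Python's `struct` module. Every v5 record has the same 20 fields in the same 48 bytes, so there is nowhere to put an extra `-T` field. The sample record values are arbitrary.

```python
import struct

# NetFlow v5 export header (24 bytes): version, count, sys_uptime,
# unix_secs, unix_nsecs, flow_sequence, engine_type, engine_id,
# sampling_interval. "!" = network byte order, no padding.
V5_HEADER = struct.Struct("!HHIIIIBBH")

# NetFlow v5 flow record (48 bytes): srcaddr, dstaddr, nexthop, input,
# output, dPkts, dOctets, First, Last, srcport, dstport, pad1, tcp_flags,
# prot, tos, src_as, dst_as, src_mask, dst_mask, pad2.
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")

MAX_V5_RECORDS = 30  # a v5 datagram carries at most 30 records

print(V5_HEADER.size, V5_RECORD.size)                    # 24 48
print(V5_HEADER.size + MAX_V5_RECORDS * V5_RECORD.size)  # 1464

# Round-trip one arbitrary TCP record (10.0.0.1:12345 -> 10.0.0.2:443).
record = V5_RECORD.pack(bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]), bytes(4),
                        1, 2, 10, 1500, 0, 0, 12345, 443,
                        0, 0x18, 6, 0, 0, 0, 24, 24, 0)
fields = V5_RECORD.unpack(record)
print(fields[6])  # dOctets: 1500
```

V9 and IPFIX replace this fixed struct with a template that the exporter announces, which is why `-T` only matters for those versions.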

githubuser9999999 commented 5 years ago

Below is the output from nprobe -v which shows the version and license info. I have an evaluation license, and I was under the impression that this would not limit the amount of flows. Please let me know if this is untrue.

I am trying to export V9 and have noticed that, in proxy mode, changes to the template don't seem to affect the template in the packet captures.

Welcome to nProbe v.8.6.181012 (r6309) for x86_64-pc-linux-gnu with native PF_RING acceleration. Copyright 2002-18 ntop.org

Build OS: Ubuntu 18.04.1 LTS
SystemID: redacted
GIT rev: 8.6-stable:9a082405d50d567ed81e988ffb3f3971de989407:20181012
Edition: nProbe Pro
License: redacted [valid license]
License Type: Time-limited License
Lic. Duration: Until Fri Nov 2 18:54:10 2018 [18 days left]

nProbe is subject to the terms and conditions defined in the LICENSE and EULA files that are part of this package.

nProbe also contains third party code: Radix tree code - (C) The Regents of the University of Michigan ("The Regents") and Merit Network, Inc. sFlow collector - (C) InMon Inc.

githubuser9999999 commented 5 years ago

I just re-read what you wrote. nProbe is not stopping after 25k flows; it keeps running and under-reports the traffic volume.

simonemainardi commented 5 years ago

Did you add the --disable-cache as suggested? On average, how many flows per second are you trying to collect?


> Below is the output from nprobe -v which shows the version and license info. I have an evaluation license, and I was under the impression that this would not limit the amount of flows. Please let me know if this is untrue.

The license is OK and no limit on flows export is enforced with that license. I would expect all the flows to be exported.

> I am trying to export V9 and have noticed that, in proxy mode, changes to the template don't seem to affect the template in the packet captures.

This is normal: nProbe can't decide the template of the flows it receives; only the upstream exporter decides that template. In proxy mode, nProbe only has control over the downstream NetFlow it exports (v9 in your case). So, for example, if you add %TCP_FLAGS to the nProbe template but the upstream exporter doesn't send it, nProbe will leave it empty downstream as well.
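Here is a toy model of the behavior just described: the downstream record is built from the downstream template, but any value must come from what the upstream exporter actually sent. `build_downstream_record` is a hypothetical helper, not nProbe code. `None` stands in for whatever "empty" encoding nProbe really emits, which the thread does not specify.

```python
from __future__ import annotations

from typing import Optional


def build_downstream_record(received: dict[str, int],
                            template: list[str]) -> dict[str, Optional[int]]:
    """Fill a downstream record from whatever fields the upstream flow carried.

    A proxy can re-encode fields it received, but it cannot invent a value
    the upstream exporter never sent, so such fields stay empty (None here).
    """
    return {field: received.get(field) for field in template}


# Hypothetical upstream flow: the exporter's template has no TCP_FLAGS field.
upstream_flow = {"IPV4_SRC_ADDR": 0x0A000001, "IPV4_DST_ADDR": 0x0A000002,
                 "IN_BYTES": 1500, "IN_PKTS": 3, "PROTOCOL": 6}
downstream_template = ["IPV4_SRC_ADDR", "IPV4_DST_ADDR", "IN_BYTES",
                       "IN_PKTS", "PROTOCOL", "TCP_FLAGS"]

record = build_downstream_record(upstream_flow, downstream_template)
print(record["IN_BYTES"], record["TCP_FLAGS"])  # 1500 None
```

The template only controls the shape of the output. The content is bounded by the upstream exporter, which is why editing `-T` in proxy mode can look like it has no effect.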


I need to understand whether NetFlow packets are being lost between the upstream exporter (-3=9996) and nProbe, or between nProbe and the downstream collector (-n=udp://10.10.3.2:9996).

Can you try and connect nProbe to ntopng to see if the reported traffic volume is reported correctly there? See https://www.ntop.org/nprobe/network-monitoring-101-a-beginners-guide-to-understanding-ntop-tools/

Please also run nProbe with option -b=2 to see the NetFlow collection stats, and report the output here. For example:

15/Oct/2018 17:28:00 [nprobe.c:3313] L7 Proto                   Diff      Total
15/Oct/2018 17:28:00 [nprobe.c:3327]    Unknown/0                   0 B    5.82 KB
15/Oct/2018 17:28:00 [nprobe.c:3335] Current flow export rate: [0.0 flows/sec]
15/Oct/2018 17:28:00 [nprobe.c:3338] Flow drops: [export queue too long=0][too many flows=0][ELK queue flow drops=0]
15/Oct/2018 17:28:00 [nprobe.c:3343] Export Queue: 0/524288 [0.0 %]
15/Oct/2018 17:28:00 [nprobe.c:3348] Flow Buckets: [active=0][allocated=0][toBeExported=0]
15/Oct/2018 17:28:00 [nprobe.c:3385] Collector Threads: [2 pkts@0]
15/Oct/2018 17:28:00 [nprobe.c:3124] Processed packets: 0 (max bucket search: 0)
15/Oct/2018 17:28:00 [nprobe.c:3107] Fragment queue length: 0
15/Oct/2018 17:28:00 [nprobe.c:3134] Flow collection stats:  [collected pkts: 2][processed flows: 4]
15/Oct/2018 17:28:00 [nprobe.c:3137] Flow export stats:      [0 bytes/0 pkts][0 flows/0 pkts sent]
15/Oct/2018 17:28:00 [nprobe.c:3143] Flow export drop stats: [0 bytes/0 pkts][0 flows]
15/Oct/2018 17:28:00 [nprobe.c:3148] Total flow stats:       [0 bytes/0 pkts][0 flows/0 pkts sent]

githubuser9999999 commented 5 years ago

I added --disable-cache, and it doesn't seem to have any effect on the output.

Maybe I used the wrong terminology. When I analyze a packet capture, taken on the collector, of the flows sent by nProbe, the flow records themselves don't seem to be affected by changes to the flow export format.

I believe the flow rate is ~10k flows/sec.

I have ntopng installed as well, but I can't see where it is reporting anything based on nProbe proxy mode. It only seems to be reporting on SPAN traffic. Am I missing something?

githubuser9999999 commented 5 years ago

And here are those stats that you asked for.

15/Oct/2018 18:37:27 [nprobe.c:6321] nProbe is shutting down...
15/Oct/2018 18:37:27 [nprobe.c:6345] Exporting pending buckets...
15/Oct/2018 18:37:27 [nprobe.c:6309] Flushing active flows
15/Oct/2018 18:37:27 [nprobe.c:6372] Pending buckets have been exported...
15/Oct/2018 18:37:29 [engine.c:3803] Export thread terminated [exportQueue=0]
15/Oct/2018 18:37:29 [nprobe.c:6467] Flushing queued flows...
15/Oct/2018 18:37:29 [nprobe.c:6482] Freeing memory...
15/Oct/2018 18:37:29 [plugin.c:294] Terminating plugins.
15/Oct/2018 18:37:29 [nprobe.c:6602] Still allocated 0 hash buckets
15/Oct/2018 18:37:29 [nprobe.c:3127] Processed packets: 0 (max bucket search: 4)
15/Oct/2018 18:37:29 [nprobe.c:3110] Fragment queue length: 0
15/Oct/2018 18:37:29 [nprobe.c:3137] Flow collection stats: [collected pkts: 12538][processed flows: 136318]
15/Oct/2018 18:37:29 [nprobe.c:3140] Flow export stats: [3764005542 bytes/4642838 pkts][136318 flows/8309 pkts sent]
15/Oct/2018 18:37:29 [nprobe.c:3146] Flow export drop stats: [0 bytes/0 pkts][0 flows]
15/Oct/2018 18:37:29 [nprobe.c:3151] Total flow stats: [3764005542 bytes/4642838 pkts][136318 flows/8309 pkts sent]
15/Oct/2018 18:37:29 [nprobe.c:6611] Cleaning globals
15/Oct/2018 18:37:29 [nprobe.c:6630] nProbe terminated.

simonemainardi commented 5 years ago

> I have ntopng installed as well, but I can't see where it is reporting anything based on nProbe proxy mode. It only seems to be reporting on SPAN traffic. Am I missing something?

See https://www.ntop.org/guides/ntopng/case_study/using_with_nprobe.html for configuration instructions.

githubuser9999999 commented 5 years ago

OK, I figured out how to get ntopng reporting on the collected flows per the document you referenced above, and the totals agree with what I'm seeing for the flows forwarded to my external collector: they are consistently lower by a large margin.

simonemainardi commented 5 years ago

Thank you for reporting. I want to see whether the UDP socket receive queue is getting full and therefore dropping incoming NetFlow.

Please execute this:

[simone@develv5 ntopng]$ cat /proc/net/udp

And report the output. Thank you.

githubuser9999999 commented 5 years ago

cat /proc/net/udp
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops
2393: 00000000:904B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 36293 2 0000000000000000 0
8218: 00000000:270C 00000000:0000 07 00000000:00000000 00:00000000 00000000 999 0 56280 2 0000000000000000 791485
14659: 3500007F:0035 00000000:0000 07 00000000:00006600 00:00000000 00000000 101 0 22828 2 0000000000000000 0
14767: 00000000:00A1 00000000:0000 07 00000000:00002E40 00:00000000 00000000 0 0 36295 2 0000000000000000 0
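The addresses and ports in `/proc/net/udp` are hex. Decoding them shows which socket is dropping: `270C` is port 9996, the `-3=9996` collector port from the config, and it is the entry with 791485 drops. The sketch below is a minimal parser. It assumes the standard column order shown in the header line, and it assumes the address is printed in host byte order, so on little-endian (x86_64) hosts the bytes must be reversed. `UdpSocketInfo` and `parse_proc_net_udp_line` are illustrative names.

```python
import socket
import sys
from dataclasses import dataclass


@dataclass
class UdpSocketInfo:
    local_ip: str
    local_port: int
    rx_queue: int  # receive-queue memory in use, in bytes
    uid: int
    drops: int     # datagrams dropped on this socket so far


def parse_proc_net_udp_line(
        line: str, little_endian: bool = sys.byteorder == "little") -> UdpSocketInfo:
    f = line.split()
    addr_hex, port_hex = f[1].split(":")
    raw = bytes.fromhex(addr_hex)
    if little_endian:  # address is a host-order 32-bit value
        raw = raw[::-1]
    _tx_hex, rx_hex = f[4].split(":")
    return UdpSocketInfo(local_ip=socket.inet_ntoa(raw),
                         local_port=int(port_hex, 16),
                         rx_queue=int(rx_hex, 16),
                         uid=int(f[7]),
                         drops=int(f[12]))


sample = [
    "8218: 00000000:270C 00000000:0000 07 00000000:00000000 00:00000000 "
    "00000000 999 0 56280 2 0000000000000000 791485",
    "14659: 3500007F:0035 00000000:0000 07 00000000:00006600 00:00000000 "
    "00000000 101 0 22828 2 0000000000000000 0",
]
for line in sample:
    s = parse_proc_net_udp_line(line, little_endian=True)  # sample is from x86_64
    print(s.local_ip, s.local_port, s.rx_queue, s.uid, s.drops)
```

The first entry decodes to 0.0.0.0:9996 owned by uid 999 (presumably the nProbe service user) with 791485 drops. The second is 127.0.0.53:53, the local DNS stub resolver, with 0x6600 = 26112 bytes queued and no drops.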

simonemainardi commented 5 years ago

Is the drops counter (791485) increasing?

githubuser9999999 commented 5 years ago

Yes. It's now at 799928. What can cause this, and what can I look at to prevent it?
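The two readings differ by 799928 - 791485 = 8443 drops, but the time between them isn't given, so no rate can be derived from them. A sketch for sampling the counter over a known interval is shown below. The function names are hypothetical. It assumes the collector socket is the IPv4 entry on port 9996 in `/proc/net/udp`, with drops in the last (13th) column, as in the output above.

```python
import time
from typing import Callable, Iterable


def drops_for_port(lines: Iterable[str], port: int) -> int:
    """Return the drops counter of the UDP socket whose local port is `port`."""
    for line in lines:
        f = line.split()
        # Skip the header row and anything that is not a socket entry.
        if len(f) < 13 or ":" not in f[1]:
            continue
        if int(f[1].split(":")[1], 16) == port:
            return int(f[12])
    raise LookupError(f"no UDP socket bound to port {port}")


def read_udp_drops(port: int, path: str = "/proc/net/udp") -> int:
    with open(path) as fh:
        return drops_for_port(fh, port)


def drop_rate(read: Callable[[], int], interval_s: float,
              sleep: Callable[[float], None] = time.sleep) -> float:
    """Drops per second between two readings taken interval_s apart."""
    before = read()
    sleep(interval_s)
    return (read() - before) / interval_s
```

On the nProbe host, `drop_rate(lambda: read_udp_drops(9996), 10.0)` would sample the live counter. A result that stays above zero means the socket is still overflowing.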

simonemainardi commented 5 years ago

The volume of NetFlow is so high that nProbe can't dequeue the NetFlow UDP packets fast enough. However, this usually happens only at rates much higher than 10k flows per second, so I am quite surprised. What are the specs of the machine you are using?

Can you report a larger piece of nProbe log (journalctl -u nprobe)?

Let's also try increasing the -w=128000 setting in the nProbe configuration tenfold (to -w=1280000), then report whether you are still seeing a lower amount of traffic. If that doesn't solve it, you should consider spawning two nProbe instances and balancing the NetFlow among them.

githubuser9999999 commented 5 years ago

The box has 2 x Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz and 32 GB of RAM.

I was able to get the buffer drops to stop by following this guide:

https://medium.com/@CameronSparr/increase-os-udp-buffers-to-improve-performance-51d167bb1360

However, the traffic still does not match. The hash size is already set to 128000. Did you mean to add another zero on the end of that?

sudo journalctl -u nprobe
-- Logs begin at Fri 2018-10-05 18:52:08 UTC, end at Wed 2018-10-17 18:33:14 UTC --
-- No entries --

Not much in that log file...

githubuser9999999 commented 5 years ago

A follow-up. I stopped the nprobe process and ran Samplicator against the same flows and forwarded it to the same collector. The graphs match up perfectly now. This tells me that something is amiss with nprobe.

simonemainardi commented 5 years ago

> I was able to get the buffer drops to stop by following this guide: https://medium.com/@CameronSparr/increase-os-udp-buffers-to-improve-performance-51d167bb1360

Increasing the buffers is not a solution, as packets will sit in the (larger) buffer rather than being processed. We need to understand why nProbe can't receive fast enough.
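A back-of-the-envelope model shows why a bigger buffer alone does not fix a reader that falls behind. Under a sustained overload, the queue grows at the arrival rate minus the drain rate, so a buffer N times larger only delays the drops by N times as long. A larger buffer does help if the overload comes in bursts and the average drain rate keeps up. The rates below are hypothetical and not measured in this thread.

```python
import math


def seconds_until_full(buffer_bytes: float, arrival_bytes_per_s: float,
                       drain_bytes_per_s: float) -> float:
    """Time a receive buffer can absorb a sustained overload before dropping.

    If the reader drains at least as fast as data arrives, the queue never
    grows, so the answer is infinity.
    """
    surplus = arrival_bytes_per_s - drain_bytes_per_s
    if surplus <= 0:
        return math.inf
    return buffer_bytes / surplus


# Hypothetical: the reader falls 2 MB/s behind the arrival rate.
for buf in (4_000_000, 40_000_000):
    print(buf, "bytes ->", seconds_until_full(buf, 12_000_000, 10_000_000), "s")
```

With these numbers, a tenfold buffer moves the first drop from 2 s to 20 s. Only a faster drain (or less input per reader) makes the result infinite.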

> Samplicator against the same flows and forwarded it to the same collector. The graphs match up perfectly now. This tells me that something is amiss with nprobe.

OK, thanks for this test.

> However, the traffic still does not match. The hash size is already set to 128000. Did you mean to add another zero on the end of that?

yes

> Not much in that log file...

Try running nProbe in the foreground. Stop the instance running as a daemon, and then run

/usr/local/bin/nprobe /etc/nprobe/nprobe.conf

then report the output.

Finally, would you be able to record a capture of the incoming NetFlow and attach it here as a zip file, so that I can try to reproduce the issue in the lab?

githubuser9999999 commented 5 years ago

- OK. I removed the buffer change.

- Changing the hash size did nothing to help.

- Here is the output when running from the CLI.

22/Oct/2018 18:44:26 [plugin.c:181] No plugins found in ./plugins
22/Oct/2018 18:44:26 [plugin.c:189] Loading 24 plugins [.so] from /usr/local/lib/nprobe/plugins
22/Oct/2018 18:44:26 [plugin.c:857] Unable to enable plugin DHCP Protocol [/etc/nprobe.license.dhcp]: Missing license file
22/Oct/2018 18:44:26 [plugin.c:857] Unable to enable plugin Diameter Protocol [/etc/nprobe.license.diameter]: Missing license file
22/Oct/2018 18:44:26 [plugin.c:857] Unable to enable plugin DNS/LLMNR Protocol [/etc/nprobe.license.dns]: Missing license file
22/Oct/2018 18:44:26 [plugin.c:857] Unable to enable plugin Export Plugin [/etc/nprobe.license.export]: Missing license file
22/Oct/2018 18:44:26 [plugin.c:857] Unable to enable plugin FTP Protocol [/etc/nprobe.license.ftp]: Missing license file
22/Oct/2018 18:44:26 [plugin.c:857] Unable to enable plugin GTPv0 Signaling Protocol [/etc/nprobe.license.gtpv0]: Missing license file
22/Oct/2018 18:44:26 [plugin.c:857] Unable to enable plugin GTPv1 Signaling Protocol [/etc/nprobe.license.gtpv1]: Missing license file
22/Oct/2018 18:44:26 [plugin.c:857] Unable to enable plugin GTPv2 Signaling Protocol [/etc/nprobe.license.gtpv2]: Missing license file
22/Oct/2018 18:44:26 [plugin.c:857] Unable to enable plugin HTTP Protocol [/etc/nprobe.license.http]: Missing license file
22/Oct/2018 18:44:26 [plugin.c:857] Unable to enable plugin IMAP Protocol [/etc/nprobe.license.email]: Missing license file
22/Oct/2018 18:44:26 [plugin.c:857] Unable to enable plugin Netflow-Lite Plugin [/etc/nprobe.license.nflite]: Missing license file
22/Oct/2018 18:44:26 [plugin.c:857] Unable to enable plugin Oracle Protocol [/etc/nprobe.license.oracle]: Missing license file
22/Oct/2018 18:44:26 [plugin.c:857] Unable to enable plugin POP3 Protocol [/etc/nprobe.license.email]: Missing license file
22/Oct/2018 18:44:26 [plugin.c:857] Unable to enable plugin Radius Protocol [/etc/nprobe.license.radius]: Missing license file
22/Oct/2018 18:44:26 [plugin.c:857] Unable to enable plugin RTP Plugin [/etc/nprobe.license.voip]: Missing license file
22/Oct/2018 18:44:26 [plugin.c:857] Unable to enable plugin S1AP Protocol [/etc/nprobe.license.S1AP]: Missing license file
22/Oct/2018 18:44:26 [plugin.c:857] Unable to enable plugin SIP Plugin [/etc/nprobe.license.voip]: Missing license file
22/Oct/2018 18:44:26 [plugin.c:857] Unable to enable plugin SMTP Protocol [/etc/nprobe.license.email]: Missing license file
22/Oct/2018 18:44:26 [nprobe.c:4164] Valid nProbe Pro license found
22/Oct/2018 18:44:26 [nprobe.c:3729] Exporting flows towards udp://10.201.3.2:9996 using UDP
22/Oct/2018 18:44:26 [nprobe.c:5285] WARNING: If you want to preserve the -M value, please specify -w before -M
22/Oct/2018 18:44:26 [nprobe.c:6086] WARNING: The output interfaceId is set to 0: did you forget to use -Q perhaps ?
22/Oct/2018 18:44:26 [nprobe.c:6089] WARNING: The input interfaceId is set to 0: did you forget to use -u perhaps ?
22/Oct/2018 18:44:26 [nprobe.c:6176] Welcome to nProbe Pro v.8.6.181017 ($Revision: 6309 $) for x86_64-pc-linux-gnu with native PF_RING acceleration
22/Oct/2018 18:44:26 [nprobe.c:6186] Running on Ubuntu 18.04.1 LTS
22/Oct/2018 18:44:26 [nprobe.c:6197] [LICENSE] nProbe SystemId: 3F79CA5EB205A206
22/Oct/2018 18:44:26 [nprobe.c:6264] Sample rate [packet: 1][flow collection/export: 1/1]
22/Oct/2018 18:44:26 [nprobe.c:8956] Welcome to nProbe v.8.6.181017 for x86_64-pc-linux-gnu
22/Oct/2018 18:44:26 [nprobe.c:7966] Using NetFlow Packet Payload Len: 1472
22/Oct/2018 18:44:26 [plugin.c:1238] 0 plugin(s) enabled
22/Oct/2018 18:44:26 [nprobe.c:8412] Each flow is 64 bytes long
22/Oct/2018 18:44:26 [nprobe.c:8413] The # flows per packet has been set to 22
22/Oct/2018 18:44:26 [nprobe.c:8416] IP TOS is accounted
22/Oct/2018 18:44:26 [nprobe.c:8442] Non IPv4/v6 traffic is discarded according to the template
22/Oct/2018 18:44:26 [util.c:507] Loaded database /usr/share/ntopng/httpdocs/geoip/GeoLite2-ASN.mmdb [ip_version: 6]
22/Oct/2018 18:44:26 [nprobe.c:9322] Not capturing packet from interface (collector mode)
22/Oct/2018 18:44:26 [nprobe.c:8793] ERROR: Unable to store PID in file /var/run/nprobe-none.pid
22/Oct/2018 18:44:26 [util.c:3765] Privileges are not dropped as we're not superuser
22/Oct/2018 18:44:26 [collect.c:142] Flow collector listening on port 9996 (IPv4/v6)
22/Oct/2018 18:44:26 [nprobe.c:9568] nProbe started successfully

As for a packet capture, I'd gladly provide you one in a private setting. Can you PM me and I'll send you a link?

githubuser9999999 commented 5 years ago

Upon shutdown, this is what I got, in case it helps. There is a mention of the bucket search being too slow, but it didn't report any drops?

^C
22/Oct/2018 18:50:53 [nprobe.c:567] Received shutdown request... [signal: 2]
22/Oct/2018 18:50:53 [nprobe.c:6309] Flushing active flows
22/Oct/2018 18:50:55 [nprobe.c:3127] Processed packets: 0 (max bucket search: 11)
22/Oct/2018 18:50:55 [nprobe.c:3110] Fragment queue length: 0
22/Oct/2018 18:50:55 [nprobe.c:3133] WARNING: Your bucket search is too slow (11): expect drops
22/Oct/2018 18:50:55 [nprobe.c:3137] Flow collection stats: [collected pkts: 55107][processed flows: 1266842]
22/Oct/2018 18:50:55 [nprobe.c:3140] Flow export stats: [975420502 bytes/41840285 pkts][1249538 flows/67854 pkts sent]
22/Oct/2018 18:50:55 [nprobe.c:3146] Flow export drop stats: [0 bytes/0 pkts][0 flows]
22/Oct/2018 18:50:55 [nprobe.c:3151] Total flow stats: [975420502 bytes/41840285 pkts][1249538 flows/67854 pkts sent]
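One quick check on these counters is to compare "processed flows" from the collection stats with "flows" from the export stats. In this run, 1266842 - 1249538 = 17304 processed flows were not exported, while the export drop counter reads 0 flows. In the earlier run (15/Oct), the two numbers were equal at 136318. The parser below is a hypothetical helper. Its regex matches only the two log line formats shown here. Treat the gap as a hint rather than proof of loss, since the thread does not document whether these counters are meant to match one-to-one.

```python
import re

STATS_RE = re.compile(
    r"Flow collection stats:\s*\[collected pkts: (\d+)\]\[processed flows: (\d+)\]"
    r".*?"
    r"Flow export stats:\s*\[(\d+) bytes/(\d+) pkts\]\[(\d+) flows/(\d+) pkts sent\]",
    re.DOTALL,
)


def flow_gap(log_text: str) -> dict:
    """Compare flows processed from incoming NetFlow with flows exported."""
    m = STATS_RE.search(log_text)
    if m is None:
        raise ValueError("no collection/export stats found in log text")
    collected_pkts, processed, _bytes, _pkts, exported, sent_pkts = map(int, m.groups())
    return {
        "collected_pkts": collected_pkts,
        "processed_flows": processed,
        "exported_flows": exported,
        "sent_pkts": sent_pkts,
        "unexported_flows": processed - exported,
    }


shutdown_log = (
    "[nprobe.c:3137] Flow collection stats: [collected pkts: 55107]"
    "[processed flows: 1266842] "
    "[nprobe.c:3140] Flow export stats: [975420502 bytes/41840285 pkts]"
    "[1249538 flows/67854 pkts sent]"
)
print(flow_gap(shutdown_log)["unexported_flows"])  # 17304
```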

simonemainardi commented 5 years ago

Thanks for offering this support. You can PM me at: mainardi at ntop dot org

githubuser9999999 commented 5 years ago

Sent you a link...

simonemainardi commented 5 years ago

So it turned out that there was an issue with the bucket search that could have caused drops.

22/Oct/2018 18:50:55 [nprobe.c:3133] WARNING: Your bucket search is too slow (11): expect drops

We've made a fix based on the pcap you provided. Please install the latest nProbe 8.7 dev build and let us know if you are still seeing drops.
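For readers unfamiliar with the warning: in a chained hash table, the "bucket search" for a lookup grows with the number of entries sharing one bucket. The thread does not describe nProbe's actual flow hash or bucket layout, so the sketch below is a generic model with made-up traffic and two made-up hash functions. It shows one way a long search can happen even with a large table: if the hash ignores most of the flow key, flows pile into the same bucket, and adding buckets (as with a larger `-w`) changes nothing.

```python
import struct
import zlib
from collections import Counter
from typing import Callable, Iterable, Tuple

Flow = Tuple[int, int, int, int]  # (src_ip, dst_ip, src_port, dst_port)


def max_chain(flows: Iterable[Flow], num_buckets: int,
              hash_fn: Callable[[Flow], int]) -> int:
    """Longest bucket chain, i.e. the worst-case entries walked on a lookup."""
    counts = Counter(hash_fn(f) % num_buckets for f in flows)
    return max(counts.values())


def full_key_hash(f: Flow) -> int:
    # Hash over the whole 4-tuple.
    return zlib.crc32(struct.pack("!IIHH", *f))


def weak_hash(f: Flow) -> int:
    # Deliberately poor: ignores everything except the destination port.
    return f[3]


# Hypothetical traffic: 10,000 flows from distinct sources to one HTTPS server.
flows = [(0x0A000000 + i, 0x0A640001, 1024 + i % 60000, 443) for i in range(10000)]
BUCKETS = 128000  # same order as the -w=128000 used in this thread

print("full-key hash:       ", max_chain(flows, BUCKETS, full_key_hash))
print("weak hash:           ", max_chain(flows, BUCKETS, weak_hash))
print("weak hash, 10x table:", max_chain(flows, BUCKETS * 10, weak_hash))
```

With the weak hash, every flow lands in one bucket, whether the table has 128000 or 1280000 buckets. This is consistent with the earlier observation that raising the hash size did not help, though the thread does not say what the actual nProbe fix changed.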

githubuser9999999 commented 5 years ago

Thank you. Can you send me the links for your dev repositories for Ubuntu?

simonemainardi commented 5 years ago

http://packages.ntop.org/apt/

simonemainardi commented 5 years ago

@githubuser9999999 did you have a chance to try?

githubuser9999999 commented 5 years ago

Yes. It seems to have taken care of the missing traffic. Thanks.

simonemainardi commented 5 years ago

Thanks for reporting.