ntop / ntopng

Web-based Traffic and Security Network Traffic Monitoring
http://www.ntop.org
GNU General Public License v3.0

nprobe zmq vs PF_RING #367

Closed paoloB132 closed 7 years ago

paoloB132 commented 8 years ago

I'm probing the same interface, eth1 (a switch mirror port), both through nProbe over ZMQ and directly from ntopng with PF_RING.

I confirm that the traffic statistics obtained through nProbe are not precise; I understand that fixing this is a milestone planned for 2.3. The graphs obtained with PF_RING, by contrast, seem correct.

I notice that moving the pointer over the graphs to identify the top hosts (Minute Top Traffic Statistic) is really fast with nProbe but terribly slow with PF_RING. Is there any reason for that?

lucaderi commented 8 years ago

This is normal: with NetFlow, the numbers you see are averages. This is a duplicate of https://github.com/ntop/ntopng/issues/57, as the root cause is the same.
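To illustrate why flow-based numbers are averages, here is a minimal sketch. The traffic pattern is made up, and the two helper functions are hypothetical names, not part of nProbe or ntopng. It compares per-second byte counts built from individual packets with what can be reconstructed from a single flow record that only carries a byte total and a duration.

```python
# Illustrative only: the traffic pattern below is invented for the example.

def per_second_from_packets(packets, duration):
    """Sum packet sizes into one-second buckets (what packet capture can see)."""
    buckets = [0] * duration
    for timestamp, size in packets:
        buckets[int(timestamp)] += size
    return buckets


def per_second_from_flow(total_bytes, duration):
    """Spread a flow's byte total evenly over its lifetime (all a flow record allows)."""
    return [total_bytes / duration] * duration


# A 60-second flow that sends 6 MB in its first 5 seconds and then goes quiet.
duration = 60
packets = [(second + 0.5, 1_200_000) for second in range(5)]

exact = per_second_from_packets(packets, duration)
averaged = per_second_from_flow(sum(size for _, size in packets), duration)

print("peak bytes/s seen from packets:    ", max(exact))
print("peak bytes/s seen from flow record:", max(averaged))
```

Both series carry the same total volume, but the burst is flattened in the flow-based one, which is consistent with the nProbe graphs looking imprecise next to the PF_RING ones.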

paoloB132 commented 8 years ago

Hi Luca,

and what about the slow pointer issue?

Paolo Barbato

Consorzio RFX https://www.igi.cnr.it/
corso Stati Uniti, 4
35127 Padova - Italy
Network Administrator

phone: +39 049 8295097 fax: +39 049 8700718

lucaderi commented 8 years ago

Please quote the ID of the issue you're referring to.

paoloB132 commented 8 years ago

Luca, I mean this one, #367:

"I notice that moving pointer on graphs to identify top host (Minute Top Traffic Statistic) is really fast with nProbe but terribly slow with PF_RING. Any reason for that?"

lucaderi commented 8 years ago

Sorry I missed this comment. It looks odd to me. @simonemainardi Simone please look at this.

paoloB132 commented 8 years ago

Thanks.

lucaderi commented 8 years ago

@simonemainardi Any news?

simonemainardi commented 8 years ago

@paoloB132, every mouse move aimed at identifying minute top talkers translates into an SQLite query. So I guess the necessary I/O is preempted by other, higher-priority tasks under PF_RING.

Intuitively, the load under PF_RING is much higher since every single packet has to be processed. Conversely, nProbe benefits from aggregated flow data.

I think we should look at your system load under both configurations. Also please post information on the network load. Btw, are you using PF_RING ZC?

paoloB132 commented 8 years ago

Hi Simone,

On 15 Feb 2016, at 18:58, simonemainardi wrote:

> I think we should look at your system load under both configurations.

That's the point! As soon as I move over the graph while PF_RING is active, the CPU reaches 100% and stays there until I stop. Usually ntopng has a very low load. Here is the load with PF_RING active but only the dashboard accessed:

top - 08:15:12 up 61 days, 23:29, 1 user, load average: 0.80, 0.28, 0.09
Tasks: 184 total, 1 running, 183 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.0%us, 3.4%sy, 0.0%ni, 93.5%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 13060068k total, 12746764k used, 313304k free, 398256k buffers
Swap: 8191996k total, 8404k used, 8183592k free, 11025732k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6631 nobody 20 0 1348m 141m 13m S 25.9 1.1 1011:46 ntopng
6565 nobody 20 0 435m 10m 3132 S 19.0 0.1 980:28.28 nprobe
15278 root 20 0 15032 1252 888 R 0.3 0.0 0:01.28 top
1 root 20 0 19364 1484 1140 S 0.0 0.0 0:03.19 init

And here it is after moving the mouse over the graph for a few seconds:

top - 08:17:21 up 61 days, 23:31, 1 user, load average: 0.82, 0.42, 0.16
Tasks: 184 total, 1 running, 183 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.9%us, 17.8%sy, 0.0%ni, 69.1%id, 0.1%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 13060068k total, 12745420k used, 314648k free, 398268k buffers
Swap: 8191996k total, 8404k used, 8183592k free, 11025816k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6631 nobody 20 0 1348m 141m 13m S 112.7 1.1 1013:34 ntopng
6565 nobody 20 0 435m 10m 3132 S 18.6 0.1 980:51.92 nprobe
9 root 20 0 0 0 0 S 0.3 0.0 7:19.85 ksoftirqd/1
1699 redis 20 0 80904 19m 632 S 0.3 0.2 714:48.94 redis-serve

And yes, as you say, nProbe graphs aggregated data. Besides confirming that those graphs are somewhat meaningless, this means no high CPU load is noticed when moving the mouse over the "roller coaster" (lots of peaks).

So translating top talkers into SQLite queries probably has a real impact with PF_RING records, whereas it does not with the fewer queries related to nProbe records.

This makes me think that once the graphs produced by nProbe become "meaningful" (more records), we'll also observe a slowdown when moving the cursor. ntopng runs on CentOS 6.

> Also please post information on the network load.

Network load is not high at all, 10-15% of 1 Gbit/s (this is the uplink). Graph info:

Total: 77.1 GB
95th Percentile: 43.82 Mbit
Min: 0 bps @ 16/02/2016 08:27:00
Max: 46.12 Mbit @ 15/02/2016 13:54:00

> Btw, are you using PF_RING ZC?

No.

Regards, Paolo.

simonemainardi commented 8 years ago

@paoloB132 with PF_RING enabled the system spends ~20% of its time running the kernel, servicing interrupts or managing resources. That's pretty high. Indeed, the system spends a lot of time copying packets from the system to userland. With PF_RING ZC these copies don't occur, which significantly increases system speed (btw, ZC is not compatible with your NICs).
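The ~20% figure appears to correspond to the system and interrupt columns of the `Cpu(s)` line in the second `top` snapshot posted above. A minimal sketch of that arithmetic follows; `parse_cpu_line` and `kernel_share` are hypothetical helper names, and reading "kernel time" as `sy + hi + si` is an assumption about what was meant.

```python
import re


def parse_cpu_line(line):
    """Turn a top 'Cpu(s):' summary line into a {state: percent} dict."""
    return {state: float(value) for value, state in re.findall(r"([\d.]+)%(\w+)", line)}


def kernel_share(cpu):
    """System time plus hardware and software interrupt time (assumed definition)."""
    return cpu["sy"] + cpu["hi"] + cpu["si"]


# The two Cpu(s) lines quoted earlier in this thread.
dashboard_only = parse_cpu_line(
    "Cpu(s): 3.0%us, 3.4%sy, 0.0%ni, 93.5%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st")
mouse_on_graph = parse_cpu_line(
    "Cpu(s): 12.9%us, 17.8%sy, 0.0%ni, 69.1%id, 0.1%wa, 0.0%hi, 0.1%si, 0.0%st")

for label, cpu in (("dashboard only", dashboard_only), ("mouse on graph", mouse_on_graph)):
    print(f"{label}: user {cpu['us']:.1f}%, kernel {kernel_share(cpu):.1f}%, idle {cpu['id']:.1f}%")
```

Note that `Cpu(s)` is averaged over all cores, while the per-process `%CPU` column is relative to a single core, which is why ntopng can show 112.7% there.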

I think it is worth playing a bit with core affinity. You should try using affinity to 'reserve' one of the four cores for ntopng alone and see if performance improves.

@cardigliano do you have any other hints?

paoloB132 commented 8 years ago

Thanks Simone, I'll try your suggestions.

Anyway, a good workaround could also be, if possible, caching the minute top talkers statistics table for the selected timeframe.

I mean querying the DB and loading the meaningful records only once, at the first access to that particular timeframe, instead of using dynamic but resource-hogging real-time access.
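A minimal sketch of this caching idea, under stated assumptions: the `top_talkers` table, its columns, and both function names are hypothetical, since ntopng's actual storage layout is not shown in this thread. The database is queried once per timeframe, and every subsequent pointer position is answered from memory.

```python
import sqlite3
from functools import lru_cache

# Hypothetical schema and sample rows, for illustration only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE top_talkers (minute INTEGER, host TEXT, bytes INTEGER)")
db.executemany(
    "INSERT INTO top_talkers VALUES (?, ?, ?)",
    [(0, "10.0.0.1", 500), (0, "10.0.0.2", 300), (1, "10.0.0.2", 900), (2, "10.0.0.3", 100)],
)

stats = {"queries": 0}  # counts how often the database is actually hit


@lru_cache(maxsize=32)
def load_timeframe(start_minute, end_minute):
    """Query the database once per timeframe and index the rows by minute."""
    stats["queries"] += 1
    rows = db.execute(
        "SELECT minute, host, bytes FROM top_talkers "
        "WHERE minute BETWEEN ? AND ? ORDER BY minute, bytes DESC",
        (start_minute, end_minute),
    ).fetchall()
    by_minute = {}
    for minute, host, nbytes in rows:
        by_minute.setdefault(minute, []).append((host, nbytes))
    return by_minute


def top_talkers_at(minute, start_minute, end_minute):
    """Called on every mouse move; served from memory after the first call."""
    return load_timeframe(start_minute, end_minute).get(minute, [])


for minute in (0, 1, 2, 1, 0):  # simulated pointer positions over the graph
    top_talkers_at(minute, 0, 2)
print("database queries issued:", stats["queries"])
```

The trade-off is memory and staleness: a cached timeframe that includes the current minute would need to be invalidated as new data arrives.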

ntopng is a very useful tool, and the interface graph, coupled with top talkers, helps a lot in quickly identifying traffic "anomalies" related to network peaks.

Regards, Paolo.

paoloB132 commented 8 years ago

Affinity doesn't help.

The running ntopng has PID 10961:

10961 nobody 20 0 1219m 67m 13m S 29.3 0.5 0:04.54 ntopng

It currently uses all cores:

[root@ntop ~]# taskset -p -c 10961
pid 10961's current affinity list: 0-3

Forcing it onto core 0:

[root@ntop ~]# taskset -p -c 0 10961
pid 10961's current affinity list: 0-3
pid 10961's new affinity list: 0

...still slow.
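One caveat about this test: `taskset -p` without `-a` changes the affinity only of the thread whose ID equals the given PID, and ntopng is multi-threaded (it shows more than 100% CPU in the `top` output above), so its other threads may have kept running on all cores. The sketch below is a Linux-only illustration of pinning every thread of a process; `pin_all_threads` is a hypothetical helper, and nothing here claims that doing so would have fixed the slowness.

```python
import os


def pin_all_threads(pid, cpus):
    """Apply a CPU affinity set to every thread of a process (Linux only)."""
    pinned = []
    for entry in os.listdir(f"/proc/{pid}/task"):
        tid = int(entry)
        try:
            os.sched_setaffinity(tid, cpus)
            pinned.append(tid)
        except ProcessLookupError:
            pass  # the thread exited between listing and pinning
    return pinned


if __name__ == "__main__":
    # Demo on this Python process; for ntopng, pass its PID and run as root.
    allowed = os.sched_getaffinity(0)
    one_core = {min(allowed)}
    tids = pin_all_threads(os.getpid(), one_core)
    print("pinned threads:", tids, "->", os.sched_getaffinity(0))
    pin_all_threads(os.getpid(), allowed)  # restore the original affinity
```

The shell equivalent for the process above would be `taskset -a -p -c 0 10961`.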

lucaderi commented 8 years ago

@simonemainardi Any progress?

simonemainardi commented 7 years ago

We have recently made several improvements that should have solved the issue. Please try the latest version of ntopng and reopen if necessary.