ntop / ntopng

Web-based Traffic and Security Network Traffic Monitoring
http://www.ntop.org
GNU General Public License v3.0

datarate on disk #740

Closed · Niko78 closed this issue 8 years ago

Niko78 commented 8 years ago

I use nProbe/ntopng as a NetFlow (v9) collector. Around 45 routers (60k hosts, 150k flows) send their NetFlow streams to this server. Bandwidth usage (input) on the ntopng server is less than 200 Kbps... but ntopng still writes to disk at more than 20 MBps.

I have a second server with 400 routers but fewer hosts/flows; bandwidth usage is 500 Kbps, yet the data rate on disk is more than 50 MBps!

We plan to migrate to SSD disks soon, but I don't understand how it is possible for ntopng to receive at 200/500 Kbps and write to disk at 20 MBps (or even more than 50 MBps)?

thanks

nico
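
For context, a setup like the one described above typically means nProbe running in NetFlow collector mode and handing the flows to ntopng over ZMQ. A minimal sketch, assuming that standard layout; the port, ZMQ endpoint and MySQL credentials below are placeholders, not values taken from this report:

    # nProbe collects NetFlow v9 from the routers on UDP/2055 and republishes
    # the flows over ZMQ (no local capture, no NetFlow re-export)
    nprobe -i none -n none -3 2055 --zmq "tcp://127.0.0.1:5556"

    # ntopng consumes the ZMQ feed as a virtual interface; -F enables the MySQL
    # flow dump discussed later in the thread (check the exact -F argument
    # syntax against `ntopng --help` for your version)
    ntopng -i "tcp://127.0.0.1:5556" -F "mysql;localhost;ntopng_db;flows;ntopng_user;ntopng_password"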

simonemainardi commented 8 years ago

Please enclose proof that it is actually ntopng that is writing at 20/50 MBps to disk.

However, among the 27 issues you have filed so far (https://github.com/ntop/ntopng/issues?utf8=%E2%9C%93&q=author%3ANiko78) there is this one (https://github.com/ntop/ntopng/issues/628#issuecomment-229033070) which shows that you define one huge /16 local network. My feeling is that you have 60k local hosts (maybe also with L7 statistics enabled), and that this is causing the intensive I/O to disk.
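
As a side note on the local-network remark: ntopng marks every address falling inside the networks passed via -m/--local-networks as a local host, so a single /16 can turn tens of thousands of addresses into tracked local hosts. A small illustration with made-up ranges (not the ones from issue #628):

    # one huge /16 => up to ~65k addresses treated as local hosts, each with
    # its own per-host state (and, depending on preferences, on-disk timeseries)
    ntopng -m "10.20.0.0/16"

    # limiting -m to the subnets that actually matter keeps the number of
    # local hosts, and the per-host bookkeeping, much smaller
    ntopng -m "10.20.1.0/24,10.20.2.0/24"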

Niko78 commented 8 years ago

L7 was disabled. Here is my screenshot (fewer hosts/flows, but high I/O):

[screenshot: ntop]

lucaderi commented 8 years ago

@Niko78 Can you please check on the nProbe side how many flows are sent to nProbe? I suggest you add -b 1 (or even -b 2) to nProbe and see. I see flow drops on ntopng: can you please create a virtual interface per exporter router/nProbe, so that you can leverage the multithreading capabilities of ntopng?
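
A rough sketch of both suggestions, assuming the per-exporter split is done by running one nProbe collector (and one ZMQ endpoint) per router group; the ports, addresses and grouping below are illustrative only:

    # verbose flow statistics with -b 1 (or -b 2), one collector instance per
    # exporter group, each publishing on its own ZMQ endpoint
    nprobe -b 1 -i none -n none -3 2055 --zmq "tcp://127.0.0.1:5556"   # routers, group A
    nprobe -b 1 -i none -n none -3 2056 --zmq "tcp://127.0.0.1:5557"   # routers, group B

    # ntopng then sees one virtual interface per endpoint and can process
    # them with separate threads
    ntopng -i "tcp://127.0.0.1:5556" -i "tcp://127.0.0.1:5557"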

Niko78 commented 8 years ago

I hope this is what you wanted; I let nProbe run for 1 minute.

13/Sep/2016 14:30:10 [nprobe.c:2718] Processed packets: 4414 (max bucket search: 0)
13/Sep/2016 14:30:10 [nprobe.c:2701] Fragment queue length: 0
13/Sep/2016 14:30:10 [nprobe.c:2727] Flow export stats: [2166136 bytes/4409 pkts][102 flows/4 pkts sent]
13/Sep/2016 14:30:10 [nprobe.c:2737] Flow drop stats: [0 bytes/0 pkts][0 flows]
13/Sep/2016 14:30:10 [nprobe.c:2742] Total flow stats: [2166136 bytes/4409 pkts][102 flows/4 pkts sent]
13/Sep/2016 14:30:10 [nprobe.c:5390] Cleaning globals
13/Sep/2016 14:30:10 [nprobe.c:5409] nProbe terminated.

Niko78 commented 8 years ago

@lucaderi I sent you the "strace" information from ntopng by email.

simonemainardi commented 8 years ago

I looked at the traces and I have the following suggestions. Try both of them independently and see if the disk I/O rates change (a sketch follows the list):

  1. remove the line at https://github.com/ntop/ntopng/blob/dev/scripts/callbacks/minute.lua#L168 from the installed script (it should be under /usr/share/ntopng/scripts/callbacks/minute.lua), save it, and see...
  2. disable MySQL by removing -F, and see.
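
A sketch of the two tests, assuming a stock package install (paths and config file locations may differ on your system); the actual Lua line to remove is the one shown in the GitHub link above and is not reproduced here:

    # 1) back up the installed callback script, then comment out / delete the
    #    line referenced above (match it against the GitHub link rather than the
    #    line number, which shifts between versions) and restart ntopng
    cp /usr/share/ntopng/scripts/callbacks/minute.lua /usr/share/ntopng/scripts/callbacks/minute.lua.bak

    # 2) independently, disable the MySQL flow dump by removing the -F option
    #    from the ntopng command line or its configuration file (commonly
    #    /etc/ntopng/ntopng.conf) and restart ntopng
    grep -n -- "-F" /etc/ntopng/ntopng.conf   # locate the option before removing it
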
Niko78 commented 8 years ago

Wow, fantastic!

I tried disabling both => no peaks at all and only a few KB written from time to time. I tried re-enabling MySQL => no issue.

So it seems you found my problem with 1).

simonemainardi commented 8 years ago

@emanuele-f Can you please check if/why disk usage peaks with the activities? Also, should we execute those activities once every 5 minutes (rather than once every minute)? I guess it would help.

emanuele-f commented 8 years ago

I'm doing a quick test with iotop: with that line enabled I see peaks of 1170 KB/s, while with the line disabled I see peaks of 850 KB/s. But I only have 2 hosts to test with. There are 12 activity files to be updated per host each minute, so for 60k hosts it may be too much RRD work...

Note: (1170 - 850) * 60000 ≈ 18.31 MBps, which is close to the reported 20 MBps.
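
For readers checking the arithmetic: the 18.31 figure follows from treating the product as bytes per second; a quick reconstruction of the stated estimate (this unit reading is an interpretation, not something stated in the thread):

\[
(1170 - 850) \times 60000 = 19\,200\,000 \ \text{B/s}, \qquad
\frac{19\,200\,000}{2^{20}} \approx 18.31 \ \text{MBps}
\]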

Niko78 commented 8 years ago

On my side the difference is enormous: now even the peaks are below 200 KBps, and the average should be around 50 KBps (4k hosts and 45k flows during the check).

emanuele-f commented 8 years ago

@Niko78 Local users' activity logging to disk is now disabled by default. Moreover, the activity timeseries frequency has been reduced from 1 minute to 5 minutes. This should drastically reduce disk activity on big networks like yours. Please check out the latest code and report your results.

Note: Activity logging preferences can be specified through the "On-Disk Timeseries" tab.

Niko78 commented 8 years ago

@emanuele-f I can confirm it is now fixed. I left it disabled on my side, and now only a few KB are written to disk. Thanks!

emanuele-f commented 8 years ago

Thank you.

simonemainardi commented 8 years ago

Thanks for reporting.