ntop / nProbe

Open source components and extensions for nProbe
http://ntop.org
GNU General Public License v2.0

Sampled data scaling #219

Status: Closed (Gunni closed 6 years ago)

Gunni commented 7 years ago

We sample NetFlow on our PE routers at a rate of 1 sample per 100 packets. That gives us a reasonable amount of sampled data, it looks and feels good, and we use nfsen to alert on issues from it.

[Image: nfsen]

When we send the same data into nProbe with the options "--sample-rate @100:1 --upscale-traffic" and then proxy that to ntopng, we still only see ~15 Mbit/s, while nfsen sees ~1.7 Gbit/s (which is the correct value).
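As a rough sanity check on the figures above, multiplying the rate ntopng shows by the 1-in-100 sampling ratio should land near what nfsen reports. The sketch below is only illustrative arithmetic; the function name `upscale_rate` is hypothetical and is not part of nProbe, ntopng, or nfsen.

```python
def upscale_rate(observed_mbps: float, sampling_ratio: int) -> float:
    """Estimate the real rate from a rate measured on 1-in-N sampled data."""
    return observed_mbps * sampling_ratio


observed_mbps = 15.0   # rate seen in ntopng, as reported in this issue
nfsen_mbps = 1700.0    # rate seen in nfsen, as reported in this issue

estimate = upscale_rate(observed_mbps, 100)
print(f"estimated: {estimate:.0f} Mbit/s, nfsen: {nfsen_mbps:.0f} Mbit/s")
```

The estimate is of the same order as the nfsen figure, which is consistent with the ~15 Mbit/s value being the raw sampled rate with no upscaling applied.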

Unit file:

[Unit]
Description=nProbe listening on port UDP/3000
After=network.target syslog.target redis.service cluster.service
Requires=pf_ring.service cluster.service

[Service]
Type=simple
ExecStart=/usr/local/bin/nprobe \
    --interface none \
    --collector none \
    --sample-rate @100:1 \
    --upscale-traffic \
    --collector-port 3000 \
    --zmq 'tcp://*:4000' \
    -V 10
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

lucaderi commented 7 years ago

We have reworked the implementation, as it was confusing. Changes:

  1. Removed --upscale-traffic, as I believe that if you have set the sampling rate you of course want to upscale
  2. Removed --collector-sample-rate, as this overlapped with --sample-rate
  3. Modified --sample-rate to take three colon-separated values

So in your case you should do --sample-rate 1:100:1 and remove --upscale-traffic.

Can you please upgrade and see if it works?
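To make the new three-field format concrete, here is a minimal sketch of how a value such as 1:100:1 could be validated before it goes into a unit file. The helper `parse_sample_rate` and the positional field names are hypothetical; this thread does not state nProbe's own names for the fields, and this is not nProbe's parser.

```python
from typing import NamedTuple


class SampleRate(NamedTuple):
    # Positional names only: the thread does not give the official field names.
    first: int
    second: int
    third: int


def parse_sample_rate(value: str) -> SampleRate:
    """Parse a three-field, colon-separated value such as '1:100:1'."""
    parts = value.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected three colon-separated fields, got {value!r}")
    numbers = [int(part) for part in parts]  # int() raises ValueError on junk
    if any(number < 1 for number in numbers):
        raise ValueError(f"all fields must be >= 1, got {value!r}")
    return SampleRate(*numbers)


print(parse_sample_rate("1:100:1"))
```

With this check, the old two-field form "@100:1" is rejected with a ValueError instead of being passed through silently.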

simonemainardi commented 7 years ago

Please upgrade nProbe to the latest build now and report back.

Gunni commented 7 years ago

I cannot right now, since we abandoned the product for the project I was working on.

But I am doing a new install for another customer soon.

Until then.

Edit: can you also show that command line in a code block? It's confusing with that 💯 icon.

zdc commented 6 years ago

We have the same problem. We tried this in the latest stable (8.2.171211-5982) and nightly (8.3.171214-5996) builds with the option -S=1:2048:1. It seems that this causes traffic rates to be downscaled instead of upscaled.

simonemainardi commented 6 years ago

@zdc can you please give the full nProbe configuration used as well as the sampling configured in any possible NetFlow/sFlow exporter?

zdc commented 6 years ago

Yes, of course. nProbe:

-n=none
-3=2055
-S=1:2048:1
-V=10
--zmq tcp://127.0.0.1:42055
--zmq-disable-compression

JunOS:

services {
    flow-monitoring {
        version-ipfix {
            template ipv4 {
                flow-active-timeout 60;
                flow-inactive-timeout 60;
                template-refresh-rate {
                    packets 1000;
                    seconds 10;
                }
                option-refresh-rate {
                    packets 1000;
                    seconds 10;
                }
                ipv4-template;
            }
            template ipv6 {
                flow-active-timeout 60;
                flow-inactive-timeout 60;
                template-refresh-rate {
                    packets 1000;
                    seconds 10;
                }
                option-refresh-rate {
                    packets 1000;
                    seconds 10;
                }
                ipv6-template;
            }
        }
    }
}
forwarding-options {
    sampling {
        instance {
            instance-ipfix1 {
                input {
                    rate 2048;
                }
                family inet {
                    output {
                        flow-server 1.1.1.1 {
                            port 2055;
                            version-ipfix {
                                template {
                                    ipv4;
                                }
                            }
                        }
                        inline-jflow {
                            source-address 1.1.1.2;
                        }
                    }
                }
                family inet6 {
                    output {
                        flow-server 1.1.1.1 {
                            port 2055;
                            version-ipfix {
                                template {
                                    ipv6;
                                }
                            }
                        }
                        inline-jflow {
                            source-address 1.1.1.2;
                        }
                    }
                }
            }
        }
    }
}

simonemainardi commented 6 years ago

I went through the configurations, and the -S=1:2048:1 you specified should be correct in your environment. Both IN and OUT bytes and packets should be scaled up by a factor of 2048.

So you mentioned that

"It seems that this causes traffic rates to be downscaled instead of upscaled."

Do you have any more quantitative information that you can give? Can you tell to what extent this happens? Attaching a pcap of some IPFIX traffic (both templates and records) would help us as well.
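The expected behaviour described above (IN and OUT bytes and packets each multiplied by 2048) can be sketched as follows. This is an illustration under stated assumptions, not nProbe's implementation: the `FlowCounters` type, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowCounters:
    # Hypothetical per-flow counters as exported for a sampled flow.
    in_bytes: int
    out_bytes: int
    in_pkts: int
    out_pkts: int


def upscale(flow: FlowCounters, factor: int) -> FlowCounters:
    """Multiply every byte and packet counter by the sampling factor."""
    return FlowCounters(
        in_bytes=flow.in_bytes * factor,
        out_bytes=flow.out_bytes * factor,
        in_pkts=flow.in_pkts * factor,
        out_pkts=flow.out_pkts * factor,
    )


sampled = FlowCounters(in_bytes=1500, out_bytes=400, in_pkts=1, out_pkts=1)
scaled = upscale(sampled, 2048)
print(scaled)
```

Comparing totals before and after such a step gives the kind of quantitative check asked for here: the ratio should be close to the configured factor.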

zdc commented 6 years ago

I tried again, and it seems that I misinterpreted the results on the first try. The sampling setting really does upscale, but not as much as we expected. Here is an example (ntopng Community Edition v.3.3.171214):

-S=1:1:1: [Image: nprobe_1 1 1]
-S=1:2000:1: [Image: nprobe_1 2000 1]
In one graph: [Image: nprobe_1_2000_compare]

As you can see, the upscaling is only approximately 10x. Real traffic is approximately 2000x the values that we get with 1:1:1. What could cause this? If you need an IPFIX dump for examination, I can send it by e-mail.

simonemainardi commented 6 years ago

An IPFIX dump will definitely help. Please send it to mainardi ntop org.

In the meantime, can you try disabling sampling on JunOS to see if/how the result changes? Also, if you visit the historical chart page of a local host, which values do you get for the traffic?

zdc commented 6 years ago

With input rate 1 on the Juniper and -S=1:1:1 in nProbe, the result is much closer to correct: [Image: nprobe_1 1 1_inputrate1]

But the CPU on the server running ntop is overloaded, and it seems that the Juniper also can't export flows at this speed. So I don't think these results are correct.

Also, if you visit the historical chart page of a local host, which values do you get for the traffic?

Sorry, I don't understand what exactly is meant.

I sent you the IPFIX dump with sampling rate 2000 by e-mail.

simonemainardi commented 6 years ago

We have committed a fix; it should be available in a new nightly build (8.3). Please wait an hour for the new build to be generated, give it a try, and report back. Thanks.

zdc commented 6 years ago

Sorry for the late answer. I can confirm that our problem is fixed and that these versions work correctly:

ntopng 3.3.180119-4014
nprobe 8.3.180119-6042

Thank you for the fix!

simonemainardi commented 6 years ago

Thanks for reporting.