netsampler / goflow2

High performance sFlow/IPFIX/NetFlow Collector
BSD 3-Clause "New" or "Revised" License

About the sFlow values produced by GoFlow #227

Closed: zekiahmetbayar closed this issue 1 year ago

zekiahmetbayar commented 1 year ago

Hi,

I'm stuck on something about GoFlow's data. I want to track the traffic between two machines over time, which is why I use the GoFlow collector.

The numbers I get when I transfer a file between these two machines via scp are not consistent with the numbers I get when I transfer it using wget, curl, or SMB. For example, if the scp transfer gives a value of 10X, the wget transfer gives 8X and the curl transfer gives only X.

To compute the total traffic, I add up the 'bytes' field in the JSON produced by GoFlow. Do I need to do anything else? Could this be a bug in the application, or is it caused by inaccuracy inherent in sFlow?

I'm really confused. I'd welcome anyone who can at least get a discussion going.

Hedius commented 1 year ago

If you are using sampled flows, you will always have a certain degree of inaccuracy in your data.
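As a rough guide to how large that inaccuracy can be, a common rule of thumb for random 1-in-N packet sampling is that the 95% relative error of a scaled-up packet count is about 1.96 * sqrt(1/c), where c is the number of samples actually received. This is a binomial approximation that assumes the exporter samples randomly and independently (an assumption about the device, not something GoFlow2 controls); byte estimates are noisier still when packet sizes vary. The helper name below is hypothetical.

```python
import math

def sampling_error_pct(samples: int, z: float = 1.96) -> float:
    """Approximate relative error (in percent) of a sampled packet-count estimate.

    Assumes random 1-in-N packet sampling, so the number of samples c is roughly
    binomial and the relative standard error is about sqrt(1/c) when N is large.
    z=1.96 gives a ~95% confidence bound.
    """
    if samples <= 0:
        raise ValueError("need at least one sample to bound the error")
    return 100.0 * z * math.sqrt(1.0 / samples)

# A short transfer that yields only 25 samples vs. a busy link with 10,000 samples.
for c in (25, 10_000):
    print(f"{c:>6} samples -> +/- {sampling_error_pct(c):.1f}%")
```

A small file copied once may only produce a handful of samples, so differences of several tens of percent between two transfers are within what this approximation predicts.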

scp/ssh, HTTP, and SMB obviously have different protocol overhead. Also keep in mind that certain protocols, including SSH and HTTP, can use compression (e.g. gzip) too.
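To illustrate why the same file can produce very different byte counts, here is a sketch under assumed values: it compresses a highly repetitive payload with Python's gzip module (standing in for SSH or HTTP compression) and adds a fixed per-packet header cost. The 1448 bytes of payload per packet and 66 bytes of headers (Ethernet 14 + IPv4 20 + TCP 20 + 12 bytes of options) are illustrative assumptions, not values measured from scp, curl, or SMB; real transfers also include handshakes, ACKs, and encryption or record framing.

```python
import gzip
import math

# Illustrative assumptions: 1448 B of TCP payload per full-size segment and
# 66 B of headers per packet (Ethernet 14 + IPv4 20 + TCP 20 + 12 B options).
PAYLOAD_PER_PACKET = 1448
HEADER_BYTES = 66

def wire_bytes(payload_len: int) -> int:
    """Estimate bytes on the wire for a one-way data transfer (no ACKs or handshake)."""
    packets = math.ceil(payload_len / PAYLOAD_PER_PACKET)
    return payload_len + packets * HEADER_BYTES

file_data = b"log line 42: all systems nominal\n" * 30_000  # very compressible
compressed = gzip.compress(file_data)

print("uncompressed on wire:", wire_bytes(len(file_data)))
print("gzip-compressed on wire:", wire_bytes(len(compressed)))
```

With a compressible file, the compressed transfer is a small fraction of the uncompressed one, so a tool that compresses and one that does not can legitimately report very different totals for the same file.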

Hedius commented 1 year ago

What you can do is also record the traffic on your machines with tcpdump and then compare it with your GoFlow data.
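A minimal sketch of the tcpdump side of that comparison, assuming the capture was written with tcpdump -w in the classic pcap format (pcapng is not handled): it sums the original on-the-wire length of every packet so the total can be set against the GoFlow2 bytes for the same time window. The function name is hypothetical, and the lengths include whatever link-layer header the capture interface uses.

```python
import struct

def pcap_total_bytes(path: str) -> tuple[int, int]:
    """Return (packet_count, total_original_bytes) for a classic pcap file."""
    with open(path, "rb") as f:
        header = f.read(24)  # pcap global header
        if len(header) < 24:
            raise ValueError("file too short to be a pcap capture")
        magic = header[:4]
        # Microsecond (a1b2c3d4) and nanosecond (a1b23c4d) magics, in either byte order.
        if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
            endian = "<"
        elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
            endian = ">"
        else:
            raise ValueError("not a classic pcap file (pcapng is not handled here)")
        count = total = 0
        while True:
            rec = f.read(16)  # per-packet record header
            if len(rec) < 16:
                break
            _ts_sec, _ts_frac, incl_len, orig_len = struct.unpack(endian + "IIII", rec)
            f.seek(incl_len, 1)  # skip the captured packet bytes
            count += 1
            total += orig_len  # length on the wire, even if the snaplen truncated it
    return count, total
```

For example, pcap_total_bytes("scp_transfer.pcap") (a hypothetical file name) would return the packet count and byte total to compare against the scaled GoFlow2 figure for the same transfer.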

lspgn commented 1 year ago

Hello, adding to what @Hedius mentioned: the device may also batch samples and export them at a specific interval. If you have little traffic, there can be large differences between the actual number of bytes (e.g. obtained via SNMP) and the flow traffic. Sampling is often based on every X packets, so, for instance, you could calculate more DNS traffic than HTTP traffic, since the latter may have larger packets but send them less often.
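The DNS-versus-HTTP effect can be reproduced with a deliberately contrived simulation. The sketch below assumes strict deterministic 1-in-N counting (many devices randomise the skip count instead) and a made-up packet stream in which every fourth packet is a small DNS packet. Because the sampling period lines up with that pattern, every sample lands on DNS, and the scaled-up estimate credits DNS with traffic that was really HTTP. Flow names and sizes are hypothetical.

```python
from collections import defaultdict

def estimate_bytes(packets, rate):
    """Scale up deterministic 1-in-`rate` samples into per-flow byte estimates.

    `packets` is a list of (flow_name, size_in_bytes). Every `rate`-th packet
    (1-based positions rate, 2*rate, ...) is sampled, and each sampled packet
    is counted as `rate` packets of the same size.
    """
    true_bytes = defaultdict(int)
    estimated = defaultdict(int)
    for i, (flow, size) in enumerate(packets, start=1):
        true_bytes[flow] += size
        if i % rate == 0:
            estimated[flow] += size * rate
    return dict(true_bytes), dict(estimated)

# Contrived stream: three 1500 B HTTP packets, then one 100 B DNS packet, repeated 10 times.
stream = [("http", 1500), ("http", 1500), ("http", 1500), ("dns", 100)] * 10
actual, estimate = estimate_bytes(stream, rate=4)
print("actual:", actual)
print("estimated:", estimate)
```

Real traffic rarely aligns this neatly, but the same mechanism means that with few packets the estimate depends heavily on which packets happened to be sampled.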

Some devices allow you to sample every packet. sFlow can also provide interface counters, but these are out of scope for GoFlow2.

Are you multiplying the bytes by the sampling rate? Are you averaging per interval based on the timestamp contained in each flow?
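A minimal sketch of both steps, assuming GoFlow2 writes one JSON object per line and that each record carries bytes, sampling_rate, and time_flow_start_ns fields. These field names are an assumption based on GoFlow2's JSON output; verify them against your own records, since other versions may name them differently. Records with a sampling rate of 0 are treated as unsampled. The function name and the 60-second interval are illustrative choices.

```python
import json
from collections import defaultdict

def bytes_per_interval(lines, interval_s=60):
    """Sum sampling-rate-scaled bytes into fixed time buckets.

    Assumes each line is a JSON flow record with 'bytes', 'sampling_rate'
    and 'time_flow_start_ns' fields (field names are an assumption).
    Returns {bucket_start_seconds: estimated_bytes}.
    """
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        rate = rec.get("sampling_rate") or 1  # 0 or missing -> treat as unsampled
        start_s = rec["time_flow_start_ns"] // 1_000_000_000
        bucket = start_s - start_s % interval_s  # align to the interval boundary
        totals[bucket] += rec["bytes"] * rate
    return dict(totals)

sample = [
    '{"bytes": 1500, "sampling_rate": 1000, "time_flow_start_ns": 1700000005000000000}',
    '{"bytes": 120, "sampling_rate": 1000, "time_flow_start_ns": 1700000030000000000}',
    '{"bytes": 900, "sampling_rate": 0, "time_flow_start_ns": 1700000075000000000}',
]
for bucket, total in sorted(bytes_per_interval(sample).items()):
    print(bucket, total)
```

Dividing each bucket total by interval_s gives an average rate in bytes per second, which is easier to compare between transfers than a single raw sum.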

lspgn commented 1 year ago

Will close. Feel free to reopen if you need more information.