prometheus-community / pro-bing

A library for creating continuous probers
MIT License
309 stars 52 forks source link

Packet Loss is (arguably) calculated incorrectly #13

Open sector-f opened 1 year ago

sector-f commented 1 year ago

Responses to network requests are never instantaneous.

This library calculates packet loss by dividing the number of packets that have been sent and the number of packets that have been received. While I cannot say that the resulting number is "objectively incorrect," this does not take into account the fact that some delay between request and response should be expected.

For example, in the instant after sending our first packet, we have sent 1 packet and received 0 packets. This is technically 100% packet loss at this moment in time, but actually construing this as 100% packet loss probably isn't helpful.

I think this would be more useful if the number of sent packets used for this calculation were only incremented some (user-configurable?) time after the packet has actually been sent.

While it was straightforward to create my own wrapper with this functionality (using the OnSend and OnRecv callbacks), I'm wondering if anyone thinks it would make sense to add this functionality into the library itself.

SuperQ commented 1 year ago

The only way for this to be sensible is if we implement per-packet timeouts. With the current implementation, we have an unlimited amount of time that we could wait for packets to return.

Besides this, I don't know of any other system that would calculate things the way you're suggesting. Since latency can vary wildly, there is no delay expectation that would be reasonable.

mpenning commented 9 months ago

SuperQ said:

there is no delay expectation that would be reasonable.

May I offer a suggestion?

Instead of saying there is no delay expectation that is reasonable, it's more appropriate to say "only the owner of the network / application can know the maximum delay expectation".

When pinging through the internet, your statement is somewhat true, but even so nobody would ever tolerate a 40-second delay (picking 40 seconds as an absurd, but actual measurement I have witnessed under pathological conditions). As such, a good approach is requiring the user to specify a maximum delay.

After checking some recent PRs, it looks like PR #49: add timeout handler and max rtt option is relevant to the OP's question.

wz2b commented 7 months ago

I have some issues with this as well. It might be possible to tune -w and -W correctly but I haven't figured out how yet. The result is that this plugin always gives me far worse results than pinging from the same host from the command line. It is perplexing and I haven't gotten to the bottom of it yet. Watching this ticket.

wz2b commented 7 months ago

The only way for this to be sensible is if we implement per-packet timeouts. With the current implementation, we have an unlimited amount of time that we could wait for packets to return.

When you request a certain number of packets, it sends -c, -w, and -W. If you don't specify a long enough -w (which the plugin calls deadline) then you might have gotten all the responses back, you just didn't wait long enough. The other thing that is confusing is -W (timeout) is only used if there are no responses. if there are responses, then that is ignored and it actually waits two times some RTT that it calculates - presumably from earlier pings, maybe from only the last one.

I think the fundamental problem is how dynamic the 'ping' command line tool is, then 'native' tries to mimic that and brings in the same problems.

I repeatedly get way worse results from this plugin than I get running ping from the command line on the same host. I think the reason for this is ping doesn't behave very well with -c, -w, -W when you have high variability in response time. I'm not sure how I would fix this. I think a total rewrite of 'native' could fix it, but it would want to take on a different strategy, such as launching -c n pings -W t seconds apart, then wait until either all of them come back or -w has passed, and count.