prometheus-community / pro-bing

A library for creating continuous probers
MIT License
309 stars 52 forks source link

Adds support for timeout per packet #45

Closed ivokanchev closed 1 year ago

ivokanchev commented 1 year ago

This is an attempt at solving the timeout per packet issue using the rtt field. The timeout timer is set to the lower value of request timeout / last packet sent timeout so that the pinger exits as soons as there is no more work to do.

Please have a look and share your thoughts.

SuperQ commented 1 year ago

This doesn't look like it does anything about per-packet timeouts. It only sets a timeout at the start of the loop, it doesn't seem to track the individual packet start and timeout times.

ivokanchev commented 1 year ago

This doesn't look like it does anything about per-packet timeouts. It only sets a timeout at the start of the loop, it doesn't seem to track the individual packet start and timeout times.

What do you mean?

if pkt.Rtt >= p.PacketTimeout { // ignore packets that timeout from stats
        return
    }

^^ this is checking each packet against the predefined packet value, therefore statistics are correct. The timeout is calculated so that the process is waiting for the last sent packet + packet timeout time.

SuperQ commented 1 year ago

I don't think you understand what per-packet timeouts are. The timeout you are using is a single timer for the whole packet loop.

Say I do this:

$ ./ping -p 10ms some-far-away-host
PING some-far-away-host (1.2.3.4):
32 bytes from 1.2.3.4: icmp_seq=0 time=130.859026ms ttl=56
32 bytes from 1.2.3.4: icmp_seq=1 time=131.396367ms ttl=56
32 bytes from 1.2.3.4: icmp_seq=2 time=130.668148ms ttl=56
32 bytes from 1.2.3.4: icmp_seq=3 time=129.896707ms ttl=56
32 bytes from 1.2.3.4: icmp_seq=4 time=130.080073ms ttl=56
32 bytes from 1.2.3.4: icmp_seq=5 time=129.952145ms ttl=56
^C
--- some-far-away-host ping statistics ---
6 packets transmitted, 0 packets received, 0 duplicates, 100% packet loss
round-trip min/avg/max/stddev = 0s/0s/0s/0s

Note that the ping RTT in the above example is over 100ms. A per-packet timeout of 10ms should make all of those packets timeout.

Your change does not actually do per-packet timeouts.

ivokanchev commented 1 year ago

As per your example above there is 100% packet loss. I decided to show that the packets are returned, but the deadline is not met.

Each packet is checked against PacketTimeout by reading the Rtt.

Do you refer to the per packet output of the cmd/ping?

ivokanchev commented 1 year ago

@SuperQ I've added a simple label to packets that have timed out here.

metalgrid commented 1 year ago

@SuperQ I believe it works properly and it's a matter of display representation in the cmd/ping program. If you look at the summary at the end from your example, it states 100% packet loss, which means no packets have returned during the specified per-packet deadline.

SuperQ commented 1 year ago

The problem you only account for lost packets on packet receive, and at the end. This needs to be implemented to happen in real-time so that a p.OnPacketTimeout() callback can be registered to happen exactly when each packet times out. Not later.

metalgrid commented 1 year ago

What is the use case for the OnPacketTimeout callback?

SuperQ commented 1 year ago

The primary use case of this repos is it being used as a library in other monitoring projects. The reason we need a per-packet timeout is so that we can collect metrics about timeouts while the prober loop is running. It is essential to this.

ivokanchev commented 1 year ago

Sounds like a viable use-case. What do you think about probing the statistics when needed instead of using a ticker? It would be more performant, unless you need to know the seq of the packet that failed.

SuperQ commented 1 year ago

No, that won't work either. We need this to be as close to real-time as possible.