prometheus-community / pro-bing

A library for creating continuous probers
MIT License
309 stars 52 forks source link

add timeout handler and max rtt option #49

Open zey1996 opened 1 year ago

zey1996 commented 1 year ago

I believe that developers should have a way to know when a data packet is lost or does not return within a specified time.

you can set MaxRtt and OnTimeout func

Ontime func while be call when a request was not answered within a specified time

you only need add:

  pinger.MaxRtt = time.Second * 3

  pinger.OnTimeOut = func(packet *probing.Packet) {
    fmt.Println("timeout", packet.Addr, packet.Rtt, packet.TTL)
  }

It has good compatibility, as if MaxRtt is not set, it will have the same behavior as before.

However, there are still some issues remaining.

But I believe that this is acceptable.

zey1996 commented 1 year ago

I noticed have some memory problem. I think this pr can fix it. the user should set timeout to resolve memory problem.

zey1996 commented 1 year ago

https://github.com/prometheus-community/pro-bing/blob/2dcd76c30c10636979f44f178ed9538e5a6e7043/ping.go#LL714C17-L714C17 why not return a error when uuid is not exist?

wz2b commented 7 months ago

I wrote this comment in a related issue but I think I should repeat it here. I think I get the issue here, but I'm wondering if the solution is to go farther, and not just have the 'native' pinger try to replicate /usr/bin/ping.

If I interpret the man page correctly, when you request a certain number of packets, it sends -c, -w, and -W. If you don't specify a long enough -w (which the plugin calls deadline) then you might have gotten all the responses back, you just didn't wait long enough. The other thing that is confusing is -W (timeout) is only used if there are no responses. if there are responses, then that is ignored and it actually waits two times some RTT that it calculates - presumably from earlier pings, maybe from only the last one.

I think the fundamental problem is how dynamic the 'ping' command line tool is, then 'native' tries to mimic that and brings in the same problems.

I repeatedly get way worse results from this plugin than I get running ping from the command line on the same host. I think the reason for this is ping doesn't behave very well with -c, -w, -W when you have high variability in response time. I'm not sure how I would fix this. I think a total rewrite of 'native' could fix it, but it would want to take on a different strategy, such as launching -c n pings -W t seconds apart, then wait until either all of them come back or -w has passed, and count.

I think I'd make a few changes. One idea is to space pings out at a fixed interval, regardless of RTT. The second would be to put serial numbers of some sort in every ping, then add a few fields to detect if you got any responses out of order and if you got any duplicates. Then, end the entire thing on either deadline or you get count responses. The down side to this is you might have multiple pings in flight at the same time. I think that's rarely a problem, but if we wanted to not break backward compatibility you could signal you want this new behavior by specifying some kind of backward compatibility flag, or replicate 'native' and make a new choice with a new name.

Thoughts ?