briansgill closed this 2 months ago
Uh oh! Let me try with a new ping engine and see if that's any better.
OK, if you're up for it, try one more time and let me know if this one is better. Latest fix building now.
Thanks. Checking it out now via version kt-2024-08-27-10571748813
@i3149 - looks like the new version is working as expected and not recording any false packet loss. The old version was giving pretty static results all the time. This version seems to record variable response times on the pings, which is probably more reflective of real-world results? What do you think of the results?
Nice! This looks much more like real life to me. Behind the scenes, we open-sourced Kentik's own ICMP tool. It works by sending Y packets for X seconds in its own thread and then reporting the result.
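For anyone following along, here is a rough sketch of the "send Y packets for X seconds in its own thread, then report" pattern. It is not the actual Kentik tool: `run_burst`, `burst_in_thread`, `BurstResult`, and the `probe` callback are hypothetical names, and `probe` stands in for a real ICMP echo (it returns a round-trip time in ms, or `None` on timeout). The point is only that all counters live inside a single burst.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class BurstResult:
    sent: int
    received: int
    rtts_ms: List[float]

    @property
    def loss_pct(self) -> float:
        # Loss is computed from this burst only, never from earlier bursts.
        return 100.0 * (self.sent - self.received) / self.sent if self.sent else 0.0


def run_burst(probe: Callable[[], Optional[float]], count: int, window_s: float) -> BurstResult:
    """Send `count` probes spread over roughly `window_s` seconds and summarize them."""
    rtts: List[float] = []
    interval = window_s / count if count else 0.0
    for _ in range(count):
        rtt = probe()  # hypothetical stand-in for one ICMP echo request
        if rtt is not None:
            rtts.append(rtt)
        time.sleep(interval)
    return BurstResult(sent=count, received=len(rtts), rtts_ms=rtts)


def burst_in_thread(
    probe: Callable[[], Optional[float]],
    count: int,
    window_s: float,
    report: Callable[[BurstResult], None],
) -> threading.Thread:
    """Run one burst in its own thread and hand the finished result to `report`."""
    t = threading.Thread(
        target=lambda: report(run_burst(probe, count, window_s)),
        daemon=True,
    )
    t.start()
    return t


if __name__ == "__main__":
    # Fake probe: four replies, one of them a timeout.
    replies = iter([12.5, None, 11.0, 13.2])
    results: List[BurstResult] = []
    burst_in_thread(lambda: next(replies), count=4, window_s=0.04, report=results.append).join()
    print(results[0].sent, results[0].received, results[0].loss_pct)
```

Because each burst builds a fresh `BurstResult`, a lost packet in one window cannot leak into the next window's numbers.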
Before, we were trying to use https://github.com/prometheus-community/pro-bing, which keeps state across time ticks, and polling its stats every X seconds. The problems are as you discovered, mostly because state is kept across time intervals, so you can get some weird results.
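To illustrate one way carried-over state can skew things, here is a small sketch with made-up numbers. It does not call pro-bing or reproduce ktranslate's polling code; `cumulative_loss` and `per_interval_loss` are hypothetical helpers, and the input is assumed to be a list of `(sent, received)` counts per polling interval.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (packets sent, packets received) in one polling interval


def cumulative_loss(intervals: List[Interval]) -> List[float]:
    """Loss % seen at each poll when the counters are never reset."""
    reported: List[float] = []
    sent = received = 0
    for s, r in intervals:
        sent += s
        received += r
        reported.append(100.0 * (sent - received) / sent if sent else 0.0)
    return reported


def per_interval_loss(intervals: List[Interval]) -> List[float]:
    """Loss % seen at each poll when every interval starts from zero."""
    return [100.0 * (s - r) / s if s else 0.0 for s, r in intervals]


if __name__ == "__main__":
    # Assumed data: half the packets are lost in the first interval, none afterwards.
    samples = [(10, 5), (10, 10), (10, 10), (10, 10)]
    print(cumulative_loss(samples))
    print(per_interval_loss(samples))
```

With these sample numbers, the never-reset counters keep reporting non-zero loss in every later poll, even though only the first interval lost packets, while the per-interval version drops back to zero right away.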
This is a follow-up to issues 732 and 736 - https://github.com/kentik/ktranslate/issues/736
There was still packet loss recorded on the initial polling, but the bigger issue in the latest released version was that some of the devices did not get polled for a lengthy time period, as in the example below. The poller was started at 3pm, but it seems this specific host got polled in the beginning and then not again until around 5:15pm, so a gap of more than 2 hours. Other hosts showed the same type of behavior as well.