raintank / worldping-api

Worldping Backend Service

Traceroute from a Collector #19

Open woodsaj opened 8 years ago

woodsaj commented 8 years ago

Issue by mattttt Tuesday Aug 04, 2015 at 20:59 GMT Originally opened as https://github.com/raintank/grafana/issues/396


For discussion...

This has come up a couple of times: clients have requested the ability to traceroute from a collector to an endpoint, so they can see the network path a collector takes.

I envision this functionality existing on the Collector Summary page, and triggering a popup/external window that communicates with the collector, opens a connection, and reports the traceroute back in real time.

woodsaj commented 8 years ago

Comment by Dieterbe Wednesday Aug 05, 2015 at 07:54 GMT


Note that routes may change at any time. In fact, a ping check that should normally succeed may temporarily fail precisely because of a temporary routing change, so an "after the fact" traceroute could be very misleading. OTOH, doing a traceroute every time the state changes from ok to error might be a bit too resource intensive? Idea for a paying feature? Thoughts @woodsaj @nopzor1200 @ctdk?

woodsaj commented 8 years ago

Comment by woodsaj Wednesday Aug 05, 2015 at 09:21 GMT


Adding ad-hoc traceroutes shouldn't be too hard. Once we have them, it wouldn't be too hard to integrate them with alerting, though we would obviously need to charge appropriately to recover costs.

woodsaj commented 8 years ago

Comment by nopzor1200 Tuesday Aug 11, 2015 at 15:31 GMT


I think it's interesting to be able to do this "on demand" and focus on that use case initially.

Later we can decide to trigger either automatically (with an alert), or periodically, or whatever we want, for certain clients.

Agreed re: cost recovery, but that's a larger issue we're dealing with atm :)

woodsaj commented 8 years ago

Comment by ctdk Friday Aug 14, 2015 at 00:39 GMT


Seems reasonable to me. Pingdom provides that very function, actually, and I often found it useful as a way to tell what the problem was, be it a firewall mishap, some problem between the monitor and the Internap servers, or something going horribly wrong trying to get into Internap.

woodsaj commented 8 years ago

OK, let's start tackling this.

There are 2 things that need to be done to address this. 1) add support for ad-hoc check execution. 2) add support for traceroute.

For 1, as we already have a websocket connection from worldping-api to the probes, we just need to send an "adhoc-check" event and wait for an "adhoc-check-result" event to be sent back. This would also allow us to add a "test now" button on the endpoint config page so users can verify that checks are configured correctly.

For 2, https://github.com/aeden/traceroute looks like a good start.