influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

A plugin to monitor websites the way traceroute does. #12921

Open abhi-jha opened 1 year ago

abhi-jha commented 1 year ago

Use Case

Currently we can do this via the exec plugin, as shown below.

[[inputs.exec]]
  interval = "60s"
  commands = ["sudo mtr -C -n google.com"]
  timeout = "40s"
  data_format = "csv"
  csv_skip_rows = 1
  csv_column_names = ["", "", "status", "dest", "hop", "ip", "loss", "snt", "", "", "avg", "best", "worst", "stdev"]
  name_override = "mtr"
  csv_tag_columns = ["dest", "hop", "ip"]

But this requires Telegraf to have sudo access so that it can execute the mtr command.

The idea is to get this functionality in a native plugin.

Example of the new plugin:

[[inputs.trace]]
  interval = "60s"
  sites = ["google.com", "github.com", "reddit.com"]
  # For per-site timeouts, declare this plugin once for each site that needs a different timeout.
  timeout = "40s"
  column_names = ["", "", "status", "dest", "hop", "ip", "loss", "snt", "", "", "avg", "best", "worst", "stdev"]

Expected behavior

Not sure, but the output could look like what mtr provides:

trace,dest=google.com,hop=7,host=hostname.local,ip=2a00:1450:8106::1 avg=19.83,best=14.22,worst=37.31,stdev=6.73,status="OK",loss=10,snt=10i 1679419996000000000

It may also be possible to include how many hops the route takes, similar to what dig reports when doing a recursive search up to the root servers.

Actual behavior

I used mtr within exec for now, but that required making the user that runs Telegraf a sudoer.

A native plugin that depends only on libraries would eliminates the need for that user to be a sudoer.

Additional info

mtr,dest=facebook.com,hop=10,host=host.local,ip=2a03:2880:f173:81:face:b00c:0:25de status="OK",loss=0,snt=10i,avg=15.05,best=13.42,worst=17.87,stdev=1.34 1679419996000000000
mtr,dest=google.com,hop=1,host=host.local,ip=2a02:a210:abd:2600:aef8:ccff:feeb:c984 avg=4.86,best=4.21,worst=5.51,stdev=0.92,status="OK",loss=80,snt=10i 1679419996000000000
mtr,dest=reddit.com,hop=2,host=host.local,ip=??? stdev=0,status="OK",loss=100,snt=10i,avg=0,best=0,worst=0 1679420055000000000

powersj commented 1 year ago

A plugin to monitor websites' response time, health etc. Something close to what traceroute does.

There are like 3 different requests in that title :) What are you actually trying to do or what is your end goal metric? And have you looked at the ping plugin?

abhi-jha commented 1 year ago

Oh, I was looking for something that can do all or most of them in a single plugin. Basically, a website's health/usability. @Hipska can clarify.

Hipska commented 1 year ago

On Slack we only discussed a plugin that can do traceroutes natively instead of via the exec plugin.

Response time and health can already be done via existing plugins.

abhi-jha commented 1 year ago

Yeah, let's reduce the scope for this. I am fine with doing just the traceroute part.

abhi-jha commented 1 year ago

I need a little help deciding what the output should look like before I start implementing. Any comments or guidelines are very much appreciated.

powersj commented 1 year ago

But this requires Telegraf to have sudo access so that it can execute the mtr command. The idea is to get this functionality in a native plugin.

How is adding a plugin going to resolve the need to use sudo?

There is already a solution: run mtr with exec and parse the output. We have had a request for mtr before in https://github.com/influxdata/telegraf/issues/2509 and it seemed to come back to using exec, as you are doing now.

I am pushing back on this, because I am not sure I see the value in a new plugin, where we will have different users wanting different options, when they can pretty easily run with exec today and take the fields and format they want.

abhi-jha commented 1 year ago

I think supporting traceroute natively eliminates the need to run it via exec. Getting the same information via libraries shouldn't require messing with any privileges.

abhi-jha commented 1 year ago

@powersj Let me know if we need to discuss this more. I am looking around for some libraries that can provide the traceroute metrics. Do let me know if you have ideas about which one to pick.

I can start with a PR.

powersj commented 1 year ago

Let me know if we need to discuss this more. I am looking around for some libraries that can provide the traceroute metrics

I would like to see what you find, especially if it does not require sudo.

See https://github.com/traviscross/mtr/issues/204#issuecomment-723961118 for why mtr requires sudo; the comments there explain how setting setuid or other privileges can remove the need for sudo. Again, a user could use this method today.

Hipska commented 1 year ago

For the ping input we currently use pro-bing, and it seems you can already set a TTL with that one, so it should be possible to implement a traceroute with it.
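
The TTL idea can be sketched as a loop that raises the TTL by one per probe until the destination itself answers. This is only a minimal Go sketch under assumptions: probeResult, probeFunc, and fakeProbe are hypothetical names made up for illustration, and no actual pro-bing calls are shown. A real plugin would put whichever ICMP library is chosen behind probeFunc.

```go
package main

import (
	"fmt"
	"time"
)

// probeResult is a hypothetical summary of one probe; it is not a pro-bing type.
type probeResult struct {
	Addr    string        // address that answered; empty when the probe timed out
	RTT     time.Duration // round-trip time of the reply
	Reached bool          // true when the reply came from the destination itself
}

// probeFunc sends one probe with the given TTL. Assumption: a real plugin
// would wrap its ICMP library behind a function of this shape.
type probeFunc func(dest string, ttl int) probeResult

// trace raises the TTL one hop at a time until the destination answers or
// maxHops is exhausted, and returns one result per hop.
func trace(dest string, maxHops int, probe probeFunc) []probeResult {
	hops := make([]probeResult, 0, maxHops)
	for ttl := 1; ttl <= maxHops; ttl++ {
		res := probe(dest, ttl)
		hops = append(hops, res)
		if res.Reached {
			break
		}
	}
	return hops
}

// fakeProbe simulates a three-hop path whose second hop never answers.
func fakeProbe(dest string, ttl int) probeResult {
	switch ttl {
	case 1:
		return probeResult{Addr: "10.0.0.1", RTT: 2 * time.Millisecond}
	case 2:
		return probeResult{} // timeout: no router replied for this TTL
	default:
		return probeResult{Addr: dest, RTT: 15 * time.Millisecond, Reached: true}
	}
}

func main() {
	for i, hop := range trace("192.0.2.10", 30, fakeProbe) {
		addr := hop.Addr
		if addr == "" {
			addr = "???" // same placeholder mtr prints for silent hops
		}
		fmt.Printf("hop=%d ip=%s rtt=%s reached=%t\n", i+1, addr, hop.RTT, hop.Reached)
	}
}
```

Keeping the probe behind a function type means the hop loop can be exercised without raw-socket privileges, which is the part of this issue still under discussion.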

abhi-jha commented 1 year ago

Should I go ahead and give pro-bing a shot? I want to see what the exact output would look like.

Hipska commented 1 year ago

Sure, let us know if it doesn’t work out, we can look for alternatives or solutions.

abhi-jha commented 1 year ago

Is there a suggested list of metrics that comes to mind as output?

count of hops, time taken (min, max, avg, sd), result_code, packets_sent, packets_received, ttl

Is this the right direction?
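
For the timing fields, a per-hop summary could be computed along these lines. This is a minimal Go sketch, not a proposed implementation: hopStats and summarize are hypothetical names, and the use of the sample (n-1) standard deviation is an assumption. It is consistent with the hop=1 sample line in "Additional info", where two replies out of ten with best=4.21 and worst=5.51 give stdev=0.92.

```go
package main

import (
	"fmt"
	"math"
)

// hopStats holds the per-hop summary; field names mirror the mtr columns.
type hopStats struct {
	Sent  int     // probes sent
	Loss  float64 // percentage of probes that got no reply
	Best  float64 // fastest round trip in ms
	Worst float64 // slowest round trip in ms
	Avg   float64 // mean round trip in ms
	Stdev float64 // sample standard deviation in ms (assumption: n-1 divisor)
}

// summarize reduces the round-trip times of the answered probes for one hop.
// sent is the number of probes sent; rtts holds one entry per reply.
func summarize(sent int, rtts []float64) hopStats {
	s := hopStats{Sent: sent}
	if sent > 0 {
		s.Loss = 100 * float64(sent-len(rtts)) / float64(sent)
	}
	if len(rtts) == 0 {
		return s // silent hop: all timing fields stay zero
	}
	s.Best, s.Worst = rtts[0], rtts[0]
	sum := 0.0
	for _, r := range rtts {
		s.Best = math.Min(s.Best, r)
		s.Worst = math.Max(s.Worst, r)
		sum += r
	}
	s.Avg = sum / float64(len(rtts))
	if len(rtts) > 1 {
		sq := 0.0
		for _, r := range rtts {
			sq += (r - s.Avg) * (r - s.Avg)
		}
		s.Stdev = math.Sqrt(sq / float64(len(rtts)-1))
	}
	return s
}

func main() {
	// Two replies out of ten probes, as in the hop=1 sample line.
	s := summarize(10, []float64{4.21, 5.51})
	fmt.Printf("loss=%.0f snt=%d avg=%.2f best=%.2f worst=%.2f stdev=%.2f\n",
		s.Loss, s.Sent, s.Avg, s.Best, s.Worst, s.Stdev)
}
```

A hop that never replies ends up with loss=100 and zeroed timing fields, which matches the shape of the reddit.com hop=2 sample line.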

Hipska commented 1 year ago

If it could be like you showed as example in “Additional info”, that would be great!

Just be sure to emit a metric for every hop.
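
One metric per hop could be assembled roughly as follows. This is a hedged Go sketch: the hop type and toMetric are hypothetical, the tag and field names are taken from the sample lines in "Additional info", and the values in main are borrowed from the google.com samples. The two maps have the shape that Telegraf's Accumulator.AddFields takes for fields and tags; the sketch prints them instead of importing Telegraf.

```go
package main

import (
	"fmt"
	"strconv"
)

// hop is a hypothetical per-hop result; field names follow the mtr columns.
type hop struct {
	Index                   int     // position on the path, starting at 1
	IP                      string  // responder address; empty if the hop stayed silent
	Loss                    float64 // percentage of lost probes
	Sent                    int64   // probes sent
	Avg, Best, Worst, Stdev float64 // round-trip summary in ms
}

// toMetric splits one hop into tags and fields, mirroring the sample output:
// dest, hop, and ip become tags, everything else becomes a field.
func toMetric(dest string, h hop) (map[string]string, map[string]interface{}) {
	ip := h.IP
	if ip == "" {
		ip = "???" // placeholder mtr prints for hops that do not answer
	}
	tags := map[string]string{
		"dest": dest,
		"hop":  strconv.Itoa(h.Index),
		"ip":   ip,
	}
	fields := map[string]interface{}{
		"status": "OK",
		"loss":   h.Loss,
		"snt":    h.Sent,
		"avg":    h.Avg,
		"best":   h.Best,
		"worst":  h.Worst,
		"stdev":  h.Stdev,
	}
	return tags, fields
}

func main() {
	hops := []hop{
		{Index: 1, IP: "2a02:a210:abd:2600:aef8:ccff:feeb:c984", Loss: 80, Sent: 10, Avg: 4.86, Best: 4.21, Worst: 5.51, Stdev: 0.92},
		{Index: 7, IP: "2a00:1450:8106::1", Loss: 10, Sent: 10, Avg: 19.83, Best: 14.22, Worst: 37.31, Stdev: 6.73},
	}
	// One metric per hop, never one aggregated metric per destination.
	for _, h := range hops {
		tags, fields := toMetric("google.com", h)
		fmt.Println("trace", tags, fields)
	}
}
```

Keeping hop as a tag gives each hop its own series, so loss at a single router stays visible instead of being averaged over the whole path.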

srebhan commented 9 months ago

Note to myself: https://github.com/wisdomatom/go-mtr

abhi-jha commented 9 months ago

Hello. Looks like I missed this for some time. Is there already such a plugin? Or should I go ahead?

Hipska commented 9 months ago

There is no mtr or traceroute related plugin yet. You are free to go ahead 😉

srebhan commented 9 months ago

@abhi-jha a PR for adding such a plugin would be appreciated!

abhi-jha commented 3 months ago

Hello. I am trying to get the pro-bing library to work as a small POC.

https://influxcommunity.slack.com/archives/CH99HUH8V/p1716680252073199

The problem is that TTLs don't seem to be working very well with the library. Do you have any insights?