grafana / synthetic-monitoring-agent

Synthetic Monitoring Agent
https://grafana.com/docs/grafana-cloud/how-do-i/synthetic-monitoring/
Apache License 2.0

traceroute: hosts in the route-path are not reported to Grafana Cloud #383

Open joelsdc opened 1 year ago

joelsdc commented 1 year ago

We have a working local instance of Grafana v8.5.15 connected to Grafana Cloud.

We have set up a private probe, and we can see the traceroute metrics, but the hosts in the path are not reported:

2022-12-12 19:07:50 level=info target=A.A.A.A probe=Example-DC-11cn33 region=AMER instance=A.A.A.A job=GCP-VPN-Example-root check_name=traceroute source=synthetic-monitoring-agent label_env=dc msg="Beginning check" type=traceroute timeout_seconds=30
2022-12-12 19:08:03 target=A.A.A.A probe=Example-DC-11cn33 region=AMER instance=A.A.A.A job=GCP-VPN-Example-root check_name=traceroute source=synthetic-monitoring-agent label_env=dc Level=info Destination=A.A.A.A Hosts=B.B.B.B TTL=1 ElapsedTime=0s LossPercent=0 Sent=5 TracerouteID=5bd56415-a34b-4b27-8966-bd7d0e1e1090
2022-12-12 19:08:03 target=A.A.A.A probe=Example-DC-11cn33 region=AMER instance=A.A.A.A job=GCP-VPN-Example-root check_name=traceroute source=synthetic-monitoring-agent label_env=dc Level=info Destination=A.A.A.A Hosts= TTL=2 ElapsedTime=0s LossPercent=100 Sent=5 TracerouteID=5bd56415-a34b-4b27-8966-bd7d0e1e1090
2022-12-12 19:08:03 target=A.A.A.A probe=Example-DC-11cn33 region=AMER instance=A.A.A.A job=GCP-VPN-Example-root check_name=traceroute source=synthetic-monitoring-agent label_env=dc Level=info Destination=A.A.A.A Hosts= TTL=3 ElapsedTime=0s LossPercent=100 Sent=5 TracerouteID=5bd56415-a34b-4b27-8966-bd7d0e1e1090
2022-12-12 19:08:03 target=A.A.A.A probe=Example-DC-11cn33 region=AMER instance=A.A.A.A job=GCP-VPN-Example-root check_name=traceroute source=synthetic-monitoring-agent label_env=dc Level=info Destination=A.A.A.A Hosts= TTL=4 ElapsedTime=0s LossPercent=100 Sent=5 TracerouteID=5bd56415-a34b-4b27-8966-bd7d0e1e1090
2022-12-12 19:08:03 target=A.A.A.A probe=Example-DC-11cn33 region=AMER instance=A.A.A.A job=GCP-VPN-Example-root check_name=traceroute source=synthetic-monitoring-agent label_env=dc Level=info Destination=A.A.A.A Hosts= TTL=5 ElapsedTime=0s LossPercent=100 Sent=5 TracerouteID=5bd56415-a34b-4b27-8966-bd7d0e1e1090
2022-12-12 19:08:03 target=A.A.A.A probe=Example-DC-11cn33 region=AMER instance=A.A.A.A job=GCP-VPN-Example-root check_name=traceroute source=synthetic-monitoring-agent label_env=dc Level=info Destination=A.A.A.A Hosts= TTL=6 ElapsedTime=0s LossPercent=100 Sent=5 TracerouteID=5bd56415-a34b-4b27-8966-bd7d0e1e1090
2022-12-12 19:08:03 target=A.A.A.A probe=Example-DC-11cn33 region=AMER instance=A.A.A.A job=GCP-VPN-Example-root check_name=traceroute source=synthetic-monitoring-agent label_env=dc Level=info Destination=A.A.A.A Hosts=A.A.A.A TTL=7 ElapsedTime=43ms LossPercent=0 Sent=5 TracerouteID=5bd56415-a34b-4b27-8966-bd7d0e1e1090
2022-12-12 19:08:03 level=info target=A.A.A.A probe=Example-DC-11cn33 region=AMER instance=A.A.A.A job=GCP-VPN-Example-root check_name=traceroute source=synthetic-monitoring-agent label_env=dc msg="Check succeeded" duration_seconds=12.739372149

Notice the empty Hosts fields for TTL 2 through 6:

Hosts= TTL=2
Hosts= TTL=3
Hosts= TTL=4
Hosts= TTL=5
Hosts= TTL=6
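
For anyone triaging similar logs, here is a minimal Go sketch that picks out the hops with an empty Hosts field. It assumes the key=value layout shown in the log lines above; the function names hopFields and emptyHostTTLs are made up for this example and are not part of the agent.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// hopFields extracts the Hosts and TTL values from one agent log line.
// It splits on whitespace, which is enough for these two fields because
// neither contains spaces in the logs shown in this issue.
func hopFields(line string) (hosts string, ttl int, ok bool) {
	ttl = -1
	seenHosts := false
	for _, field := range strings.Fields(line) {
		key, value, found := strings.Cut(field, "=")
		if !found {
			continue
		}
		switch key {
		case "Hosts":
			hosts, seenHosts = value, true
		case "TTL":
			n, err := strconv.Atoi(value)
			if err != nil {
				return "", 0, false
			}
			ttl = n
		}
	}
	return hosts, ttl, seenHosts && ttl >= 0
}

// emptyHostTTLs returns the TTLs of all hops whose Hosts field is empty.
func emptyHostTTLs(lines []string) []int {
	var ttls []int
	for _, line := range lines {
		hosts, ttl, ok := hopFields(line)
		if ok && hosts == "" {
			ttls = append(ttls, ttl)
		}
	}
	return ttls
}

func main() {
	// Shortened copies of three of the log lines from this issue.
	lines := []string{
		`Level=info Destination=A.A.A.A Hosts=B.B.B.B TTL=1 ElapsedTime=0s LossPercent=0 Sent=5`,
		`Level=info Destination=A.A.A.A Hosts= TTL=2 ElapsedTime=0s LossPercent=100 Sent=5`,
		`Level=info Destination=A.A.A.A Hosts=A.A.A.A TTL=7 ElapsedTime=43ms LossPercent=0 Sent=5`,
	}
	fmt.Println(emptyHostTTLs(lines))
}
```

Lines without both a Hosts and a TTL field, such as the "Beginning check" and "Check succeeded" messages, are skipped.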

When doing a traceroute from the OS, we see all the IPs and we can ping each host in the path individually:

root@11cn33:~# traceroute -n A.A.A.A
traceroute to A.A.A.A (A.A.A.A), 30 hops max, 60 byte packets
 1  B.B.B.B  0.222 ms  0.179 ms  0.149 ms
 2  C.C.C.217  0.527 ms  0.500 ms  0.584 ms
 3  C.C.C.229  0.853 ms  0.934 ms  0.798 ms
 4  D.D.179.196  1.000 ms  0.973 ms  0.824 ms
 5  D.D.126.250  1.269 ms * *
 6  72.14.217.46  2.793 ms  3.172 ms  2.878 ms
 7  A.A.A.A  43.223 ms  44.260 ms  44.082 ms
root@11cn33:~#

NOTE: IPs are masked for privacy.

Tested on v0.12.1 and v0.11.0 (in case the "update to mtr package" change from the changelog had anything to do with it).

I've also tried running setcap cap_net_raw+ep /usr/bin/synthetic-monitoring-agent, but it made no difference.

Any suggestions?

I'm not sure whether this issue belongs here or in the grafana/mtr repo. If it belongs there, let me know and I'll move it.

Thanks!

joelsdc commented 1 year ago

I'm trying to debug this.

I've added some logging, and hop.Targets, which comes from m.Statistic, already has the empty hosts, so I think the issue might be in the mtr library.

joelsdc commented 1 year ago

I can't open issues in the grafana/mtr repo, so I'll post it here.

I've added debug logs in a bunch of places and managed to pinpoint the issue to these lines in grafana/mtr/blob/master/pkg/icmp/icmp.go:

            if !bytes.Equal(sent[:4], echoPkt[:4]) {
                continue
            }

If I comment out those lines, I see all the hosts in the route path, but I don't know what the repercussions of doing that are.
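
To make the effect of that comparison concrete, here is a small Go sketch. The thread does not say what sent and echoPkt contain, so this assumes, purely for illustration, that both begin with an ICMP echo header (type, code, checksum). Under that assumption, any device on the path that rewrites the echo identifier and recomputes the checksum would change the first four bytes and cause the reply to be skipped. samePrefix and the sample packets are hypothetical; this is one way the check could reject replies, not a confirmed diagnosis.

```go
package main

import (
	"bytes"
	"fmt"
)

// samePrefix mirrors the shape of the check quoted above: two packets are
// treated as related only if their first four bytes are identical.
// The length guard is an addition for this sketch; the quoted snippet
// does not show one.
func samePrefix(sent, echoPkt []byte) bool {
	if len(sent) < 4 || len(echoPkt) < 4 {
		return false
	}
	return bytes.Equal(sent[:4], echoPkt[:4])
}

func main() {
	// Hypothetical ICMP echo request header with no payload:
	// type=8, code=0, checksum=0xf7fd, id=1, seq=1.
	sent := []byte{8, 0, 0xf7, 0xfd, 0, 1, 0, 1}

	// The same header quoted back unchanged.
	unchanged := []byte{8, 0, 0xf7, 0xfd, 0, 1, 0, 1}

	// ASSUMPTION: a device rewrote id 1 to id 2 and recomputed the
	// checksum, so bytes 2 and 3 now differ (0xf7fc instead of 0xf7fd).
	rewritten := []byte{8, 0, 0xf7, 0xfc, 0, 2, 0, 1}

	fmt.Println("unchanged matches:", samePrefix(sent, unchanged))
	fmt.Println("rewritten matches:", samePrefix(sent, rewritten))
}
```

Removing the comparison entirely, as in the experiment above, would accept both packets, which may also mean accepting replies that belong to a different probe.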

Example when building with the above changes to grafana/mtr:

2022-12-13 18:35:33 level=info target=A.A.A.A probe=Example-DC-11cn33 region=AMER instance=A.A.A.A job=GCP-VPN-Example-prd check_name=traceroute source=synthetic-monitoring-agent label_env=dc msg="Beginning check" type=traceroute timeout_seconds=30
2022-12-13 18:35:35 target=A.A.A.A probe=Example-DC-11cn33 region=AMER instance=A.A.A.A job=GCP-VPN-Example-prd check_name=traceroute source=synthetic-monitoring-agent label_env=dc Level=info Destination=A.A.A.A Hosts=fw.example.com. TTL=1 ElapsedTime=29ms LossPercent=0 Sent=5 TracerouteID=394e9ce1-057c-4544-9072-3535ef0161a0
2022-12-13 18:35:35 target=A.A.A.A probe=Example-DC-11cn33 region=AMER instance=A.A.A.A job=GCP-VPN-Example-prd check_name=traceroute source=synthetic-monitoring-agent label_env=dc Level=info Destination=A.A.A.A Hosts=107-216-140-1.lightspeed.irvnca.sbcglobal.net. TTL=2 ElapsedTime=12ms LossPercent=0 Sent=5 TracerouteID=394e9ce1-057c-4544-9072-3535ef0161a0
2022-12-13 18:35:35 target=A.A.A.A probe=Example-DC-11cn33 region=AMER instance=A.A.A.A job=GCP-VPN-Example-prd check_name=traceroute source=synthetic-monitoring-agent label_env=dc Level=info Destination=A.A.A.A Hosts=64.148.105.186 TTL=3 ElapsedTime=13ms LossPercent=0 Sent=5 TracerouteID=394e9ce1-057c-4544-9072-3535ef0161a0
2022-12-13 18:35:35 target=A.A.A.A probe=Example-DC-11cn33 region=AMER instance=A.A.A.A job=GCP-VPN-Example-prd check_name=traceroute source=synthetic-monitoring-agent label_env=dc Level=info Destination=A.A.A.A Hosts=12.242.115.21 TTL=4 ElapsedTime=14ms LossPercent=0 Sent=5 TracerouteID=394e9ce1-057c-4544-9072-3535ef0161a0
2022-12-13 18:35:35 target=A.A.A.A probe=Example-DC-11cn33 region=AMER instance=A.A.A.A job=GCP-VPN-Example-prd check_name=traceroute source=synthetic-monitoring-agent label_env=dc Level=info Destination=A.A.A.A Hosts=12.255.10.176 TTL=5 ElapsedTime=16ms LossPercent=40 Sent=5 TracerouteID=394e9ce1-057c-4544-9072-3535ef0161a0
2022-12-13 18:35:35 target=A.A.A.A probe=Example-DC-11cn33 region=AMER instance=A.A.A.A job=GCP-VPN-Example-prd check_name=traceroute source=synthetic-monitoring-agent label_env=dc Level=info Destination=A.A.A.A Hosts=212.94.192.35.bc.googleusercontent.com. TTL=6 ElapsedTime=70ms LossPercent=0 Sent=5 TracerouteID=394e9ce1-057c-4544-9072-3535ef0161a0
2022-12-13 18:35:35 level=info target=A.A.A.A probe=Example-DC-11cn33 region=AMER instance=A.A.A.A job=GCP-VPN-Example-prd check_name=traceroute source=synthetic-monitoring-agent label_env=dc msg="Check succeeded" duration_seconds=2.050764083

Let me know what you think.

joelsdc commented 1 year ago

@mem tagging you here as you are the committer of the changes from my previous comment.

mem commented 1 year ago

@joelsdc thanks, looking

joelsdc commented 1 year ago

Hi @mem, were you able to find anything? Let me know if I can help with testing or debugging.

joelsdc commented 1 year ago

Hi @mem, happy new year!

Do you think you can have a look at this?

joelsdc commented 1 year ago

I'm going to test again with the latest version, 0.14.0, and report back.

joelsdc commented 1 year ago

Hi @mem, any chance you can have a look at this?

I've just tested on v0.14.2 and the same issue is there.

joelsdc commented 1 year ago

It seems this commit is the one that introduces the issue for me.

joelsdc commented 1 year ago

Bump? :D

joelsdc commented 1 year ago

Hi @mem, I'm kindly asking again if you can take a look at this, thanks!

joelsdc commented 5 months ago

Hi @mem, any chance you could check my comment please?