grafana / tempo

Grafana Tempo is a high volume, minimal dependency distributed tracing backend.
https://grafana.com/oss/tempo/
GNU Affero General Public License v3.0
3.99k stars 516 forks

Seems the dashboards to monitor tempo are broken #3992

Open chenlujjj opened 2 months ago

chenlujjj commented 2 months ago

Describe the bug

Hi team, I deployed tempo-distributed in a k8s cluster and tried to monitor it with the dashboards here, but the dashboards appear to be broken, for example:

Expected behavior

The dashboards should render normally and display the metrics.

Environment:

Additional Context

chenlujjj commented 2 months ago

Found a PR that may be related to the tempo_receiver_accepted_spans metric: https://github.com/grafana/tempo/pull/3917. I'll try upgrading my Tempo deployment.

javiermolinar commented 2 months ago

Hi, the tempo_receiver_accepted_spans metric will be available in the 2.6.0 release, and then it will need to be updated in the helm chart. As for the tempo_build_info metric, we use the same dashboards, so it makes sense for us. Maybe @zalegrala knows more.

chenlujjj commented 2 months ago

Thanks @javiermolinar

Does the tempo_build_info metric in your stack have a cluster label? Below is what I get from one of the Tempo distributor instances:

image

javiermolinar commented 2 months ago

Here is where it is populated: https://github.com/grafana/tempo/blob/fbf249a41fdc9ee9ddc8168c4a4f92e426f92bb0/cmd/tempo/build/build.go#L20

The cluster label is probably added in the K8s relabel configuration. That way all our metrics include the cluster info.

chenlujjj commented 2 months ago

Got it!

zalegrala commented 2 months ago

That's right. Add a cluster and namespace label in the scrape configs. This should mean the queries in the dashboard work as intended.
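
For illustration, here is a minimal sketch of what such a scrape config might look like. It assumes a plain Prometheus setup with Kubernetes pod discovery; the job name "tempo" and the cluster value "my-cluster" are placeholders, not values from this thread. If you use the Prometheus Operator, the equivalent would go in the relabelings of your ServiceMonitor or PodMonitor instead.

```yaml
scrape_configs:
  - job_name: tempo                 # placeholder job name
    kubernetes_sd_configs:
      - role: pod                   # discover Tempo pods via the Kubernetes API
    relabel_configs:
      # Copy the pod's namespace into a "namespace" label on every series.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      # Attach a static "cluster" label; replace the value with your cluster's name.
      - target_label: cluster
        replacement: my-cluster
```

With both labels present on the scraped series, tempo_build_info and the other Tempo metrics should match the cluster and namespace selectors that the dashboard queries rely on.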

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.