grafana / tempo

Grafana Tempo is a high volume, minimal dependency distributed tracing backend.
https://grafana.com/oss/tempo/
GNU Affero General Public License v3.0
3.97k stars 514 forks source link

config tenant override works even when it's not set #2901

Open MarcinGinszt opened 1 year ago

MarcinGinszt commented 1 year ago

Describe the bug Metrics-generator enable_client_server_prefix doesn't actually enable the client server prefix without also setting metrics_generator_processor_service_graphs_enable_client_server_prefix to true.

[Our configuration with](https://github.com/utilitywarehouse/opentelemetry-manifests/blob/main/tempo/base/config.yaml#L29) enable_client_server_prefix and without tenant override doesn't add prefixes to metric labels.

Request to /status/config confirms, that the enable_client_server_prefix is set to true. However, it also reports that metrics_generator_processor_service_graphs_enable_client_server_prefix is set to false, even though it is not defined in our configuration.

Setting in our config metrics_generator_processor_service_graphs_enable_client_server_prefix to true resolves the issue- prefixes are correctly added.

We are using Tempo version 2.2.0, which should have this feature enabled. To Reproduce

  1. Set enable_client_server_prefix in metric generator config to true, while not settting metrics_generator_processor_service_graphs_enable_client_server_prefix.
  2. Query the Prometheus instance for serviceGraph metrics
  3. See, if metric labels are duplicated for client and server. Sadly, they are not.
  4. Add enabled metrics_generator_processor_service_graphs_enable_client_server_prefix to your configuration
  5. Query the Prometheus instance again- now the metric labels are duplicated correctly.

Expected behavior Overrides should not work until they are defined

Environment:

github-actions[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity. Please apply keepalive label to exempt this Issue.

jcreixell commented 3 weeks ago

I am experiencing this with 2.6.0.

The following config does not properly prefix labels:

metrics_generator:
  processor:
    service_graphs:
      dimensions:
        - foo
      enable_client_server_prefix: true

Adding the following (as suggested in https://github.com/utilitywarehouse/opentelemetry-manifests/pull/15) works around it:

overrides:
  metrics_generator_processor_service_graphs_enable_client_server_prefix: true
kvrhdn commented 6 days ago

I think this code is causing the issue:

https://github.com/grafana/tempo/blob/e33d407c5de4132468307db880d84d7d542710bd/modules/generator/config.go#L130

When we initialize the service graph processor, we copy the setting from the overrides without checking if a value is already set or not.

But... because this field is a bool, it's actually impossible to tell if it has been set or not. It can either be false or true, in which false could mean "I've set this to false" or "I have not set this, it defaulted to false".

So unfortunately this configuration field has actually become useless after we added the override for it. The value will always be replaced by the value in the overrides. Meaning if you don't set overrides for your tenant it will be reset to false.

It's a known shortcoming that our overrides don't allow leaving a field unset. Empty fields will always get the default value for their type, which for boolean values is false. For string and numbers this is less of an issue. In the user-configurable overrides we have addressed this by using *bool instead. So if we ever refactor the runtime overrides entirely we could maybe fix this too.

For clarity I'd suggest removing the configuration option from regular config block and only allow it to be set through overrides.