open-telemetry / opentelemetry-java

OpenTelemetry Java SDK
https://opentelemetry.io
Apache License 2.0
1.95k stars 805 forks source link

Make `MAX_SCALE` and `MAX_BUCKETS` in `Base2ExponentialHistogramAggregation` configurable via Env vars #5632

Open Dogacel opened 1 year ago

Dogacel commented 1 year ago

Is your feature request related to a problem? Please describe. Prometheus remote write extension requires scale to be between -4 and 8. Thus, default java instrumentor can't export exponential histograms to prometheus.

Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "prometheusremotewrite", "error": "Permanent error: cannot convert exponential to native histogram. Scale must be <= 8 and >= -4, was 20;

Describe the solution you'd like Make MAX_SCALE and MAX_BUCKETS configurable via an env var. Such as

export OTEL_EXPORTER_OTLP_METRICS_DEFAULT_HISTOGRAM_AGGREGATION_MAX_SCALE=7
export OTEL_EXPORTER_OTLP_METRICS_DEFAULT_HISTOGRAM_AGGREGATION_MAX_BUCKETS=100

Describe alternatives you've considered We don't want to deep dive into the SDK to configure defaults. We are using pre-built jars of instrumentators and trying to configure them using env vars.

Additional context This issue is being handled in https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/24026 as well. But it just got merged and I still think having customizable scales by default can benefit the user.

jack-berg commented 1 year ago

Are you using the prometheus exporter (i.e. PrometheusHttpServer) or OTLP?

The environment variable you're modeling this suggestion off are defined in the specification here. We're bound by the specification in terms of which environment variables we support.

We could potentially add an environment variable with an *_EXPERIMENTAL_* designation (see OTEL_EXPERIMENTAL_EXPORTER_OTLP_RETRY_ENABLED defined here for example), but I'd want to know that somebody is pushing to have that added to the spec such that we have a route to make it stable. Changing the spec'd environment variables is challenging at the moment because there is a moratorium in place preventing new environment variables while we work out a more full featured file based configuration alternative. File based configuration is a work in progress - can learn about what's been done so far here.

Dogacel commented 1 year ago

Are you using the prometheus exporter (i.e. PrometheusHttpServer) or OTLP?

Java Otel Agent -> Sidecar OTLP collector (aws-otel-collector) -> Prometheus (using PrometheusRemoteWriteExtension)

The environment variable you're modeling this suggestion off are defined in the specification here. We're bound by the specification in terms of which environment variables we support.

We could potentially add an environment variable with an *_EXPERIMENTAL_* designation (see OTEL_EXPERIMENTAL_EXPORTER_OTLP_RETRY_ENABLED defined here for example), but I'd want to know that somebody is pushing to have that added to the spec such that we have a route to make it stable. Changing the spec'd environment variables is challenging at the moment because there is a moratorium in place preventing new environment variables while we work out a more full featured file based configuration alternative. File based configuration is a work in progress - can learn about what's been done so far here.

From what I understood from the docs, the MaxScale parameter can be set from the SDK rather than using environment variables. There isn't any config or env vars to set a default for it.

In that sense, when used with the java agent, there is no way to configure it. And because there is a moratorium, what is the recommended way of updating the configuration parameter?

Maybe the java agent can be kept outside that moratorium because in SDK we can programmatically change stuff but in the agent there is no other way than configuration parameters? And they don't have to be a part of the SDK, can be special to the agent?

jack-berg commented 1 year ago

In that sense, when used with the java agent, there is no way to configure it.

There is a way, but its not particularly easy. You can package up an AutoConfigurationCustomizerProvider in an agent extension, and use the customizer to change the default aggregation of the OTLP exporter to have the desired max scale. Something like the following would do the trick:

import io.opentelemetry.exporter.otlp.metrics.OtlpGrpcMetricExporter;
import io.opentelemetry.sdk.autoconfigure.spi.AutoConfigurationCustomizer;
import io.opentelemetry.sdk.autoconfigure.spi.AutoConfigurationCustomizerProvider;
import io.opentelemetry.sdk.common.CompletableResultCode;
import io.opentelemetry.sdk.metrics.Aggregation;
import io.opentelemetry.sdk.metrics.InstrumentType;
import io.opentelemetry.sdk.metrics.data.AggregationTemporality;
import io.opentelemetry.sdk.metrics.data.MetricData;
import io.opentelemetry.sdk.metrics.export.MetricExporter;
import java.util.Collection;

public class OtlpMaxScale implements AutoConfigurationCustomizerProvider {
  @Override
  public void customize(AutoConfigurationCustomizer autoConfiguration) {
    autoConfiguration.addMetricExporterCustomizer(
        (metricExporter, configProperties) -> {
          if (metricExporter instanceof OtlpGrpcMetricExporter) {
            return new CustomizedMetricExporter(metricExporter);
          }
          return metricExporter;
        });
  }

  private static class CustomizedMetricExporter implements MetricExporter {
    private final MetricExporter delegate;

    private CustomizedMetricExporter(MetricExporter delegate) {
      this.delegate = delegate;
    }

    @Override
    public Aggregation getDefaultAggregation(InstrumentType instrumentType) {
      if (instrumentType == InstrumentType.HISTOGRAM) {
        return Aggregation.base2ExponentialBucketHistogram(160, 7);
      }
      return delegate.getDefaultAggregation(instrumentType);
    }

    @Override
    public AggregationTemporality getAggregationTemporality(InstrumentType instrumentType) {
      return delegate.getAggregationTemporality(instrumentType);
    }

    @Override
    public CompletableResultCode export(Collection<MetricData> metrics) {
      return delegate.export(metrics);
    }

    @Override
    public CompletableResultCode flush() {
      return delegate.flush();
    }

    @Override
    public CompletableResultCode shutdown() {
      return delegate.shutdown();
    }
  }
}

Other examples have come up (see this) of folks asking for exceptions to the moratorium. My inclination is to provide temporary / experimental java specific environment variable support only if there is no other way to solve the configuration problem using autoconfiguration customization.

Dogacel commented 1 year ago

Yep, seems like agent extensions are way to go. Initially I was worried about writing one but it seems like writing one will be the best option any way.

jack-berg commented 1 year ago

You might be interested in #5652, which will make it easier to customize Otlp{Protocol}MetricExporter via AutoConfigurationCustomizer#addMetricExporterCustomizer.