hypertrace / javaagent

Hypertrace OpenTelemetry Java agent with payload/body and headers data capture.
Apache License 2.0
30 stars 15 forks source link

Add metric config and send metrics #350

Closed shashank11p closed 2 years ago

shashank11p commented 2 years ago

Update agent-config to have metrics configurations, and update javaagent to use them and send metrics using corresponding exporter from config.

Default metric exporter type to OTLP.

ryandens commented 2 years ago

From a code perspective, this is looking great! My one last question is regarding backward compatibility. For users still using the zipkin exporter type, the metric exporter will default to localhost:4317. However, localhost:4317 may not be available/reachable, and could place an unnecessary burden on the application JVM. Should we add a check to not export metrics if the Zipkin endpoint is set?

shashank11p commented 2 years ago

From a code perspective, this is looking great! My one last question is regarding backward compatibility. For users still using the zipkin exporter type, the metric exporter will default to localhost:4317. However, localhost:4317 may not be available/reachable, and could place an unnecessary burden on the application JVM. Should we add a check to not export metrics if the Zipkin endpoint is set?

That is a good point. So we would be checking trace reporter type for disabling metrics. The problem in checking if trace reporter type is ZIPKIN or not is that metric reporter type can be OTLP with metrics endpoint reachable even when traces endpoint is zipkin. So we will disable metrics when they should not have been disabled.

One way I can think of is to add another metricReportingType as none in agent-config and if someone is using ZIPKIN, they can set this config for metricReporterType to none. And then we will disable exporting metrics in agent.

ryandens commented 2 years ago

I think my take would be:

  1. If traceReporting type is zipkin and metric reporting type/endpoint is not set (e.g. default) we shouldn't start reporting metrics today and we should configure the SDK to not report metrics (current behavior)
  2. If traceReporting type is zipkin and metric reporting type is set (but endpoint is not set) then we should default the metric endpoint to localhsot:4317 for otlp metrics or localhost:9464 for prometheus
  3. If traceReporting type is zipkin and metric reporting endpoint is set (but metric type is not set) then we should default the metric type to otlp
shashank11p commented 2 years ago

I think my take would be:

  1. If traceReporting type is zipkin and metric reporting type/endpoint is not set (e.g. default) we shouldn't start reporting metrics today and we should configure the SDK to not report metrics (current behavior)
  2. If traceReporting type is zipkin and metric reporting type is set (but endpoint is not set) then we should default the metric endpoint to localhsot:4317 for otlp metrics or localhost:9464 for prometheus
  3. If traceReporting type is zipkin and metric reporting endpoint is set (but metric type is not set) then we should default the metric type to otlp

This will make it more like hard coded than a generic code. Right now, the agent

  1. reads config file if present
  2. applies default config (for all config if absent, whatever should be the default)
  3. checks system properties and env vars and overrides config made above. in this order. If we want to look at traceReporterType and whether the config for metricReporterType/Endpoint was set or not, we need to mix 2 and 3 with a lot of checks for another config instead of metrics config(trace reporter type/endpoint). Because since default config is applied before, we dont know if it was set or default is applied when we check system properties. That is code in HypertraceConfig and EnvironmentConfig

Also, I think we should still have a config as none for metrics reporter type as it is possible that the collector to which we are exporting is otlp but only has traces pipeline, not the metrics pipeline. So even in this case for otlp endpoint and trace reporter type, we should have an option to not export metrics.

ryandens commented 2 years ago

Good point, I forgot about that quirk. The order in which those tasks are done is definitely problematic and makes a derived configuration hard to work with. This is definitely going to be something we should rework as the agent inevitably builds in complexity, but I'm not sure if tackling that now makes sense. I totally agree we should have a configuration option to set metrics exporter to none.

@davexroth any thoughts here? It seems like our choices are a non-trivial refactor to better align with OTEL's configuration or simply defaulting metrics to oltp and localhost:4317 and encouraging users to override as appropriate

davexroth commented 2 years ago

I think my take would be:

  1. If traceReporting type is zipkin and metric reporting type/endpoint is not set (e.g. default) we shouldn't start reporting metrics today and we should configure the SDK to not report metrics (current behavior)
  2. If traceReporting type is zipkin and metric reporting type is set (but endpoint is not set) then we should default the metric endpoint to localhsot:4317 for otlp metrics or localhost:9464 for prometheus
  3. If traceReporting type is zipkin and metric reporting endpoint is set (but metric type is not set) then we should default the metric type to otlp

I think this is close. For 2. if a user sets metrics reporting type is set, but endpoint isn't, that's an error case, and we should just disable metrics.

ryandens commented 2 years ago

Totally agreed, but Shashank points out some problems to implementing that in our current context (as a result of how config is applied). Reworking this is possible, but complex and shouldn't be done lightly without also thinking about a bigger picture perspective of how we want config to work long term IMO.

davexroth commented 2 years ago

Totally agreed, but Shashank points out some problems to implementing that in our current context (as a result of how config is applied). Reworking this is possible, but complex and shouldn't be done lightly without also thinking about a bigger picture perspective of how we want config to work long term IMO.

If it's too big a change, I'm ok with just saying that we turn off metrics when zipkin trace exporter is selected - it's not something we ever recommend to use these days.

shashank11p commented 2 years ago

If it's too big a change, I'm ok with just saying that we turn off metrics when zipkin trace exporter is selected - it's not something we ever recommend to use these days.

Okay, so if zipkin trace exporter is selected we will not send metrics. And I think we should also add a none reporter type in metric to disable it if the collector to which data is being sent only has traces pipeline but not metrics pipeline.

So we will not send metrics when ZIPKIN trace reporter is selected or when metricReportingType is set to none. Is this good?

ryandens commented 2 years ago

So we will not send metrics when ZIPKIN trace reporter is selected or when metricReportingType is set to none. Is this good?

This works for me - though FWIW I think we don't need to add a "none" type to the metric exporter. We can simply not set a default and treat the METRIC_REPORTER_TYPE_UNSPECIFIED value as "none". If possible, we should add some more test cases to cover the corner cases discussed in this PR to help prevent regressions

ryandens commented 2 years ago

I was struggling to reason with what was working/what wasn't, so I wound up writing a few test cases to explore the zipkin trace exporter edge cases:

zipkinConfig.yaml

service_name: service
reporting:
  trace_reporter_type: ZIPKIN
  endpoint: http://example.com:9411/api/v2/spans
@Test
  @SetEnvironmentVariable(key = "HT_REPORTING_ENDPOINT", value = "http://oltp.hypertrace.org:4317")
  public void complexConfig() throws IOException {
    // GIVEN a config file with a non-default reporting endpoint and an env-var with a different non-default reporting endpoint
    URL resource = getClass().getClassLoader().getResource("config.yaml");
    // WHEN we load the config
    AgentConfig agentConfig = HypertraceConfig.load(resource.getPath());
    // VERIFY the trace and metric endpoints are the both the value of the env var
    String expectedEndpoint = "http://oltp.hypertrace.org:4317";
    assertEquals(expectedEndpoint, agentConfig.getReporting().getEndpoint().getValue());
    assertEquals(expectedEndpoint, agentConfig.getReporting().getMetricEndpoint().getValue());
  }

  @Test
  public void zipkinExporter() throws IOException {
    // GIVEN a config file with a non-default zipkin reporting endpoint
    URL resource = getClass().getClassLoader().getResource("zipkinConfig.yaml");
    // WHEN we load the config
    AgentConfig agentConfig = HypertraceConfig.load(resource.getPath());
    // VERIFY the trace reporting endpoint is the zipkin endpoint
    assertEquals("http://example.com:9411/api/v2/spans", agentConfig.getReporting().getEndpoint().getValue());
    // VERIFY the metric endpoint is not set
    assertEquals( "", agentConfig.getReporting().getMetricEndpoint().getValue());
  }

  @Test
  @SetEnvironmentVariable(key = "HT_REPORTING_METRIC_REPORTER_TYPE", value = "METRIC_REPORTER_TYPE_OTLP")
  public void zipkinExporterOtlpMetricsType() throws IOException {
    // GIVEN a config file with a non-default zipkin reporting endpoint and the metric type is set in an env var
    URL resource = getClass().getClassLoader().getResource("zipkinConfig.yaml");
    // WHEN we load the config
    AgentConfig agentConfig = HypertraceConfig.load(resource.getPath());
    // VERIFY the trace reporting endpoint is the zipkin endpoint
    assertEquals("http://example.com:9411/api/v2/spans", agentConfig.getReporting().getEndpoint().getValue());
    // VERIFY the metric reporting type is otlp
    assertEquals(MetricReporterType.METRIC_REPORTER_TYPE_OTLP, agentConfig.getReporting().getMetricReporterType());
    // VERIFY the metric endpoint is not set
    assertEquals( "", agentConfig.getReporting().getMetricEndpoint().getValue());
  }

  @Test
  @SetEnvironmentVariable(key = "HT_REPORTING_METRIC_ENDPOINT", value = "http://oltp.hypertrace.org:4317")
  public void zipkinTraceExporterOtlpMetricsEndpoint() throws IOException {
    URL resource = getClass().getClassLoader().getResource("zipkinConfig.yaml");
    AgentConfig agentConfig = HypertraceConfig.load(resource.getPath());
    String expectedEndpoint = "http://example.com:9411/api/v2/spans";
    assertEquals(expectedEndpoint, agentConfig.getReporting().getEndpoint().getValue());
    assertEquals( "http://oltp.hypertrace.org:4317", agentConfig.getReporting().getMetricEndpoint().getValue());
    assertEquals(MetricReporterType.METRIC_REPORTER_TYPE_OTLP, agentConfig.getReporting().getMetricReporterType());
  }

Based on the decisions in this PR, i don't know how useful they are but figured they were worth preserving in some fashion

shashank11p commented 2 years ago

As discussed offline, we will disable metric reporting if TRACE_REPORTER_TYPE is ZIPKIN, or if METRIC_REPORTER_TYPE is set to none.