Open jakob-o opened 4 years ago
If I understood correctly a (maybe temporary) solution might be to create a non-recording / invalid span in the HTTP instrumentation, which due to the ParentOrElse-Sampler, would lead to ignoring child spans as well, if the request matches a URL pattern. Any hint to where this would be architecturally appropriately implemented is highly appreciated. Maybe in the HttpServerTracer
That seems like a reasonable approach :)
Been trying auto instrumentation in a container lately, and was slightly annoyed myself with tracing of health check. Then a customer trying it out independently had the same feedback - it's exactly the kind of input we're hoping for in trials :) So I may actually mark this required for GA, the UX is impacted a lot with having no control of tracing by URL pattern.
I have started looking into this.
Is there a proposed design for this? In OpenTracing this was an instrumentation feature: the instrumentation checked whether the URL matched an exclude pattern and, if so, did not create the span. However, if the excluded URL used another instrumentation (or made a downstream call), that would still create a span. The question is whether we want to exclude just specific URLs or the whole trace starting at that URL.
Yes, we have a proposal right in the task's description:
If I understood correctly a (maybe temporary) solution might be to create a non-recording / invalid span in the HTTP instrumentation, which due to the ParentOrElse-Sampler, would lead to ignoring child spans as well, if the request matches a URL pattern. Any hint to where this would be architecturally appropriately implemented is highly appreciated. Maybe in the HttpServerTracer?
This should lead to ignoring the whole subtree starting from that SERVER span.
+1 I think that is the right way to go https://github.com/open-telemetry/opentelemetry-specification/issues/173#issuecomment-698190021. It would also make sense to have a consolidated config property for this.
@iNikem what is the API to create a non-recording span? In OT there was a sampling.priority=bool tag that could be applied on the span builder https://github.com/opentracing/specification/blob/master/semantic_conventions.yaml#L26
I think the right way is to use one of the factory methods on io.opentelemetry.trace.DefaultSpan.
Yeah, I wanted to avoid having two paths of span creation.
Sorry if double-spam, I thought I had already posted this. How about we have a special Sampler itself configured that delegates to the default, except for when the path matches the allow list, then it's ParentOnly? It means we need to refactor to make sure our tracers set attributes on Span.Builder instead of Span - a little annoying but we should have been doing that already so maybe good motivation for it.
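The delegating-sampler idea above can be sketched in plain Java without the SDK on the classpath. This is only a rough sketch under stated assumptions, not the agent's implementation: `PathAwareSampler` and its nested `Decision` enum are hypothetical stand-ins for the SDK's sampling types, the default sampler is modeled as a plain function, and the request path is assumed to be known when the span starts (which is why the attributes would have to be set on the span builder).

```java
import java.util.List;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: names and types are illustrative, not OpenTelemetry API.
final class PathAwareSampler {
    enum Decision { DROP, RECORD_AND_SAMPLE }

    private final List<Pattern> matchedPaths;
    private final Function<String, Decision> defaultSampler;

    PathAwareSampler(List<String> regexes, Function<String, Decision> defaultSampler) {
        this.matchedPaths = regexes.stream().map(Pattern::compile).collect(Collectors.toList());
        this.defaultSampler = defaultSampler;
    }

    // For matching paths, behave like a parent-only sampler: sample only if the
    // parent was sampled. For everything else, delegate to the default sampler.
    Decision shouldSample(String path, boolean parentSampled) {
        boolean matched = path != null
                && matchedPaths.stream().anyMatch(p -> p.matcher(path).matches());
        if (matched) {
            return parentSampled ? Decision.RECORD_AND_SAMPLE : Decision.DROP;
        }
        return defaultSampler.apply(path);
    }
}
```

With this shape, a health check arriving without a sampled parent is dropped at the SERVER span, while the same path called inside an already sampled trace is still recorded.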
Note that DefaultSpan might get renamed to something that would be naming-wise not a good fit with what we want to do here https://github.com/open-telemetry/opentelemetry-specification/pull/994#pullrequestreview-495461768
refactor to make sure our tracers set attributes on Span.Builder instead of Span
Yes, we want to do that eventually.
Is there a way now to exclude health check traces? I checked processors but could not find any solution too.
No, this functionality is not yet implemented.
bump. Is there any workaround here in the meantime? Sampling of health checks is not ideal.
The only known workaround is to write custom sampler.
But I have plans to address this issue during the next month or so.
The corresponding Sampler has been implemented in the contrib repo. It can be added to your deployment using the extension mechanism. There are no immediate plans to add that sampler to this distribution, as this requires changes in the OTel Specification and that requires some effort.
@iNikem what is the proposed way to configure these classes from environment? Is there any mechanism to bind jvm parameter to fields?
@iNikem what is the proposed way to configure these classes from environment?
There is no such way. To use Sampler from the contrib repo you have to add it to your deployment via extension.
👍
@cemo The current configuration is limited to key/value pairs, which doesn't model the sampler rules well. In the future, we're hoping to have a richer configuration file: https://github.com/open-telemetry/opentelemetry-specification/issues/1773. Then we could add experimental configuration support for this. And eventually we're hoping for these sampler rules (or something like them) to be spec'd and supported cross-language.
@iNikem @trask Could you provide an example of how to use the RuleBasedRoutingSampler exactly? What I did so far is:
java -javaagent:path/to/opentelemetry-javaagent.jar \
     -Dotel.javaagent.extensions=opentelemetry-contrib-samplers-1.8.0-SNAPSHOT.jar \
     -jar myapp.jar
But how do I pass the rules to the agent? In the DemoSampler https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/examples/extension/src/main/java/com/example/javaagent/DemoSampler.java it's clearer, as the rule/condition is directly inside the class
hi @gtuk! opentelemetry-contrib-samplers isn't published as an opentelemetry javaagent extension, so you'll need to build your own extension that consumes (and programmatically configures) the rules
@trask The extensions are working out pretty neat 🎉 . Thanks for that.
One question though, Sampler's shouldSample only works with the following data to set the sampling decision.
hey @RashmiRam! check out #3907, and there was recent discussion also about this: https://github.com/open-telemetry/opentelemetry-java/issues/3963#issuecomment-988460277
Thanks @trask. That was really helpful. Yes. https://github.com/open-telemetry/opentelemetry-java/issues/3963#issuecomment-988460277 is also my exact use case. Wanted to achieve the same thing to push all errors irrespective of the sampling decision. Do you think it is a valid ask to have configuration in SDK to push all errors irrespective of sampling?
Do you think it is a valid ask to have configuration in SDK to push all errors irrespective of sampling?
the problem is that when the sampler decides not to sample a span (which happens at the start of a span), the SDK doesn't capture any telemetry on that span. this makes unsampled spans very efficient, but it makes it impossible to change your mind at the end of an unsampled span.
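A toy model may make this constraint easier to see. The sketch below is not the SDK's Span class: `ToySpan` is a hypothetical stand-in that only shows how a decision fixed at span start gates everything recorded afterwards, so an error that shows up later has nothing to attach to.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only, NOT the OpenTelemetry Span: illustrates that the sampling
// decision is taken once at start and controls whether anything is captured.
final class ToySpan {
    private final boolean recording;
    private final Map<String, String> attributes = new LinkedHashMap<>();

    private ToySpan(boolean recording) {
        this.recording = recording;
    }

    // The sampling decision is passed in at start and never revisited.
    static ToySpan start(boolean sampled) {
        return new ToySpan(sampled);
    }

    // A non-recording span silently ignores writes, which is what keeps it cheap.
    void setAttribute(String key, String value) {
        if (recording) {
            attributes.put(key, value);
        }
    }

    // Returns what would be available for export when the span ends.
    Map<String, String> end() {
        return Collections.unmodifiableMap(attributes);
    }
}
```

In this model, calling `setAttribute("error", "true")` on a span started with `sampled == false` leaves `end()` empty, which is why an "export all errors regardless of sampling" option cannot be bolted on at span end.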
Maybe there's a better place to ask for help, but I'm trying to resolve this problem with the extension workaround. When attempting to run the extension example here I get an error because I'm trying to build for Java8 as opposed to Java11 - is that a strict requirement for using extensions?
I tried to downgrade some gradle dependencies, but I'm not that familiar with Gradle myself (mostly use Maven). If there's a better place to ask questions or get help, please let me know!
hi @welshm! try building with Java 11, it should still produce Java 8 compatible code
hello, is there any news about this feature ?
Any update on this ?
Can anyone provide example code (a GitHub link) showing how to exclude URLs (e.g. health check URLs)?
Hey @Montyroi , This feature has not been implemented in the "core" javaagent yet. However, you can write a sampler yourself and package it in an extension. You can use the rule based sampler from the contrib repo that Nikita mentioned a couple posts earlier:
The corresponding Sampler has been implemented in the contrib repo. It can be added to your deployment using the extension mechanism. There are no immediate plans to add that sampler to this distribution, as this requires changes in the OTel Specification and that requires some effort.
Or implement one from scratch, e.g. https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/examples/extension/src/main/java/com/example/javaagent/DemoSampler.java
Hey @mateuszrzeszutek @trask @iNikem I am trying to exclude the health check endpoint. Even though spanKind is INTERNAL and parentContext contains /actuator/health, I am still able to see the health check traces. How can I exclude/drop the health check traces?
if (spanKind == SpanKind.INTERNAL && parentContext.toString().contains("/actuator/health")) {
    return SamplingResult.create(SamplingDecision.DROP);
}
Command used to run it
java -javaagent:/Users/Downloads/jaegar/build/otel/opentelemetry-javaagent.jar "-Dotel.resource.attributes=service.name=test3" "-Dotel.traces.exporter=jaeger" "-Dotel.javaagent.extensions=/Users/Documents/opentelemetry-java-instrumentation/examples/extension/build/libs/opentelemetry-java-instrumentation-extension-demo-1.0-all.jar" "-Dotel.exporter.otlp.traces.endpoint=http://localhost:14250" -jar /Users/Downloads/jaegarr/build/libs/jaegarr1-0.0.1-SNAPSHOT.jar
@mateuszrzeszutek how do we configure the list of rules for rule based sampling? Could you provide an example?
This has come up a number of times for our customers as well, so I've updated our otel java agent example to package an extension which uses the rule based sampler to drop requests to spring boot actuator endpoints. Maybe seeing concrete sample code will clear up any confusion.
Thank you so much for this, @jack-berg. I got my own version of it running now. For anyone else using Maven, it seems you need to use the following in order to create an extension fat-jar with all the runtime dependencies:
<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <appendAssemblyId>false</appendAssemblyId>
    <descriptorRefs>
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
</plugin>
I was unable to get it to run successfully when trying to add to the classpath normally.
+1 any updates on the topic ? I guess it is impossible when using javaagent.jar?
hi @hadson172! @jack-berg's example above is using the javaagent.jar
it's definitely not as simple as we would like it to be. at some point we would like to add configuration-based health check exclusions
+1 I would love to use the agent inside aws. Since the pricing is trace amount based, it is crucial to exclude actuator Endpoints.
I have no choice but to use it until the exclusion tracing function is released.
for anyone who would like to work on adding this functionality to the base otel javaagent, here's a very high-level outline:
use a yaml configuration file to dynamically configure the RuleBasedRoutingSampler during startup of the javaagent
use a system property, e.g. -Dotel.sampling.rules.config=..., that users can use to specify the location of the yaml configuration file (later we will have a single yaml configuration file and we can consolidate, see recently merged configuration otep)
check out the jmx-metrics and metric view yaml configuration for some inspiration
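As a rough illustration of the outline above, the sketch below loads sampling rules from a file whose location comes from a system property. Everything here is an assumption for illustration: `SamplingRulesConfig` and `Rule` are hypothetical names, and the file format is a simplified line-based `attributeKey: regex` stand-in rather than real YAML (the JDK ships no YAML parser, so an actual implementation would use a YAML library and map the result onto the RuleBasedRoutingSampler).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: not part of the javaagent, and the format is not real YAML.
final class SamplingRulesConfig {
    static final class Rule {
        final String attributeKey;
        final Pattern pattern;

        Rule(String attributeKey, Pattern pattern) {
            this.attributeKey = attributeKey;
            this.pattern = pattern;
        }
    }

    // Parses "attributeKey: regex" lines; blank lines and '#' comments are skipped.
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int sep = line.indexOf(':');
            if (sep <= 0) {
                throw new IllegalArgumentException("Malformed rule: " + line);
            }
            String key = line.substring(0, sep).trim();
            String regex = line.substring(sep + 1).trim();
            rules.add(new Rule(key, Pattern.compile(regex)));
        }
        return rules;
    }

    // Reads the file named by the given system property; no property means no rules.
    static List<Rule> loadFromSystemProperty(String propertyName) throws IOException {
        String location = System.getProperty(propertyName);
        if (location == null || location.isEmpty()) {
            return new ArrayList<>();
        }
        return parse(Files.readAllLines(Paths.get(location)));
    }
}
```

At agent startup this could be called as `SamplingRulesConfig.loadFromSystemProperty("otel.sampling.rules.config")`, using the example property name from the outline, with each resulting rule turned into a drop rule on the sampler.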
Based on @jack-berg's examples, and the possibility to embed an extension directly into the agent, I've created this simple project that provides both the java agent and the docker auto instrumentation that will ignore /health and /metrics calls.
We have similar requirements and wrote an extension that allows you to dynamically configure what sampler to use and add filtering for span creation. Configuration can be done by simply changing the configuration file while the application is running, or also by using the optional REST service for configuring multiple agents. The service also exposes Prometheus compatible metrics. You might find it useful. See https://github.com/domstolene/da-otel-agent for details.
I worked around this by looking at how the OpenTelemetryAutoConfiguration class builds the SdkTracerProvider, which is where the sampler is set for the bean in Spring.
After seeing that the original bean is conditional, I duplicated the code in my own bean configuration and created a custom sampler, which I used to drop any health check related traces.
The caveat is that the actual health check traces are coming with no names to the sampler and denying all spans with no names may drop some unintended traces... For now we are just only doing a PoC for a product, so take this as it is :)
Here is the sample for this:
    @Bean
    SdkTracerProvider otelSdkTracerProvider(Environment environment, ObjectProvider<SpanProcessor> spanProcessors,
            Sampler sampler, ObjectProvider<SdkTracerProviderBuilderCustomizer> customizers) {
        String applicationName = environment.getProperty("spring.application.name", "application");
        SdkTracerProviderBuilder builder = SdkTracerProvider.builder()
                .setSampler(new HealthCheckExclusionSampler())
                .setResource(Resource.create(Attributes.of(ResourceAttributes.SERVICE_NAME, applicationName)));
        spanProcessors.orderedStream().forEach(builder::addSpanProcessor);
        customizers.orderedStream().forEach((customizer) -> customizer.customize(builder));
        return builder.build();
    }

    /**
     * A sampler implementation that excludes certain health check spans from being sampled.
     * Note: I ignore all spans without a name, reason being that the actuator health checks apparently have no names.
     */
    static class HealthCheckExclusionSampler implements Sampler {
        @Override
        public SamplingResult shouldSample(Context parentContext, String traceId, String name, SpanKind spanKind, Attributes attributes, List<LinkData> parentLinks) {
            if (name.contains("/actuator/health") || name.contains("grpc.health.v1.Health/Check") || name.contains("<unspecified span name>")) {
                return SamplingResult.drop();
            }
            return SamplingResult.recordAndSample();
        }

        @Override
        public String getDescription() {
            return "HealthCheckExclusionSampler";
        }
    }
}
We have some services using Micronaut and it has a simple way of excluding HTTP routes https://micronaut-projects.github.io/micronaut-tracing/latest/guide/#http but we have other tech stacks too to deal with.
Also noticed those links in the contrib project were broken. I think this is the new place.
If you're using a central collector to send traces to your backends, you can filter out those spans - https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/filterprocessor/README.md
linking related: https://github.com/open-telemetry/oteps/pull/240
my architecture - app -> otel agent (auto instrumentation) -> otel collector -> observability backend for now, i wanted to filter out swagger and health check calls (caused due to k8s readiness probe)
Is there any update on this?
I understand the concerns raised by @iNikem about the specification being hard to change but this feature has been requested by numerous people for quite some time now.
I think this is a very important and much-needed feature for a lot of users of OpenTelemetry.
@trask Has anyone tried to contribute to the project with this feature? I would like to take a go at implementing it for the Otel Java agent.
hi @nedcerneckis!
I understand the concerns raised by @iNikem about the specification being hard to change but this feature has been requested by numerous people for quite some time now.
I don't believe we're blocked by specification work, since we can add this initially as an opt-in experimental feature (bypassing the need for a specification)
Has anyone tried to contribute to the project with this feature? I would like to take a go at implementing it for the Otel Java agent.
not yet, that would be great, check out https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/1060#issuecomment-1494948191 for very high-level sketch that could fit well with existing feature set
For Lumigo's distribution I implemented a SamplerCustomizer for use with AutoConfigure, see here.
My approach was to allow URLs to be filtered for both client and server spans, or to use separate env vars to filter URLs for only client or only server spans. The URLs are defined as an array of regexes, such as [".*/health.*", ".*/actuator.*"]
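The approach described above could look roughly like the following. This is a sketch under assumptions, not Lumigo's code: `UrlSpanFilter` and its `Kind` enum are hypothetical names, and the three regex lists are passed in directly instead of being read from environment variables, since the variable names are not given here.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch of per-kind URL filtering; not an OpenTelemetry or Lumigo API.
final class UrlSpanFilter {
    enum Kind { CLIENT, SERVER }

    private final List<Pattern> both;
    private final List<Pattern> clientOnly;
    private final List<Pattern> serverOnly;

    UrlSpanFilter(List<String> both, List<String> clientOnly, List<String> serverOnly) {
        this.both = compile(both);
        this.clientOnly = compile(clientOnly);
        this.serverOnly = compile(serverOnly);
    }

    private static List<Pattern> compile(List<String> regexes) {
        return regexes.stream().map(Pattern::compile).collect(Collectors.toList());
    }

    private static boolean anyMatch(List<Pattern> patterns, String url) {
        return patterns.stream().anyMatch(p -> p.matcher(url).matches());
    }

    // True when the span for this URL should be skipped: shared patterns apply
    // to both kinds, kind-specific patterns only to their own kind.
    boolean shouldSkip(Kind kind, String url) {
        if (url == null) {
            return false;
        }
        if (anyMatch(both, url)) {
            return true;
        }
        return anyMatch(kind == Kind.CLIENT ? clientOnly : serverOnly, url);
    }
}
```

Keeping the client and server lists separate lets a service hide its own probe endpoints without also hiding outbound calls that happen to hit a similarly named path on another service.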
Thank you very much @trask! Much appreciated for the info.
I'm assuming this needs to be a little more advanced than just excluding certain HTTP endpoints and include other types of rules inside this YAML config file?
Is your feature request related to a problem? Please describe. As already mentioned here https://github.com/open-telemetry/opentelemetry-specification/issues/173 I'd like to be able to exclude or sample a list of URLs / URL-Patterns from instrumentation. In my case particularly to avoid generating many events from health- and liveness-checks.
Describe the solution you'd like I opened the issue https://github.com/open-telemetry/opentelemetry-java/issues/1552 to discuss if / how tracing might be disabled from instrumentation. To my knowledge there currently is no API / SDK method to disable tracing centrally on the context. If I understood correctly a (maybe temporary) solution might be to create a non-recording / invalid span in the HTTP instrumentation, which due to the ParentOrElse-Sampler, would lead to ignoring child spans as well, if the request matches a URL pattern. Any hint to where this would be architecturally appropriately implemented is highly appreciated. Maybe in the HttpServerTracer?
Describe alternatives you've considered We already attempted to use the otel.trace.classes.exclude but only succeeded in completely disabling WebMvc instrumentation.
/CC @gabrielthunig @spaletta