quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.87k stars 2.71k forks source link

Support `@Traceless` for suppressing traces from application URI #44733

Open mcruzdev opened 3 days ago

mcruzdev commented 3 days ago

After this discussion on https://github.com/quarkusio/quarkus/pull/43885 pull request, we decided to remove the annotation and to work only with the quarkus.otel.traces.suppress-application-uris property.

Rever PR: https://github.com/quarkusio/quarkus/pull/44724

To give a better implementation we need to be sure that:

Discussion's resume

cc: @michalvavrik @geoand @brunobat

quarkus-bot[bot] commented 3 days ago

/cc @brunobat (opentelemetry), @radcortez (opentelemetry)

geoand commented 2 days ago

Thanks for starting this discussion.

First of all, I would like to ask @brunobat if there is currently anyway other than the use of DropTargetsSampler to be able to ignore a span. I am asking because if we can for example set some kind of property on a span that would cause it to be effectively ignored, we could implement @Traceless properly.

brunobat commented 2 days ago

Sampler is the right approach, per spec. There are flags in the context that flag if a span is sampled or not but I will have to dig on that. About the @Traceless, @michalvavrik is right, however we don't need to be super fine grained in the 1st approach. What if we limit @Traceless to the class level @Path for now and clarify that this will affect all subpaths? Inheritance is definitely an issue but, again, it can be documented for now.
This has been requested because some user features (globally) must not be tracked. Fine grained sampling has not been requested until now.

geoand commented 2 days ago

Honestly, whether the annotation applies to the class only class or also to the methods, makes almost no difference in the proper implementation.

If we have a way to mark the current span as ignored, then I can take care of this

michalvavrik commented 2 days ago

however we don't need to be super fine grained in the 1st approach.

I read and commented on that PR because I remember the pain it was to make it clear when security annotations are applied and assert that. And even though it is documented, there are still people who use it differently, I fixed such reported "bug" just this month. I am just worried you will get feedback from people that won't get it. Anyway, IMHO the decision is yours.

What if we limit @Traceless to the class level @Path for now and clarify that this will affect all subpaths?

Resource class methods still can share paths with other resources that differ in stuff like HTTP methods etc.

@Traceless
@Path("/number")
public class GetResource {
   @Path("/one") 
   @GET
  String getOne() { return "one"; }

}

// this one will be traceless but not annotated
@Path("/number/one")
public class PutResource {
    @PUT
    public void putOne(.....
}

so please include into the docs that other resources that are not annotated may be affected as well. My opinion is that advantage of annotation is that it can clearly mark what is disabled (like the annotated endpoint), unlike configuration approach, but it will be bit less clear with this @Traceless.

brunobat commented 2 days ago

I understand your concerns, @michalvavrik. The main issue here is to allow users to easily weed out endpoints they don't want to trace without implementing their own sampler.

Let's split the problem in 2 and leave the @Traceless issue for later, if you all don't mind, and focus for now in the config property to have a quick win.

This is simpler and probably what was implemented on that part can be re-used. Does everyone agree on having a PR with just the config property first?

geoand commented 2 days ago

Does everyone agree on having a PR with just the config property first?

Tha's already in, no?

brunobat commented 2 days ago

Was part of the reverted PR: https://github.com/quarkusio/quarkus/pull/43885/files?short_path=122572f#diff-d10dd3953f667e0269ac3ce039afd1dd94a385c3862dcc57148560b4f7bcaea2R136

geoand commented 2 days ago

We only reverted the annotation, the config property is still there

brunobat commented 2 days ago

cool

geoand commented 1 day ago

If we have a way to mark the current span as ignored, then I can take care of this

@brunobat this comment is addressed to you :)

brunobat commented 1 day ago

If we have a way to mark the current span as ignored, then I can take care of this

@brunobat this comment is addressed to you :)

I know. It's on my queue.

geoand commented 1 day ago

👌🏽

brunobat commented 14 hours ago

These are the key points about sampling.

There are 2 types of sampling, head sampling and tail sampling. What we do in Quarkus is head sampling where the sampling behaviour is set upon span creation. Tail sampling is quite complex and happens after the entire trace (multiple spans) are collected and requires buffering inside the span processor (different from exporter). An analysis of the collected data decides if the trace should be sampled or not. The OTel SDK doesn't support this out of the box.

According to the spec, the sampling decision, "the SDK must call the Sampler every time a Span is created during the start span operation.".

Later, the part of the SDK spec about sampling states that "Sampler interface allows users to create custom samplers" to control this process and "a Tracestate that will be associated with the Span through the new SpanContext". It is this Tracestate that will be propagated and be used in sampling decisions for child spans. This is the main reason why the sample state is immutable.

Long story short: Sampling cannot be changed after the span is created.

Some things can happen now:

  1. We keep the path strategy but change something in the Traceless annotation that makes the behaviour clearer.
  2. We add a special attribute that marks the span for sampling, however, this requires us to change all instrumentation code to somehow add that attribute if the method signature has the annotation. This is a bit over the top.
  3. We store everywhere the data about the class/method being called and in the method we use reflection to see if it contains Traceless. Also over the top.

Any suggestions?

geoand commented 13 hours ago

Can't we add some kind of special attribute that when we added means that in the end, the span should be "discarded" (i.e. not sent to or not processed by the exporter)?

brunobat commented 13 hours ago

Can't we add some kind of special attribute that when we added means that in the end, the span should be "discarded" (i.e. not sent to or not processed by the exporter)?

No, unfortunately that attribute must be present in the moment of creation of the span.

geoand commented 13 hours ago

But we can add whatever attribute we want to a span, no?

brunobat commented 13 hours ago

yes, but it will take a lot of work to do that at the moment of creation for all instrumentations. Everywhere. It's brittle and, sincerely, this feature should not take so much effort.

brunobat commented 13 hours ago

In my opinion we should have sampling according to what we already do with resources sampling:

How to specify this in the annotation? I'm open to suggestions.

We probably want to implement the annotation inheritance, though.

geoand commented 13 hours ago

yes, but it will take a lot of work to do that at the moment of creation for all instrumentations. Everywhere. It's brittle and, sincerely, this feature should not take so much effort.

I am actually proposing the opposite. What I asking is if there anything preventing us from doing something like the following (in pseudo code):

getCurrentSpan().setAttribute("quarkus-ignore", true);

then before we send the span to the exporter check:

getCurrentSpan().getAttribute("quarkus-ignore");

If we can do that, then the code to handle @Traceless would then simply add the attribute.

geoand commented 13 hours ago

How to specify this in the annotation? I'm open to suggestions.

This won't work in JAX-RS / Jakarta REST as an annotation on a Resource Method means that the method is not identifiable by a path only.

brunobat commented 13 hours ago

I am actually proposing the opposite. What I asking is if there anything preventing us from doing something like the following (in pseudo code):

getCurrentSpan().setAttribute("quarkus-ignore", true);

That can work but we need to create our processor or our own exporter... In true fairness, we should create our own exporter to protect us from changes in the internal API handling the senders.

This idea can work but it's a lot of work, but at least will provide other benefits.

I like it.

geoand commented 13 hours ago

We already have our own exporters though? Or what am I missing? :)

geoand commented 13 hours ago

Or do you mean SpanProcessor?

geoand commented 13 hours ago

We actually have LateBoundSpanProcessor so couldn't we just check the existence of the special attribute there and not call the delegate when the span should be ignored?

brunobat commented 13 hours ago

We still need the io.opentelemetry.exporter.internal.grpc.GrpcExporter used by the VertxGrpcSpanExporter and the equivalents for HTTP... There are many layers.

On the exporter side I see a place to work here: https://github.com/quarkusio/quarkus/blob/6c87887923bd855bab7824ecb5c0d87933296598/extensions/opentelemetry/runtime/src/main/java/io/quarkus/opentelemetry/runtime/exporter/otlp/tracing/VertxGrpcSpanExporter.java#L20-L19

On the SpanProcessor side we can work here: https://github.com/quarkusio/quarkus/blob/95fea0b5da08c9ec968bfb8e3b42c621ce4412cb/extensions/opentelemetry/runtime/src/main/java/io/quarkus/opentelemetry/runtime/exporter/otlp/tracing/LateBoundSpanProcessor.java#L45

This last one seems better because will generate less work.

geoand commented 13 hours ago

Kk, I'll see if I can put a prototype together next week

brunobat commented 13 hours ago

If we do this, we will be bypassing the sampling system.
This way we might generate gaps in the traces and ppl need to be warned about it. Ex. A child span might show up without a parent. I'm ok with that as long as we have a property to disable the feature, in case ppl are surprised by the behaviour.