[feature request] Leave Kafka activities active after Consume is called

IGx89 commented 3 days ago

Component

OpenTelemetry.Instrumentation.ConfluentKafka

Is your feature request related to a problem?

After I call consumer.Consume, Activity.Current is null, so any logs, HTTP requests, SQL updates, etc. that I make while processing the message are not tied to the message that triggered them. That prevents me from easily pulling up a message's trace and checking whether consumers processed it successfully.

What is the expected behavior?

The current implementation of this component both starts and stops the Activity inside the consumer.Consume call. As a result, none of the business logic that processes the message (logs, Redis, SQL, HTTP, etc.) can be correlated to that message. That seems to go against the goals of distributed tracing, since the trace ends in the middle of the work.

Datadog's tracer handles this by leaving the activity open after Consume returns and closing it the next time Consume is called, which in a typical consume loop is immediately after the message has been processed. You can look at their code here: https://github.com/DataDog/dd-trace-dotnet/blob/0070285865b391ac1db44682aa24ead9c903dad1/tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/Kafka/KafkaConsumerConsumeIntegration.cs#L66
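A minimal C# sketch of that keep-open pattern, assuming only the BCL System.Diagnostics.ActivitySource API. KeepOpenConsumer and KeepOpenDemo are hypothetical names, a Func<T> stands in for IConsumer.Consume, and no Kafka headers or tags are handled. It is not Datadog's code or this component's API; it only illustrates why Activity.Current stays set for the caller after a synchronous Consume returns.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical wrapper, NOT part of OpenTelemetry.Instrumentation.ConfluentKafka or
// Datadog's tracer. The activity started for message N stays open after Consume
// returns and is stopped at the start of the Consume call for message N+1.
public sealed class KeepOpenConsumer<T> : IDisposable
{
    private readonly ActivitySource source;
    private readonly Func<T> consume; // assumption: stands in for IConsumer.Consume
    private Activity? current;

    public KeepOpenConsumer(ActivitySource source, Func<T> consume)
    {
        this.source = source;
        this.consume = consume;
    }

    public T Consume()
    {
        // In a typical consume loop, reaching this point means the previous
        // message has been processed, so its activity can be stopped now.
        StopCurrent();

        T message = consume();

        // Start the activity for the new message and leave it running. Consume is
        // not async, so the Activity.Current change is still visible to the caller.
        current = source.StartActivity("process", ActivityKind.Consumer);
        return message;
    }

    // Stops the last message's activity, e.g. when the consume loop exits.
    public void Dispose() => StopCurrent();

    private void StopCurrent()
    {
        current?.Stop(); // Stop also resets Activity.Current to the activity's parent
        current = null;
    }
}

public static class KeepOpenDemo
{
    // Consumes three fake messages and records which activity was current while
    // each one was being "processed" by application code.
    public static List<string> Run()
    {
        using var source = new ActivitySource("Demo.Kafka");
        using var listener = new ActivityListener
        {
            ShouldListenTo = s => s.Name == "Demo.Kafka",
            Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
                ActivitySamplingResult.AllDataAndRecorded,
        };
        ActivitySource.AddActivityListener(listener);

        var queue = new Queue<string>(new[] { "a", "b", "c" });
        var seen = new List<string>();
        using (var consumer = new KeepOpenConsumer<string>(source, queue.Dequeue))
        {
            for (int i = 0; i < 3; i++)
            {
                string message = consumer.Consume();
                // Logs, HTTP and SQL calls made here would see Activity.Current.
                seen.Add(message + ":" + Activity.Current?.OperationName);
            }
        }

        string after = Activity.Current?.OperationName ?? "none";
        seen.Add("after:" + after);
        return seen;
    }
}
```

Calling KeepOpenDemo.Run() should show each message handled under the "process" activity and no activity left current once the consumer is disposed. One trade-off of this design is that each span also covers whatever the loop does between finishing a message and the next Consume call, and in this sketch the last span only ends when Dispose is called.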

Which alternative solutions or features have you considered?

I found https://opentelemetry.io/docs/specs/semconv/messaging/messaging-spans/#consumer-spans, which appears to document how things should work here. This section seems to suggest an alternative solution:

“Process” spans MAY be created in addition to “Receive” spans for pull-based scenarios for operations of processing messages. Such spans could be created by application code, or by abstraction layers built on top of messaging SDKs.

It's not clear how one would do that here, though, without effectively re-implementing all of this component's logic (parsing headers, adding tags, etc.). If I'm going to do all that, I may as well not use this component at all.

Additional context

No response

github-actions[bot] commented 3 days ago

Tagging component owner(s).

@g7ed6e

g7ed6e commented 2 days ago

Hi @IGx89, this is by design according to the OTel messaging specs, which is why we also introduced process span support. Please have a look at the discussions in the original PR and check the PR below, which introduces process span support: https://github.com/open-telemetry/opentelemetry-dotnet-contrib/pull/1937

IGx89 commented 1 day ago

Thanks for the fast reply! For anyone else reading this: the package provides a ConsumeAndProcessMessageAsync extension method on IConsumer that you can call in place of consumer.Consume. It starts the activity, executes a callback in which you process the message, and then stops the activity.
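A minimal sketch of the same start, callback, stop shape, assuming only the BCL ActivitySource API. CallbackProcessing and ConsumeAndProcessAsync are hypothetical stand-ins written for illustration; they do not reproduce the real ConsumeAndProcessMessageAsync signature, which should be checked in the package and the PR linked above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class CallbackProcessing
{
    // Hypothetical helper modeled on the described behavior (start the activity,
    // run the handler, stop the activity). It is NOT the real
    // ConsumeAndProcessMessageAsync signature.
    public static async Task<T> ConsumeAndProcessAsync<T>(
        ActivitySource source,
        Func<T> consume,                  // assumption: stands in for IConsumer.Consume
        Func<T, Activity?, Task> process) // the application's message handler
    {
        T message = consume();

        // The activity spans the whole handler and is stopped when this scope ends.
        using Activity? activity = source.StartActivity("process", ActivityKind.Consumer);
        await process(message, activity);
        return message;
    }

    // Processes one fake message and records the current activity inside and
    // after the handler.
    public static async Task<List<string>> RunAsync()
    {
        using var source = new ActivitySource("Demo.Callback");
        using var listener = new ActivityListener
        {
            ShouldListenTo = s => s.Name == "Demo.Callback",
            Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
                ActivitySamplingResult.AllDataAndRecorded,
        };
        ActivitySource.AddActivityListener(listener);

        var seen = new List<string>();
        await ConsumeAndProcessAsync<string>(source, () => "m1", async (message, activity) =>
        {
            activity?.SetTag("demo.message", message);
            await Task.Yield(); // Activity.Current flows across the await
            seen.Add(message + ":" + Activity.Current?.OperationName);
        });

        string after = Activity.Current?.OperationName ?? "none";
        seen.Add("after:" + after);
        return seen;
    }
}
```

Because the activity is scoped to the handler, Activity.Current is set inside the callback (even after an await) and is no longer current once the helper returns. The cost, as noted in the next comment, is that the business logic has to be written as that callback.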

I'm not sure that'll work for us, since it requires OpenTelemetry-specific modifications to our business logic, but seeing the proper way to handle this per the OpenTelemetry spec is very helpful.

Since almost everyone using this instrumentation will probably want to use this extension method, it might be worth documenting it in your README; it's hard to find otherwise (I spent half an hour reading through issues and the code before creating this issue and still didn't find it).

open-telemetry / opentelemetry-dotnet-contrib