Closed: NaveenRamu closed this issue 1 month ago.
Please provide a minimal sample application that reproduces your issue along with any necessary instructions.
Thanks @laurit for the quick reply. Please find the sample application below to reproduce the issue. Steps to reproduce:
JVM args: -Xms512m -Xmx2048m -XX:MaxDirectMemorySize=1g -XX:+HeapDumpOnOutOfMemoryError -Djdk.tls.maxCertificateChainLength=20 -XX:+UseParallelGC -javaagent:D:/opentelemetry-javaagent.jar -Dotel.service.name=receiver -Dotel.javaagent.debug=true -Dotel.traces.exporter=none -Dotel.resource.attributes=service.name=receiver,environment=test,application=receiver -Dotel.exporter.otlp.traces.protocol=grpc -Dotel.metrics.exporter=none -Dotel.logs.exporter=none -Dotel.instrumentation.methods.include=com.datastreaming.handler.AbstractHandler[handle];com.datastreaming.handler.GlobalHandler[handleRequest];com.datastreaming.subscriber.JsonSubscriber[process,sendMessagePulsar,requestComplete];com.datastreaming.handler.AbstractSubscriber[onSubscribe,onNext,onError,onComplete]
Git link for sample code: https://github.com/NaveenRamu/datastreaming.git
@NaveenRamu I tried your application with the current development version of the agent. I set the traces exporter to console, started your app, and made a request to http://localhost:5050/. The output is:
[otel.javaagent 2024-09-24 17:30:03:167 +0300] [ratpack-compute-1-2] INFO io.opentelemetry.exporter.logging.LoggingSpanExporter - 'AbstractSubscriber.onSubscribe' : 7157d620876d77ff1f1e551e358250d5 09e3bd5ad1e2344c INTERNAL [tracer: io.opentelemetry.methods:2.9.0-alpha-SNAPSHOT] AttributesMap{data={code.function=onSubscribe, thread.id=18, code.namespace=com.datastreaming.handler.AbstractSubscriber, thread.name=ratpack-compute-1-2}, capacity=128, totalAddedValues=4}
[otel.javaagent 2024-09-24 17:30:03:168 +0300] [ratpack-compute-1-2] INFO io.opentelemetry.exporter.logging.LoggingSpanExporter - 'JsonSubscriber.requestComplete' : 7157d620876d77ff1f1e551e358250d5 a45ffb37171c565a INTERNAL [tracer: io.opentelemetry.methods:2.9.0-alpha-SNAPSHOT] AttributesMap{data={code.function=requestComplete, thread.id=18, code.namespace=com.datastreaming.subscriber.JsonSubscriber, thread.name=ratpack-compute-1-2}, capacity=128, totalAddedValues=4}
[otel.javaagent 2024-09-24 17:30:03:170 +0300] [ratpack-compute-1-2] INFO io.opentelemetry.exporter.logging.LoggingSpanExporter - 'AbstractSubscriber.onComplete' : 7157d620876d77ff1f1e551e358250d5 fb57540d0729e99a INTERNAL [tracer: io.opentelemetry.methods:2.9.0-alpha-SNAPSHOT] AttributesMap{data={code.function=onComplete, thread.id=18, code.namespace=com.datastreaming.handler.AbstractSubscriber, thread.name=ratpack-compute-1-2}, capacity=128, totalAddedValues=4}
[otel.javaagent 2024-09-24 17:30:03:171 +0300] [ratpack-compute-1-2] INFO io.opentelemetry.exporter.logging.LoggingSpanExporter - 'GlobalHandler.handleRequest' : 7157d620876d77ff1f1e551e358250d5 8dabffe62bda0ad5 INTERNAL [tracer: io.opentelemetry.methods:2.9.0-alpha-SNAPSHOT] AttributesMap{data={code.function=handleRequest, thread.id=18, code.namespace=com.datastreaming.handler.GlobalHandler, thread.name=ratpack-compute-1-2}, capacity=128, totalAddedValues=4}
[otel.javaagent 2024-09-24 17:30:03:171 +0300] [ratpack-compute-1-2] INFO io.opentelemetry.exporter.logging.LoggingSpanExporter - 'AbstractHandler.handle' : 7157d620876d77ff1f1e551e358250d5 feac27808e65e1eb INTERNAL [tracer: io.opentelemetry.methods:2.9.0-alpha-SNAPSHOT] AttributesMap{data={code.function=handle, thread.id=18, code.namespace=com.datastreaming.handler.AbstractHandler, thread.name=ratpack-compute-1-2}, capacity=128, totalAddedValues=4}
[otel.javaagent 2024-09-24 17:30:03:173 +0300] [ratpack-compute-1-2] INFO io.opentelemetry.exporter.logging.LoggingSpanExporter - 'GET' : 7157d620876d77ff1f1e551e358250d5 dbf72c24cb798b40 SERVER [tracer: io.opentelemetry.netty-4.1:2.9.0-alpha-SNAPSHOT] AttributesMap{data={http.request.method=GET, server.port=5050, server.address=localhost, client.address=0:0:0:0:0:0:0:1, thread.id=18, network.peer.address=0:0:0:0:0:0:0:1, url.path=/, http.response.status_code=200, network.protocol.version=1.1, network.peer.port=51356, user_agent.original=curl/8.7.1, url.scheme=http, thread.name=ratpack-compute-1-2}, capacity=128, totalAddedValues=13}
As far as I can tell there is only one trace (7157d620876d77ff1f1e551e358250d5).
@laurit Single-chunk messages are processed at once, so there is no issue with small payloads. Please try a larger payload (more than 8 KB) or the Postman collection that I've attached.
You can find the Postman collection here: https://github.com/NaveenRamu/datastreaming.git.
Please find the traces below:
[otel.javaagent 2024-09-25 08:19:31:081 +0530] [ratpack-compute-1-3] INFO io.opentelemetry.exporter.logging.LoggingSpanExporter - 'GlobalHandler.handleRequest' : 4bf92f3577b34da6a3ce929d0e0e4736 b36bf650a4621731 INTERNAL [tracer: io.opentelemetry.methods:2.7.0-alpha] AttributesMap{data={code.function=handleRequest, code.namespace=com.datastreaming.handler.GlobalHandler, thread.name=ratpack-compute-1-3, thread.id=21}, capacity=128, totalAddedValues=4}
[otel.javaagent 2024-09-25 08:19:31:081 +0530] [ratpack-compute-1-3] INFO io.opentelemetry.exporter.logging.LoggingSpanExporter - 'AbstractHandler.handle' : 4bf92f3577b34da6a3ce929d0e0e4736 f4c9ff91169cc9f6 INTERNAL [tracer: io.opentelemetry.methods:2.7.0-alpha] AttributesMap{data={code.function=handle, code.namespace=com.datastreaming.handler.AbstractHandler, thread.name=ratpack-compute-1-3, thread.id=21}, capacity=128, totalAddedValues=4}
[otel.javaagent 2024-09-25 08:19:31:082 +0530] [ratpack-compute-1-3] INFO io.opentelemetry.exporter.logging.LoggingSpanExporter - 'JsonSubscriber.process' : 4bf92f3577b34da6a3ce929d0e0e4736 1dd8e5051061d9d4 INTERNAL [tracer: io.opentelemetry.methods:2.7.0-alpha] AttributesMap{data={code.function=process, code.namespace=com.datastreaming.subscriber.JsonSubscriber, thread.name=ratpack-compute-1-3, thread.id=21}, capacity=128, totalAddedValues=4}
[otel.javaagent 2024-09-25 08:19:31:082 +0530] [ratpack-compute-1-3] INFO io.opentelemetry.exporter.logging.LoggingSpanExporter - 'AbstractSubscriber.onNext' : 4bf92f3577b34da6a3ce929d0e0e4736 0dd0c95f7e5e3694 INTERNAL [tracer: io.opentelemetry.methods:2.7.0-alpha] AttributesMap{data={code.function=onNext, code.namespace=com.datastreaming.handler.AbstractSubscriber, thread.name=ratpack-compute-1-3, thread.id=21}, capacity=128, totalAddedValues=4}
[otel.javaagent 2024-09-25 08:19:31:082 +0530] [ratpack-compute-1-3] INFO io.opentelemetry.exporter.logging.LoggingSpanExporter - 'AbstractSubscriber.onSubscribe' : 4bf92f3577b34da6a3ce929d0e0e4736 3768011a92ce78eb INTERNAL [tracer: io.opentelemetry.methods:2.7.0-alpha] AttributesMap{data={code.function=onSubscribe, code.namespace=com.datastreaming.handler.AbstractSubscriber, thread.name=ratpack-compute-1-3, thread.id=21}, capacity=128, totalAddedValues=4}
[otel.javaagent 2024-09-25 08:19:31:082 +0530] [ratpack-compute-1-3] INFO io.opentelemetry.exporter.logging.LoggingSpanExporter - 'JsonSubscriber.process' : 15c6c974c868a40550aca1a8d4dfd3d9 af7c3700a3c8f7f4 INTERNAL [tracer: io.opentelemetry.methods:2.7.0-alpha] AttributesMap{data={code.function=process, code.namespace=com.datastreaming.subscriber.JsonSubscriber, thread.name=ratpack-compute-1-3, thread.id=21}, capacity=128, totalAddedValues=4}
[otel.javaagent 2024-09-25 08:19:31:083 +0530] [ratpack-compute-1-3] INFO io.opentelemetry.exporter.logging.LoggingSpanExporter - 'AbstractSubscriber.onNext' : 15c6c974c868a40550aca1a8d4dfd3d9 4492207ecf3e6046 INTERNAL [tracer: io.opentelemetry.methods:2.7.0-alpha] AttributesMap{data={code.function=onNext, code.namespace=com.datastreaming.handler.AbstractSubscriber, thread.name=ratpack-compute-1-3, thread.id=21}, capacity=128, totalAddedValues=4}
[otel.javaagent 2024-09-25 08:19:31:083 +0530] [ratpack-compute-1-3] INFO io.opentelemetry.exporter.logging.LoggingSpanExporter - 'JsonSubscriber.requestComplete' : 375efd9ac1b4425f3680aea429e5c276 2bebaa74460c0230 INTERNAL [tracer: io.opentelemetry.methods:2.7.0-alpha] AttributesMap{data={code.function=requestComplete, code.namespace=com.datastreaming.subscriber.JsonSubscriber, thread.name=ratpack-compute-1-3, thread.id=21}, capacity=128, totalAddedValues=4}
[otel.javaagent 2024-09-25 08:19:31:083 +0530] [ratpack-compute-1-3] INFO io.opentelemetry.exporter.logging.LoggingSpanExporter - 'AbstractSubscriber.onComplete' : 375efd9ac1b4425f3680aea429e5c276 c85333542eeb06ed INTERNAL [tracer: io.opentelemetry.methods:2.7.0-alpha] AttributesMap{data={code.function=onComplete, code.namespace=com.datastreaming.handler.AbstractSubscriber, thread.name=ratpack-compute-1-3, thread.id=21}, capacity=128, totalAddedValues=4}
[otel.javaagent 2024-09-25 08:19:31:084 +0530] [ratpack-compute-1-3] INFO io.opentelemetry.exporter.logging.LoggingSpanExporter - 'POST' : 4bf92f3577b34da6a3ce929d0e0e4736 569f349f5bd6f1a7 SERVER [tracer: io.opentelemetry.netty-4.1:2.7.0-alpha] AttributesMap{data={http.request.method=POST, url.path=/test, server.address=localhost, client.address=0:0:0:0:0:0:0:1, network.peer.address=0:0:0:0:0:0:0:1, network.peer.port=61465, network.protocol.version=1.1, user_agent.original=PostmanRuntime/7.29.2, server.port=5050, url.scheme=http, thread.name=ratpack-compute-1-3, http.response.status_code=200, thread.id=21}, capacity=128, totalAddedValues=13}
Note: I also manually passed the trace ID in the header and validated it, but different trace IDs are still generated. Header name: traceparent, value: 00-abcdef123456789abcdef123456789a1-0000000000000001-01
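For reference, the header above is a plain W3C Trace Context `traceparent` request header. A minimal sketch of an equivalent request using `HttpURLConnection` (the JSON body is a placeholder; the `/test` path matches the server span in the logs above, but this is not the exact Postman request):

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class TraceparentRequest {
  public static void main(String[] args) throws Exception {
    // W3C Trace Context header: version-traceId-parentSpanId-flags
    String traceparent = "00-abcdef123456789abcdef123456789a1-0000000000000001-01";

    HttpURLConnection conn =
        (HttpURLConnection) new URL("http://localhost:5050/test").openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setRequestProperty("traceparent", traceparent);
    conn.setDoOutput(true);

    byte[] body = "{\"payload\":\"...\"}".getBytes(StandardCharsets.UTF_8);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body);
    }
    System.out.println("HTTP " + conn.getResponseCode());
  }
}
```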
@NaveenRamu our Ratpack instrumentation does not work for 2.x. I have created https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/12330 to fix this for Ratpack 1.x. Currently we have no intention to support Ratpack 2.x, as the only available version, 2.0.0-rc-1, was released more than 2 years ago and there has been no work done since then; the 1.x line has more recent activity. If you need instrumentation for 2.x, you could try copying the manual instrumentation library https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation/ratpack/ratpack-1.7/library and fixing the imports for the classes that were moved to a different package in 2.x. See the tests for how to set up the manual instrumentation.
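For orientation, a rough sketch of what that manual setup could look like. The class and method names (`RatpackTelemetry.create`, `configureServerRegistry`) are assumptions based on the ratpack-1.7 library mentioned above, and the imports shown are the Ratpack 1.x ones; verify both against the library sources and its tests (and adjust the packages that moved in 2.x) before relying on this:

```java
// Ratpack 1.x imports; in 2.x several classes moved packages,
// e.g. ratpack.server.RatpackServer -> ratpack.core.server.RatpackServer.
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.instrumentation.ratpack.v1_7.RatpackTelemetry;
import ratpack.server.RatpackServer;

public class InstrumentedServer {
  public static void main(String[] args) throws Exception {
    // Build the library instrumentation from whatever OpenTelemetry instance you configure.
    RatpackTelemetry telemetry = RatpackTelemetry.create(GlobalOpenTelemetry.get());

    RatpackServer.start(server -> server
        // Registers the OpenTelemetry handler and execution interceptors in the server registry.
        .registryOf(telemetry::configureServerRegistry)
        .handlers(chain -> chain.get(ctx -> ctx.render("ok"))));
  }
}
```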
Describe the bug
When using the Subscriber interface with OpenTelemetry (OTel) tracing in a reactive environment, tracing continuity breaks when the onNext() method is invoked multiple times (e.g., while a message is processed in chunks). The trace context is lost, leading to incomplete or broken traces, and a new trace ID is generated for the next invocation. This is likely because the tracing context, which is stored in thread-local variables, is not propagated correctly across asynchronous boundaries.
Similarly, when the onComplete() method is invoked at the end of the request, a new trace ID is generated.
Implementation class example:

```java
// Fields (logger, subscription, isError, ctx, cause) and the process()/requestComplete()
// methods are omitted for brevity.
public abstract class AbstractSubscriber implements Subscriber<ByteBuf> {

    @Override
    public void onSubscribe(Subscription subscription) {
        logger.debug("onSubscribe");
        this.subscription = subscription;
        subscription.request(1);
    }

    @Override
    public void onNext(ByteBuf byteBuf) {
        logger.debug("onNext");
        process(byteBuf);
        if (!isError) {
            byteBuf.release();
            subscription.request(1);
        } else {
            byteBuf.release();
            subscription.cancel();
            ctx.error(cause);
        }
    }

    @Override
    public void onError(Throwable throwable) {
        logger.debug("onError");
        ctx.error(throwable);
    }

    @Override
    public void onComplete() {
        requestComplete();
    }
}
```
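For illustration of the mechanism described above, the OpenTelemetry context API (`io.opentelemetry.context.Context` / `Scope`) can capture the context that is current when the subscription starts and reinstate it around each callback, so spans created inside the callbacks join the same trace. A minimal sketch, assuming onSubscribe() still runs while the server span's context is current; the `ContextAwareSubscriber` name, the `capturedContext` field, and the abstract `process()`/`requestComplete()` methods are illustrative additions, not code from the sample application:

```java
import io.netty.buffer.ByteBuf;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.Scope;
import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;

public abstract class ContextAwareSubscriber implements Subscriber<ByteBuf> {

  private Subscription subscription;
  // Context that was current when the subscription started.
  private Context capturedContext = Context.current();

  @Override
  public void onSubscribe(Subscription subscription) {
    // Capture the server span's context so later callbacks can restore it.
    this.capturedContext = Context.current();
    this.subscription = subscription;
    subscription.request(1);
  }

  @Override
  public void onNext(ByteBuf byteBuf) {
    // Reinstate the captured context so spans created inside process()
    // (e.g. via otel.instrumentation.methods.include) land in the same trace.
    try (Scope ignored = capturedContext.makeCurrent()) {
      process(byteBuf);
    } finally {
      byteBuf.release();
    }
    subscription.request(1);
  }

  @Override
  public void onError(Throwable throwable) {
    try (Scope ignored = capturedContext.makeCurrent()) {
      handleError(throwable);
    }
  }

  @Override
  public void onComplete() {
    try (Scope ignored = capturedContext.makeCurrent()) {
      requestComplete();
    }
  }

  protected abstract void process(ByteBuf byteBuf);

  protected abstract void handleError(Throwable throwable);

  protected abstract void requestComplete();
}
```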
Steps to reproduce
1. Implement a Subscriber that processes messages in chunks by invoking onNext() multiple times.
2. Set up OpenTelemetry tracing to trace the processing.
3. Observe that after multiple onNext() calls the trace context is lost, leading to incomplete traces in the distributed trace logs and a new trace ID being generated.
Expected behavior
The trace context should propagate correctly across multiple invocations of onNext(), maintaining continuity in the tracing logs.
Actual behavior
New trace IDs are generated for the Ratpack Subscriber callback methods such as onNext(), onError(), and onComplete().
Javaagent or library instrumentation version
otelcol version 0.106.1
Environment
JDK: 1.8
OS: CentOS
Server: Ratpack (version 2.0.0-rc-1)
Additional context
The issue is likely due to asynchronous execution switching threads and losing the thread-local trace context. Reactive systems typically involve multiple threads, and if the context is not propagated, traces may appear incomplete or broken.
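When work is handed off to another thread, the usual remedy is to carry the current context across the boundary with the wrappers the context API provides. A minimal sketch (the executor and task are placeholders for illustration, not code from the sample application):

```java
import io.opentelemetry.context.Context;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ContextPropagationExample {
  public static void main(String[] args) {
    // Wrap an executor so every submitted task runs with the submitter's context.
    ExecutorService executor = Context.taskWrapping(Executors.newFixedThreadPool(2));

    // Or wrap an individual task with the context that is current right now.
    Runnable task = Context.current().wrap(() ->
        System.out.println("runs with the captured context"));

    executor.submit(task);
    executor.shutdown();
  }
}
```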