smallrye / smallrye-reactive-messaging

SmallRye Reactive Messaging
http://www.smallrye.io/smallrye-reactive-messaging/
Apache License 2.0
237 stars 177 forks source link

Added tracing using opentelemetry for jms like kafka does. #2727

Closed dankristensen closed 3 days ago

dankristensen commented 1 month ago

JMS Connector did not have tracing capabilities like kafka did.

Here is a PR, that contains this functionality.

Please review and give me feedback on this

ozangunalp commented 4 weeks ago

@dankristensen I've fixed the tests and removed completely client id and group id from the JmsTrace, I didn't know how to fill them.

dankristensen commented 4 weeks ago

@ozangunalp That is great. I have time to look at this tommorow, I will try to get it running locally, and see that i actually get these telemetries shown correctly in jaeger.

ozangunalp commented 4 weeks ago

Perfect, thanks for letting me know!

ozangunalp commented 3 weeks ago

@dankristensen did you have any time to check this ?

dankristensen commented 2 weeks ago

@ozangunalp Sorry for not getting back to you. I got serious sick, but is back to work again now. I will try to build one, so i can verify if it works correctly

dankristensen commented 2 weeks ago

I have tried, and can see, that i MUST have this dependency available: smallrye-reactive-messaging-otel. This is because it is used in JmsOpenTelemetryInstrumenter. Is this as wanted? Apparently it looks like kafka has same issue. I can see from the documentation, that you must include this. But the error is a little confusing, because tracing is per default enabled, and therefore i get this error. iled to start application: java.lang.RuntimeException: Failed to start quarkus at io.quarkus.runner.ApplicationImpl.doStart(Unknown Source) at io.quarkus.runtime.Application.start(Application.java:101) at io.quarkus.runtime.ApplicationLifecycleManager.run(ApplicationLifecycleManager.java:119) at io.quarkus.runtime.Quarkus.run(Quarkus.java:71) at io.quarkus.runtime.Quarkus.run(Quarkus.java:44) at io.quarkus.runtime.Quarkus.run(Quarkus.java:124) at io.quarkus.runner.GeneratedMain.main(Unknown Source) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:568) at io.quarkus.runner.bootstrap.StartupActionImpl$1.run(StartupActionImpl.java:116) at java.base/java.lang.Thread.run(Thread.java:840) Caused by: java.lang.NoClassDefFoundError: io/smallrye/reactive/messaging/tracing/TracingUtils at io.smallrye.reactive.messaging.jms.tracing.JmsOpenTelemetryInstrumenter.createForSource(JmsOpenTelemetryInstrumenter.java:29)

What do you think?

dankristensen commented 2 weeks ago

I can confirm, that i get traces into jaeger now. But i am a little in doubt about to to pass span's across jms messages. Do you have any suggestions for this?

ozangunalp commented 2 weeks ago

I can confirm, that i get traces into jaeger now. But i am a little in doubt about to to pass span's across jms messages. Do you have any suggestions for this?

@dankristensen indeed for trace propagation, we needed to make sure that trace information (traceparent) was set on the outgoing message properties. It was missing. I've fixed it.

I think this is good to go. Thanks for your contribution!

dankristensen commented 2 weeks ago

@ozangunalp i have just built a snapshot and tried it. It helped with the traceparent, but it is not ALL of them that is being propagated correctly. I now see a rest POST call, that is publishing a message, and then receiving the message again, in the same span. When recieving the message again, it is doing some operation, and then publishing a new message, which is not aggregated in the same span. Should that not be the case, with the change you made to traceparent, or do i need to adjust something in my app for this?

Furthermore i can see, that it would be good, if we could get the channel name as part of the tags we publish. I have looked a bit into this, but i cannot see an obvious place to handle this. Do you have any suggestions for this?

ozangunalp commented 2 weeks ago

i have just built a snapshot and tried it. It helped with the traceparent, but it is not ALL of them that is being propagated correctly. I now see a rest POST call, that is publishing a message, and then receiving the message again, in the same span. When recieving the message again, it is doing some operation, and then publishing a new message, which is not aggregated in the same span. Should that not be the case, with the change you made to traceparent, or do i need to adjust something in my app for this?

So the app does the following ? receive POST -> publish message -> JMS -> consume message -> publish message -> JMS

Furthermore i can see, that it would be good, if we could get the channel name as part of the tags we publish. I have looked a bit into this, but i cannot see an obvious place to handle this. Do you have any suggestions for this?

I don't think we do this for other connectors supporting tracing, if you can come up with a proposition that would be awesome.

dankristensen commented 2 weeks ago

That is correct @ozangunalp .

receive POST -> publish message -> JMS -> consume message -> publish message -> JMS

And the publish/consume is actually done multiple times in the same microservice, before leaving to another microservice. So i guess this should be in the same span, as the original receive POST. Do you not agree in this?

ozangunalp commented 2 weeks ago

@dankristensen Can you push your test to a repository so that I can take a look?

dankristensen commented 2 weeks ago

So you mean create a test in smallrye-reactive-messaging-jms module? Currently i only see this in our working code locally. But i will try to make a small reproducer application, so it is easier for you to troubleshoot.

Regarding the channel, i will try to discuss this with one of my colleagues, when he is back from vacation. Maybe we can find a solution for this together

dankristensen commented 1 week ago

@ozangunalp Here is a reproducer: https://github.com/dankristensen/srmj-trace-reproducer. Since my company is running behind a proxy, i have adopted build.gradle/settings.gradle and jaeger configuration to my best effort. Let me know if you have any issues. Follow the guide in README.md

ozangunalp commented 1 week ago

@dankristensen Thanks for the reproducer. Indeed, we need to make a little more effort to make the automatic tracing propagation work on Quarkus for the JMS connector.

But it changes completely how messages are dispatched: Most of the other connectors follow this pattern to dispatch each message on a distinct message Vert.x context. JMS doesn't adhere to that and dispatch messages on the poller thread pool.

Note that changing that may break some assumptions from app developers, but context propagation and different execution modes (event-loop, blocking, VT). will work better on Quarkus.

dankristensen commented 1 week ago

I can see, that you have pushed changes @ozangunalp. Do i need to try it again in a real world example?

ozangunalp commented 1 week ago

@dankristensen if you do that'll be plus yes!

ozangunalp commented 3 days ago

@dankristensen I am in favor of getting this in. wdyt ?

dankristensen commented 3 days ago

Give me an hour, i am currently testing in a real world scenario. Will get back to you @ozangunalp

dankristensen commented 3 days ago

I see a problem, that i cannot fix. In JmsOpenTelemetryInstrumenter line 38, we use MessageOperation, that is from an incubator package. Is it possible, to avoid using that one in any way?

dankristensen commented 3 days ago

I currently have a problem in my service, and do not get the multi propagation executed correctly. It could be a mistake on my end, since we have also done some migration to github of the service i am working on. I will get back to you tommorow, where i have found a solution for this. Hopefully that is ok @ozangunalp ?

ozangunalp commented 3 days ago

Thanks for looking to this. I am thinking that this is already fine for this iteration. We can merge this and iron out the edges on the next.

dankristensen commented 3 days ago

I can confirm, that the trace is now working across multiple publish / receive events. But i had to change something in my service, to get it working. In particular i could not use MutinyEmitter.sendAndAwait, but had to use MutinyEmitter.sendAndForget instead. I am using this in application.properties: quarkus.messaging.blocking.signatures.execution.mode=event-loop I also had to add @Blocking to some of the incoming messages

ozangunalp commented 2 days ago

I also had to add @Blocking to some of the incoming messages

That is expected, before JMS messages were dispatched on the poller worker thread pool. Now they are dispatched on the Vert.x context, whether in blocking worker pool or the Vert.x event-loop.

dankristensen commented 2 days ago

This is great @ozangunalp . Thanks alot for your cooperation with this PR. That is fantastic. Looking forward to the release.