microsoft / ApplicationInsights-Java

Application Insights for Java
http://aka.ms/application-insights
Other
296 stars 199 forks source link

Dependency tree not available with Azure Blob + Queue storage + ServiceBus + MongoDB #2951

Closed abhikt48 closed 5 months ago

abhikt48 commented 1 year ago

We are using applicationinsights-agent:3.4.10 with MuleSooft Application. We tested with the below flow (HTTP+Mongo+MySql+HTTP+JMS+Logger) and we can see the dependency tree (End-To-End transaction details) in App Insight, which is working fine.

image

But when we tested with ServiceBus(JMS+AMQP) + Blob + Queue Storage + Mongo DB storage + ServiceBus, then we can't see End-To-End transaction details.

image

Used Library Version

  1. Blob - azure-storage-blob-12.20.1
  2. Queue - azure-storage-queue-12.7.0
  3. ServiceBus - qpid-jms-client-0.57.0
  4. MongoDB - mongo-java-driver-3.12.10

NOTE - All components (ServiceBus, HTTP, Blob, Queue, etc.) are getting called from a separate library where we are not passing any operation ID to link a call from one to another.

Expectation

We want to have a dependency tree that should represent End-To-End call.

Could you please suggest any option to link end-to-end transaction calls? Please let me know for any information.

trask commented 1 year ago

hi @abhikt48, can you post the end-to-end transaction details that you see currently in both the non-working and working scenarios? thx!

heyams commented 1 year ago

@abhikt48 can you please submit a sample app to https://github.com/microsoft/ApplicationInsights-Java-Repros where this issue can be reproducible? Thanks.

abhikt48 commented 1 year ago

@trask @heyams

Thanks for the quick feedback. Finally. I am able to replicate this issue. The whole issue is related to threading.

Background - We are using Mule ESB where

  1. Some endpoint (ServiceBus consumer) receives message in a separate thread and processing happens in a separate thread, such as the below flow image

  2. Some endpoint works in a single thread for all processing image

We are having an issue in first scenario where multiple threads involves. For replicating this issue, I created a sample class

  1. Run all operations in a single thread - https://github.com/abhikt48/test-applicationinsights-agent/blob/main/src/main/java/com/abhi/test/ai/agent/sbus/storage/TestAiWithMultiTransportsSameThread.java

Refer onMessage(..) method - After receiving the message, I am calling to Blob, Queue, and Mongo DB. In this scenario, dependency tree is working perfectly like below

Complete_Transaction
  1. Run operations in multiple threads - https://github.com/abhikt48/test-applicationinsights-agent/blob/main/src/main/java/com/abhi/test/ai/agent/sbus/storage/TestAiAgentWithMultiTransports.java

Refer onMessage(..) method - Where I am creating a new thread to upload Blob, send message on Queue and get a number of documents from Mongo DB, but the dependency tree is not coming together like below.

Expectations - Blob, Queue and Mongo DB transactions should come in a single transaction.

MongoDB Queue

Please refer https://github.com/abhikt48/test-applicationinsights-agent#readme to run sample classes

Please feel free to let me know for any information.

abhikt48 commented 1 year ago

Hi @heyams ,

Did you get a chance to replicate this issue? Please let me know for any information.

Regards, Abhishek Kumar

heyams commented 1 year ago

Will try it today. Thank you for reminding me.

heyams commented 1 year ago

@abhikt48 i ran into an exception, any ideas?: Exception in thread "main" javax.jms.JMSSecurityException: ****Attempted to perform an unauthorized operation.**** TrackingId:88d4b3b29bb24bcbbe80fd1bf862ad61_G5, SystemTracker:gateway7, Timestamp:2023-03-15T19:22:05 [condition = amqp:unauthorized-access] at org.apache.qpid.jms.provider.exceptions.ProviderSecurityException.toJMSException(ProviderSecurityException.java:41) at org.apache.qpid.jms.provider.exceptions.ProviderSecurityException.toJMSException(ProviderSecurityException.java:27) at org.apache.qpid.jms.exceptions.JmsExceptionSupport.create(JmsExceptionSupport.java:80) at org.apache.qpid.jms.exceptions.JmsExceptionSupport.create(JmsExceptionSupport.java:112) at org.apache.qpid.jms.JmsConnection.createResource(JmsConnection.java:698) at org.apache.qpid.jms.JmsMessageConsumer.<init>(JmsMessageConsumer.java:125) at org.apache.qpid.jms.JmsMessageConsumer.<init>(JmsMessageConsumer.java:82) at org.apache.qpid.jms.JmsSession.createConsumer(JmsSession.java:490) at org.apache.qpid.jms.JmsSession.createConsumer(JmsSession.java:478) at org.apache.qpid.jms.JmsSession.createConsumer(JmsSession.java:470) at **com.abhi.test.ai.agent.sbus.storage.TestAiAgentWithMultiTransports.registerSbusMsgListener(TestAiAgentWithMultiTransports.java:131)** at com.abhi.test.ai.agent.sbus.storage.TestAiAgentWithMultiTransports.main(TestAiAgentWithMultiTransports.java:64) Caused by: org.apache.qpid.jms.provider.exceptions.ProviderSecurityException: Attempted to perform an unauthorized operation. TrackingId:88d4b3b29bb24bcbbe80fd1bf862ad61_G5, SystemTracker:gateway7, Timestamp:2023-03-15T19:22:05 [condition = amqp:unauthorized-access] at org.apache.qpid.jms.provider.amqp.AmqpSupport.convertToNonFatalException(AmqpSupport.java:173) at org.apache.qpid.jms.provider.amqp.builders.AmqpResourceBuilder.getOpenAbortExceptionFromRemote(AmqpResourceBuilder.java:299) at org.apache.qpid.jms.provider.amqp.builders.AmqpResourceBuilder.handleClosed(AmqpResourceBuilder.java:185) at org.apache.qpid.jms.provider.amqp.builders.AmqpResourceBuilder.processRemoteClose(AmqpResourceBuilder.java:129) at org.apache.qpid.jms.provider.amqp.AmqpProvider.processUpdates(AmqpProvider.java:985) at org.apache.qpid.jms.provider.amqp.AmqpProvider.onData(AmqpProvider.java:871) at org.apache.qpid.jms.transports.netty.NettyTcpTransport$NettyTcpTransportHandler.channelRead0(NettyTcpTransport.java:563) at org.apache.qpid.jms.transports.netty.NettyTcpTransport$NettyTcpTransportHandler.channelRead0(NettyTcpTransport.java:556) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1534) at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1283) at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1330) at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:508) at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:447) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at java.base/java.lang.Thread.run(Thread.java:834)

heyams commented 1 year ago

I believe that I have all the resources setup successfully, blob, service bus, storage queue and mongo db running locally.

username = "RootManageSharedAccessKey", and password is using SBUS shared access primary key. Is this correct?

heyams commented 1 year ago

@abhikt48 I'm able to repro

image

abhikt48 commented 1 year ago

@heyams Yes correct, you are using right credential for ServiceBus.

Glad, you are able to reproduce this issue.

Looking forward for resolution..

heyams commented 1 year ago

@abhikt48
SpanData{spanContext=ImmutableSpanContext{traceId=6b43768a1e847ddd66aad79269ffc643, spanId=a84ac109a62bf358, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true}, parentSpanContext=ImmutableSpanContext{traceId=00000000000000000000000000000000, spanId=0000000000000000, traceFlags=00, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=false}, resource=Resource{schemaUrl=https://opentelemetry.io/schemas/1.18.0, attributes={host.arch="amd64", host.name="REDACTED", os.description="Windows 11 10.0", os.type="windows", process.command_line="C:\dev\zulu17.40.19-ca-jdk17.0.6-win_x64\bin\java.exe -javaagent:agent/applicationinsights-agent-3.4.11-SNAPSHOT.jar -javaagent:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2022.2.3\lib\idea_rt.jar=52891:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2022.2.3\bin -Dfile.encoding=UTF-8", process.executable.path="C:\dev\zulu17.40.19-ca-jdk17.0.6-win_x64\bin\java.exe", process.pid=31392, process.runtime.description="Azul Systems, Inc. OpenJDK 64-Bit Server VM 17.0.6+10-LTS", process.runtime.name="OpenJDK Runtime Environment", process.runtime.version="17.0.6+10-LTS", service.name="unknown_service:java", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.24.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.mongo-3.1, version=1.24.0-alpha, schemaUrl=null, attributes={}}, name=aggregate ipt-test.stage1, kind=CLIENT, startEpochNanos=1679093603422608400, endEpochNanos=1679093603428358000, attributes=AttributesMap{data={db.statement={"aggregate": "stage1", "pipeline": [{"$match": {}}, {"$group": {"_id": "?", "n": {"$sum": "?"}}}], "cursor": {}, "$db": "?", "lsid": {"id": "?"}}, thread.id=51, db.system=mongodb, db.connection_string=mongodb://localhost:27017, net.peer.name=localhost, applicationinsights.internal.item_count=1, thread.name=Thread-20, net.peer.port=27017, db.name=ipt-test, db.operation=aggregate, db.mongodb.collection=stage1}, capacity=128, totalAddedValues=11}, totalAttributeCount=11, events=[], totalRecordedEvents=0, links=[], totalRecordedLinks=0, status=ImmutableStatusData{statusCode=UNSET, description=}, hasEnded=true}

Looks like the parentSpanId is invalid (**0000000000000000**), that's why distributed tracing is broken. I need to follow up with OpenTelemetry java instrumentation maintainer to find out more. I tested the latest OpenTelemetry 1.24.0, same outcome.

heyams commented 1 year ago

@abhikt48

You can achieve this by manually propagating context from the main thread to the new thread:

In your pom.xml, add this dependency:

<dependency>
            <groupId>io.opentelemetry</groupId>
            <artifactId>opentelemetry-api</artifactId>
            <version>1.24.0</version>
</dependency>

And then add the following TestAiAgentWithMultiTransports.java:

io.opentelemetry.context.Context context = io.opentelemetry.context.Context.current();
            Thread t1 = new Thread(new Runnable() {
                @Override
                public void run() {
                    try (Scope scope = context.makeCurrent()) {
                        uploadBlob(body);
                        sendMsgToQueueStorage(body);
                        countMongoDbDocuments();
                    }
                }
            });
            t1.start();

And then you will get what you need:

image

Please let me know the outcome. Thanks!

abhikt48 commented 1 year ago

Hi @heyams,

Thank you so much for providing the details. I did testing based on your code change suggestion and I can see dependency tree as well.

Unfortunately, this will not solve our problem. Because,

  1. we are facing this issue in MuleSoft ESB, where all components belong to a individual (MuleSoft) library
  2. Where we can't do modification, such as setting context in parent thread and access context in child thread

As per below image, we are receiving a message from ServiceBus (Thread T1) where we can't set context. We have control on child thread T2, where I can set context but that is not working as expected.

OpenTel_Context

I did testing with Java as well where I set context in child thread instead of parent thread but that is not working.

image

As you have rightly said that we are loosing parentSpanId - I printed context before and after switching the thread and realize that context is getting lost.

** contextBefore 'AgentContextWrapper{agentContext={opentelemetry-trace-span-key=SdkSpan{traceId=9bbb1*****}'
** contextAfter 'AgentContextWrapper{agentContext={}, applicationContext={}}'

Could you please advice something else which could solve this issue? Or worth to raise defect to opentelemetry? Please do the needful.

Regards, Abhishek Kumar

heyams commented 1 year ago

@abhikt48 Can you reach out to ServiceBus to propagate the context for your app? @trask might have better insights for the next step. It's not really a defect of OpenTelemetry as far as I understand.

abhikt48 commented 1 year ago

@heyams - I shared here only one example of a source which is ServiceBus. As part of our organization, we have more than 120+ applications where we have different-different sources such as FTP, ServiceBus, HTTP, Azure Storage, JMS, etc.

It is not feasible for us to modify all sources.

Can you please suggest -

  1. Which API is responsible for transferring the context from one Thread to another? applicationinsights-agent or opentelemetry-api?
  2. agentContextshould be automatically transferred between threads - Is this the correct expectation?
  3. If this feature is not there then can you please provide this feature? If not default, then provide some option to pass a custom argument that enables this feature.

We have a very high usage of AI, this will be a very great feature for us. Please do the needful.

FYI - @trask

heyams commented 1 year ago

image

If you use image instead of MuleSoft, telemetry is auto collected by our agent. More details here

abhikt48 commented 1 year ago

@heyams Thanks for the feedback. I will do the testing with above SDKs and update feedback. I am on leave for next week, so i will update testing results after 3rd April.

Do we have any option where we can initiate completely new agentContext for child thread and bind all transaction of child thread?

image
abhikt48 commented 1 year ago

@heyams - Apologies for the late feedback as I was on long leave.

I did testing with SBUS SDK - azure-messaging-servicebus:7.13.0 and it is not working as expected. I can't see dependency tree with all other calls.

Sample class to replicate this issue- TestAiWithLatestSbusSdk.java

Could you please try to replicate this issue?

Can I request you to provide this feature for all clients not only with latest Azure Libraries ? As you know AI is very widely used, so it will be very good feature.

Please let me know for any information.

Regards, Abhishek Kumar

abhikt48 commented 1 year ago

Hi @heyams - Did you get a chance to take a look at the above issue?

heyams commented 1 year ago

@abhikt48 will look into it. thanks.

heyams commented 1 year ago

@abhikt48 i'm seeing exceptions in the log (using servicebus 7.13.0)

java.lang.NoClassDefFoundError: com/azure/core/amqp/implementation/handler/ReceiverUnsettledDeliveries at com.azure.messaging.servicebus.implementation.ServiceBusReactorReceiver.(ServiceBusReactorReceiver.java:80) at com.azure.messaging.servicebus.implementation.ServiceBusReactorSession.createConsumer(ServiceBusReactorSession.java:170) at com.azure.core.amqp.implementation.ReactorSession.getSubscription(ReactorSession.java:618) at com.azure.core.amqp.implementation.ReactorSession.lambda$createConsumer$10(ReactorSession.java:398) at java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1916) at com.azure.core.amqp.implementation.ReactorSession.lambda$createConsumer$11(ReactorSession.java:382) at com.azure.core.amqp.implementation.handler.DispatchHandler.onTimerTask(DispatchHandler.java:32) at com.azure.core.amqp.implementation.ReactorDispatcher$WorkScheduler.run(ReactorDispatcher.java:207) at org.apache.qpid.proton.reactor.impl.SelectableImpl.readable(SelectableImpl.java:118) at org.apache.qpid.proton.reactor.impl.IOHandler.handleQuiesced(IOHandler.java:61) at org.apache.qpid.proton.reactor.impl.IOHandler.onUnhandled(IOHandler.java:390) at org.apache.qpid.proton.engine.BaseHandler.onReactorQuiesced(BaseHandler.java:87) at org.apache.qpid.proton.engine.BaseHandler.handle(BaseHandler.java:206) at org.apache.qpid.proton.engine.impl.EventImpl.dispatch(EventImpl.java:108) at org.apache.qpid.proton.reactor.impl.ReactorImpl.dispatch(ReactorImpl.java:324) at org.apache.qpid.proton.reactor.impl.ReactorImpl.process(ReactorImpl.java:292) at com.azure.core.amqp.implementation.ReactorExecutor.run(ReactorExecutor.java:91) at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68) at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) at java.base/java.lang.Thread.run(Thread.java:833) Caused by: java.lang.ClassNotFoundException: com.azure.core.amqp.implementation.handler.ReceiverUnsettledDeliveries at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520) ... 24 more

abhikt48 commented 5 months ago

Closing this issue for now, as we are focusing on Mule 4 with codeless agent instead of Mule 3.

Thank you !