Closed: besseb05 closed this issue 1 year ago.
@mic-max Could you investigate this?
> @mic-max Could you investigate this?
Yes Cijo, looking into it now
Update: I believe this is a threading bug in the EventCounters instrumentation. I am trying to reproduce it with Coyote to confirm it and to create a failing test that can then be fixed.
> When instantiating an EventListener, the callbacks to OnEventSourceCreated and OnEventWritten can happen before the constructor has completed. Take care when you initialize instance members used in those callbacks.
Source: https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.tracing.eventlistener?view=net-7.0
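To make the documented hazard concrete, here is a minimal sketch (a hypothetical ExampleListener, not the instrumentation's actual code) of why instance members used by the callbacks should be initialized with field initializers rather than in the constructor body:

```csharp
using System.Collections.Generic;
using System.Diagnostics.Tracing;

internal sealed class ExampleListener : EventListener
{
    // Safe: field initializers run before the base EventListener constructor,
    // so this list already exists when the first callback can arrive.
    // (Real code also needs its own synchronization; callbacks may arrive on
    // other threads.)
    private readonly List<string> seenSources = new();

    public ExampleListener()
    {
        // Unsafe place to initialize 'seenSources': this body runs *after* the
        // base constructor, which may already have invoked the callbacks below.
    }

    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        // Can run before the ExampleListener constructor body has completed.
        this.seenSources.Add(eventSource.Name);
        this.EnableEvents(eventSource, EventLevel.Informational);
    }
}
```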
@besseb05: Are we able to pin down whether it is the lock in the below screenshot from EnableEvents that deadlocked with https://github.com/open-telemetry/opentelemetry-dotnet-contrib/blob/7fda8d43b3d2c7a3f314df88063b28b4ea2078bb/src/OpenTelemetry.Instrumentation.EventCounters/EventCountersMetrics.cs#LL95-L111C10?
Another clarification question: we are using EventCountersMetrics as a singleton in this case, right?
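For reference, the question above is whether the internal lock taken inside EventListener.EnableEvents and the lock taken inside the instrumentation's OnEventWritten ended up waiting on each other. Below is a minimal, self-contained sketch of that kind of ABBA deadlock; the lock names are stand-ins for illustration, not the actual locks involved:

```csharp
using System.Threading;
using System.Threading.Tasks;

internal static class AbbaDeadlockSketch
{
    // 'runtimeLock' stands in for the internal lock EnableEvents needs;
    // 'instrumentationLock' stands in for the lock taken in the event callbacks.
    private static readonly object runtimeLock = new();
    private static readonly object instrumentationLock = new();

    public static void Demonstrate()
    {
        var a = Task.Run(() =>
        {
            lock (runtimeLock)                  // A: runtime lock first...
            {
                Thread.Sleep(100);              // widen the race window
                lock (instrumentationLock) { }  // ...then the instrumentation lock
            }
        });

        var b = Task.Run(() =>
        {
            lock (instrumentationLock)          // B: instrumentation lock first...
            {
                Thread.Sleep(100);
                lock (runtimeLock) { }          // ...then the runtime lock
            }
        });

        // With the sleeps above this usually hangs: each task is waiting for
        // the lock the other task already holds.
        Task.WaitAll(a, b);
    }
}
```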
Hi @Yun-Ting, I am not sure if it is that lock... the memory dump that we had is no longer available and I didn't take note of it when I looked at it before, sorry
> Another clarification question: we are using EventCountersMetrics as a singleton in this case, right?
I am not sure about this... I shared the initialization we are doing in our service
@besseb05, no worries. Let me know if you happen to hit the issue again and get the dump. Thanks!
Hi @Yun-Ting, we ran into the issue again. I believe that that is the lock causing the deadlock but I am not very experienced analyzing memory dumps. The callstack in the dump definitely shows that the thread is waiting on a lock in that function
> I believe that that is the lock causing the deadlock but I am not very experienced analyzing memory dumps.
Hi @besseb05, would you mind sharing the dump file here so I can take a look?
> The callstack in the dump definitely shows that the thread is waiting on a lock in that function
Do you mean that the deadlock situation was happening for these 2 locks? https://github.com/open-telemetry/opentelemetry-dotnet-contrib/issues/1024#issuecomment-1464561184
> I believe that that is the lock causing the deadlock but I am not very experienced analyzing memory dumps.
@Yun-Ting I don't think I am able to share the dump, sorry
> Do you mean that the deadlock situation was happening for these 2 locks? https://github.com/open-telemetry/opentelemetry-dotnet-contrib/issues/1024#issuecomment-1464561184
I think so. Were you able to replicate that deadlock on your side?
@Yun-Ting do you have any updates?
Hi @Yun-Ting do you have any updates?
Hi @besseb05, the mitigation was merged and released. https://github.com/open-telemetry/opentelemetry-dotnet-contrib/releases/tag/Instrumentation.EventCounters-1.5.1-alpha.1.
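For readers following along, a common way to break a cycle like this is to avoid calling back into the EventListener machinery while holding your own lock. The sketch below only illustrates that general pattern; it is not necessarily the change that shipped in 1.5.1-alpha.1:

```csharp
using System.Collections.Generic;
using System.Diagnostics.Tracing;

internal sealed class DeferredEnableListener : EventListener
{
    private readonly object syncObject = new();
    private readonly List<string> knownSources = new();

    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        lock (this.syncObject)
        {
            // Only touch our own state while holding our own lock.
            this.knownSources.Add(eventSource.Name);
        }

        // Call into the EventListener machinery only after releasing the lock,
        // so we never wait on its internal lock while holding ours.
        this.EnableEvents(eventSource, EventLevel.Informational);
    }
}
```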
Issue with OpenTelemetry.Instrumentation.EventCounters
List of all OpenTelemetry NuGet packages and versions that you are using (e.g. OpenTelemetry 1.0.2):
Runtime version: net7.0
Is this a feature request or a bug? Bug.
What is the expected behavior?
When instrumenting our code, we expect the code that sets up the MeterProvider to run when the Service Fabric service settings are created. It should execute and return normally so that the service can start up and begin serving calls.
What is the actual behavior? We have come across an instance where, according to the memory dump we took of a stuck process, two threads deadlocked while running this instrumentation code. I'm sharing the stack traces below:
We have only observed this behavior once so far, so it's probably a rare race condition. We were following this documentation as guidance. Could it be related to creating the EventSource before we build the MeterProvider?
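For context, the initialization described above is roughly of this shape. This is a hedged sketch with placeholder names (MyCompany-MyService, MyServiceEventSource, Telemetry), not the service's actual code:

```csharp
using System.Diagnostics.Tracing;
using OpenTelemetry;
using OpenTelemetry.Metrics;

// A custom EventSource exposing an EventCounter, created before the
// MeterProvider is built, as described in the question above.
[EventSource(Name = "MyCompany-MyService")]
internal sealed class MyServiceEventSource : EventSource
{
    public static readonly MyServiceEventSource Instance = new();

    private readonly EventCounter requestDuration;

    private MyServiceEventSource()
    {
        this.requestDuration = new EventCounter("request-duration", this);
    }

    public void RecordRequest(double elapsedMs) => this.requestDuration.WriteMetric(elapsedMs);
}

internal static class Telemetry
{
    // Built once at service start-up and kept for the process lifetime,
    // matching the singleton usage discussed in the thread.
    public static MeterProvider BuildMeterProvider() =>
        Sdk.CreateMeterProviderBuilder()
            .AddEventCountersInstrumentation(options =>
                options.AddEventSources("MyCompany-MyService"))
            .Build();
}
```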