Open thomasameisel opened 1 year ago
@thomasameisel Do you have the stack trace for all other threads at the time when Thread1 was waiting for the lock ?
@lalitb we don't have the stack trace for the other threads unfortunately
Thanks @thomasameisel. The other threads' stack traces would have given more insight into any deadlock, or into whether another thread had invoked a LogManager operation that was taking too long.
@lalitb We are facing a similar lock issue on pauseTransmission
Thread 41: triggered +[ODWLogManager pauseTransmission] and is sitting in 1DS lock, probably for a long time.
Attaching the crash reports here.
report-2517258755120939999-2c2491df-77e7-4a01-9e0b-15b3ee6faef7.txt TeamSpaceApp 2-9-23, 1-24 PM.txt
When the pause-transmission request is dispatched, the call acquires the lock, waits for HTTP request cancellation, and never releases the lock.
Thread 38 name: Dispatch queue: eventDispatchQueue
Thread 38:
0 libsystem_kernel.dylib 0x1c8cccdfc swtch_pri + 8
1 libsystem_pthread.dylib 0x1d943673c cthread_yield + 32
2 TeamSpaceApp 0x10838b5f8 Microsoft::Applications::Events::HttpClientManager::cancelAllRequests() + 44
3 TeamSpaceApp 0x1083def24 std::__1::__function::__func<Microsoft::Applications::Events::TelemetrySystem::TelemetrySystem(Microsoft::Applications::Events::ILogManager&, Microsoft::Applications::Events::IRuntimeConfig&, Microsoft::Applications::Events::IOfflineStorage&, Microsoft::Applications::Events::IHttpClient&, Microsoft::Applications::Events::ITaskDispatcher&, Microsoft::Applications::Events::IBandwidthController*, Microsoft::Applications::Events::LogSessionDataProvider&)::$_2, std::__1::allocator<Microsoft::Applications::Events::TelemetrySystem::TelemetrySystem(Microsoft::Applications::Events::ILogManager&, Microsoft::Applications::Events::IRuntimeConfig&, Microsoft::Applications::Events::IOfflineStorage&, Microsoft::Applications::Events::IHttpClient&, Microsoft::Applications::Events::ITaskDispatcher&, Microsoft::Applications::Events::IBandwidthController*, Microsoft::Applications::Events::LogSessionDataProvider&)::$_2>, bool ()>::operator()() + 60
4 TeamSpaceApp 0x1083a5afc Microsoft::Applications::Events::LogManagerImpl::PauseTransmission() + 128
5 TeamSpaceApp 0x1083bb80c Microsoft::Applications::Events::LogManagerBase<Microsoft::Applications::Events::ModuleLogConfiguration>::PauseTransmission() + 84
6 TeamSpaceApp 0x1083bb718 +[ODWLogManager pauseTransmission] + 20
7 TeamSpaceApp 0x10a1d9294 TSOneDSTelemetryLogManager.pauseTransmission() + 256
8 TeamSpaceApp 0x10a1d9348 @objc TSOneDSTelemetryLogManager.pauseTransmission() + 36
9 TeamSpaceApp 0x1091e16a4 __46-[AXPInstrumentationManager pauseTransmission]_block_invoke + 136
10 TeamSpaceApp 0x10c7bf56c 0x102b08000 + 164328812
11 libdispatch.dylib 0x1927cf460 _dispatch_call_block_and_release + 32
12 libdispatch.dylib 0x1927d0f88 _dispatch_client_callout + 20
13 libdispatch.dylib 0x1927d8640 _dispatch_lane_serial_drain + 672
14 libdispatch.dylib 0x1927d918c _dispatch_lane_invoke + 384
15 libdispatch.dylib 0x1927e3e10 _dispatch_workloop_worker_thread + 652
16 libsystem_pthread.dylib 0x1d9430df8 _pthread_wqthread + 288
17 libsystem_pthread.dylib 0x1d9430b98 start_wqthread + 8
@lalitb Any updates on this? Could you please prioritize this? Let me know if you need anything else. Here is another crash log. TeamSpaceApp 3-1-23, 1-44 PM.txt
@lalitb Any updates on this? We are hitting this quite often. Could you please look into it?
@nishchith-cp - Is it possible to get the stack traces of all other threads, not just the thread that crashed with the timeout? There is a deadlock scenario between threads, so that data would be helpful.
Already attached the crash log in the previous comment: TeamSpaceApp.2-9-23.1-24.PM (1).txt
@lalitb Could you share an update on the same?
@lalitb here's a crash log with the PauseTransmission issue - report-2517068873866699999-59e56560-7cb3-4c83-9343-b9e8ff905328 (1).txt
From the call stack, I noticed that the PauseTransmission function waits synchronously for the HTTP requests to complete. Why does it need to wait for these network requests? While it waits, PauseTransmission keeps holding the m_lock mutex, which makes other functions (e.g. GetLogger) appear to hang because they are waiting on that mutex.
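To make the concern concrete, here is a minimal C++ sketch of the two locking patterns. It is not the SDK's code: `SketchLogManager`, `pauseHoldingLock`, `pauseReleasingLock`, `tryGetLogger` and `loggerAvailableDuringPause` are hypothetical names, and the yield loop is only an assumption based on the `cthread_yield` frame under `cancelAllRequests` in the trace above. The first variant keeps the mutex while waiting for a pending request; the second updates state under the mutex and then waits without it.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the log manager; not the real 1DS implementation.
class SketchLogManager {
public:
    // Pattern the stack trace suggests: the mutex stays held while the
    // caller spins until the in-flight request is gone.
    void pauseHoldingLock() {
        std::lock_guard<std::mutex> guard(m_lock);
        m_paused = true;
        waitForRequests();
    }

    // Alternative: change state under the mutex, then wait without it.
    void pauseReleasingLock() {
        {
            std::lock_guard<std::mutex> guard(m_lock);
            m_paused = true;
        }
        waitForRequests();
    }

    // Models a GetLogger-style call that needs the same mutex.
    // try_lock may fail spuriously, so retry a bounded number of times.
    bool tryGetLogger() {
        for (int attempt = 0; attempt < 100; ++attempt) {
            std::unique_lock<std::mutex> guard(m_lock, std::try_to_lock);
            if (guard.owns_lock()) return true;
            std::this_thread::yield();
        }
        return false;
    }

    void completeRequests() { m_pending.store(0); }
    bool isWaiting() const { return m_waiting.load(); }

private:
    // Spins until no request is pending (assumed shape of the wait).
    void waitForRequests() {
        m_waiting.store(true);
        while (m_pending.load() > 0) {
            std::this_thread::yield();
        }
        m_waiting.store(false);
    }

    std::mutex m_lock;
    bool m_paused = false;              // only written under m_lock
    std::atomic<int> m_pending{1};      // one simulated in-flight request
    std::atomic<bool> m_waiting{false};
};

// Reports whether a GetLogger-style call could take the mutex while a
// pause call was still waiting on the pending request.
bool loggerAvailableDuringPause(bool releaseLockBeforeWaiting) {
    SketchLogManager manager;
    std::thread pauser([&manager, releaseLockBeforeWaiting] {
        if (releaseLockBeforeWaiting) {
            manager.pauseReleasingLock();
        } else {
            manager.pauseHoldingLock();
        }
    });
    while (!manager.isWaiting()) {
        std::this_thread::yield();
    }
    bool available = manager.tryGetLogger();
    manager.completeRequests();  // let the pause call finish
    pauser.join();
    return available;
}
```

In this sketch, `loggerAvailableDuringPause(false)` reports that the logger call cannot get the mutex while the pause is waiting, and `loggerAvailableDuringPause(true)` reports that it can. Whether the SDK can safely release `m_lock` before waiting depends on invariants that are not visible from the crash logs.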
Describe your environment. Describe any aspect of your environment relevant to the problem, including your SDK version, platform, OS version, etc. If you're reporting a problem with a specific version of a library in this repo, please check whether the problem has been fixed on the main branch.
iOS platform, SDK version 3.6.187
Steps to reproduce. Describe exactly how to reproduce the error. Include a code sample if applicable.
Call ODWLogManager.ResumeTransmission. The issue was reported as happening on boot, but it is unclear if that is necessary.
What is the expected behavior? What did you expect to see?
ResumeTransmission executes successfully.
What is the actual behavior? What did you see instead?
ResumeTransmission waits for a lock to be released until the app is killed as non-responsive.
Additional context. Add any other context about the problem here.
Stack trace: