mgravell / Pipelines.Sockets.Unofficial

.NET managed sockets wrapper using the new "Pipelines" API

High percentage of threads blocked on DedicatedThreadPoolPipeScheduler.RunWorkLoop #28

Open robertmujica opened 5 years ago

robertmujica commented 5 years ago

Hi,

We are having an issue in prod where we see a high percentage/number of threads in a waiting state, and the full .NET call stack points to "Pipelines.Sockets.Unofficial.DedicatedThreadPoolPipeScheduler.RunWorkLoop()+99".

We were using version 1.0.9, and I came across this article https://blog.marcgravell.com/2019/02/fun-with-spiral-of-death.html where you explain an issue you fixed in 1.1.*, which was very similar to what we were seeing in PROD. So I went ahead and raised a PR upgrading this NuGet package to 2.0.7, and at the same time updated StackExchange.Redis from 2.0.519 to 2.0.571.

However, after deploying this new version and doing more testing, we are still seeing a high number of blocked threads: far fewer than before, but it is still happening.

These are the call stack details; our app is running on .NET Framework 4.7.2. If you need further details, let me know.

Managed call stack:

```
[[GCFrame]]
[[HelperMethodFrame_1OBJ] (System.Threading.Monitor.ObjWait)] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
Pipelines.Sockets.Unofficial.DedicatedThreadPoolPipeScheduler.RunWorkLoop()+99
mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+163
mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+14
mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)+52
mscorlib_ni!System.Threading.ThreadHelper.ThreadStart(System.Object)+5c
[[GCFrame]]
[[DebuggerU2MCatchHandlerFrame]]
[[ContextTransitionFrame]]
[[DebuggerU2MCatchHandlerFrame]]
```

Native call stack:

```
ntdll!NtWaitForMultipleObjects+14
KERNELBASE!WaitForMultipleObjectsEx+f9
clr!WaitForMultipleObjectsEx_SO_TOLERANT+62
clr!Thread::DoAppropriateWaitWorker+1e4
clr!Thread::DoAppropriateWait+7d
clr!CLREventBase::WaitEx+c4
clr!Thread::Block+27
clr!SyncBlock::Wait+19d
[[GCFrame]]
clr!ObjectNative::WaitTimeout+e1
[[HelperMethodFrame_1OBJ] (System.Threading.Monitor.ObjWait)] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
Pipelines.Sockets.Unofficial.DedicatedThreadPoolPipeScheduler.RunWorkLoop()+99
mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+163
mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+14
mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)+52
mscorlib_ni!System.Threading.ThreadHelper.ThreadStart(System.Object)+5c
clr!CallDescrWorkerInternal+83
clr!CallDescrWorkerWithHandler+4e
clr!MethodDescCallSite::CallTargetWorker+f8
clr!ThreadNative::KickOffThread_Worker+109
[[GCFrame]]
clr!ManagedThreadBase_DispatchInner+39
clr!ManagedThreadBase_DispatchMiddle+6c
clr!ManagedThreadBase_DispatchOuter+75
[[DebuggerU2MCatchHandlerFrame]]
clr!ManagedThreadBase_DispatchInCorrectAD+15
clr!Thread::DoADCallBack+278
[[ContextTransitionFrame]]
clr!ManagedThreadBase_DispatchInner+2fc3
clr!ManagedThreadBase_DispatchMiddle+6c
clr!ManagedThreadBase_DispatchOuter+75
[[DebuggerU2MCatchHandlerFrame]]
clr!ManagedThreadBase_FullTransitionWithAD+2f
clr!ThreadNative::KickOffThread+db
clr!Thread::intermediateThreadProc+86
kernel32!BaseThreadInitThunk+14
ntdll!RtlUserThreadStart+21
```


ikalafat commented 2 years ago

@NickCraver thank you for the additional explanation. The "problem" is that people (including myself) see these kinds of results in a profiler, where a large amount of time appears to be "taken" by RunWorkLoop samples, and are unknowingly misled by them (see the sketch below for why the time shows up there).

I still haven't found a way to exclude specific namespaces from the profiler results (for example in dotTrace) so that the actual timings and percentages are recalculated with those exclusions in mind; that would give more precise results and allow better profiling of the app itself.
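As I understand it, the time attributed to RunWorkLoop is mostly the dedicated worker threads sitting idle in Monitor.Wait until work is queued, which profilers and thread dumps report as "blocked". A minimal sketch of that general pattern (illustrative only, not the actual Pipelines.Sockets.Unofficial implementation; all names here are invented):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Illustrative sketch of a dedicated worker loop: an idle worker parks in
// Monitor.Wait, so a thread dump shows it as waiting/blocked inside the
// work loop even though it is consuming no CPU.
class DedicatedWorkerSketch
{
    private readonly object _gate = new object();
    private readonly Queue<Action> _work = new Queue<Action>();

    public void Enqueue(Action action)
    {
        lock (_gate)
        {
            _work.Enqueue(action);
            Monitor.Pulse(_gate); // wake one idle worker
        }
    }

    // Each dedicated thread runs this loop for the lifetime of the scheduler.
    public void RunWorkLoop()
    {
        while (true)
        {
            Action next;
            lock (_gate)
            {
                while (_work.Count == 0)
                {
                    // Idle: this Monitor.Wait is the ObjWait frame seen in the dumps above.
                    Monitor.Wait(_gate);
                }
                next = _work.Dequeue();
            }
            next(); // the actual work runs outside the lock
        }
    }
}
```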

pengweiqhca commented 11 months ago

Use SocketManager.ThreadPool rather than SocketManager.Shared.

```csharp
var options = ConfigurationOptions.Parse(configuration);
options.SocketManager = SocketManager.ThreadPool;
```
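For example, a complete setup might look like the following sketch (assuming StackExchange.Redis 2.x, where SocketManager.ThreadPool schedules socket work on the regular .NET thread pool instead of dedicated scheduler threads; the connection string is a placeholder):

```csharp
using StackExchange.Redis;

// Sketch: use the thread-pool based SocketManager so that socket reads/writes
// run on the shared .NET thread pool rather than dedicated scheduler threads.
// "localhost:6379" is a placeholder connection string.
var options = ConfigurationOptions.Parse("localhost:6379");
options.SocketManager = SocketManager.ThreadPool;

var muxer = ConnectionMultiplexer.Connect(options);
var db = muxer.GetDatabase();
db.StringSet("example-key", "example-value");
```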