ntent / kafka4net

C# client for Kafka
Apache License 2.0
52 stars 32 forks source link

Huge memory and threads consumption #43

Closed sungam3r closed 6 years ago

sungam3r commented 6 years ago

We have service application which logs http requests/responses and some other stuff into kafka through serilog-sinks-kafka nuget package. Under the hood this package uses kafka4net. Also we have other in-process integrations with serilog/nlog logging frameworks to send messages to different targets/sinks.

After some time we noticed that application became to consume too much memory and threads. Turning off logs to kafka in config file normalizes situation and performance counters drop to previous low level as usually.

This picture shows memory/cpu level before turning on kafka (A) and after turning it off (B): default

ETW tracing shows that many threads wait on ManualResetEventSlim.Wait() in kafka4net.Producer.SendLoop:

"GCFrame", "HelperMethodFrame_1OBJ", "System.Threading.Monitor.Wait(System.Object, Int32, Boolean)", "System.Threading.Monitor.Wait(System.Object, Int32)", "System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken)", "kafka4net.Producer.b__38_3()", "System.Threading.Tasks.Task.InnerInvoke()", "System.Threading.Tasks.Task.Execute()", "System.Threading.Tasks.Task.ExecutionContextCallback(System.Object)", "System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)", "System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)", "System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)", "System.Threading.Tasks.Task.ExecuteEntry(Boolean)", "System.Threading.Tasks.Task.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()", "System.Threading.ThreadPoolWorkQueue.Dispatch()", "System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()", "DebuggerU2MCatchHandlerFrame"

Actually we see total 173 threads in running application and 63 threads waiting on ManualResetEventSlim in SendLoop.

Further investigation led me to this line of source code: https://github.com/ntent-ad/kafka4net/blob/master/src/Producer.cs#L366 And comment above this line makes one think that something important wasn't complete)

thunderstumpges commented 6 years ago

From what I recall of the design, and from a cursory look at the code, there is only ever one single thread per Producer that is in SendLoop() (each Producer instance has one sender thread). It is also normal for a producers send thread to wait on that event if there are no messages to send.

How many producers are being created by serilog-sinks-kafka? Can you verify the number of Producer instances? In general, you should share a single producer instance across the application (or at least major subsystem of the application). This may very well be an issue with serilog-sinks-kafka. We are using kafka4net heavily from IIS for logging many events, and see none of this issue.

thunderstumpges commented 6 years ago

a 5 minute look at serilog-sinks-kafka (I don't know serilog, or how many of these KafkaClient objects are created, but their method GetOrCreateKafkaProducerAsync does NOT perform double-check-locking and has a very possible race condition allowing many producers to be created.

https://github.com/wespday/serilog-sinks-kafka/blob/5b0d311fa3eb4e1d76a8a694cab99d9bed544d3f/src/Serilog.Sinks.Kafka/Sinks/Kafka/KafkaClient.cs#L98

sungam3r commented 6 years ago

Thank you for explanation! I will verify the number of Producer instances created by serilog-sinks-kafka. And just for the sake of interest what about infinite wait timeout?

sungam3r commented 6 years ago

Yep, double-check-locking in serilog-sinks-kafka source code in GetOrCreateKafkaProducerAsync with some other minor modifications helped to solve this problem. Thank you again.

thunderstumpges commented 6 years ago

Great Glad to hear it.