ntent / kafka4net

C# client for Kafka
Apache License 2.0
52 stars 32 forks source link

Use RecyclableMemoryStream instead of MemoryStream #42

Open alexandrnikitin opened 7 years ago

alexandrnikitin commented 7 years ago

MemoryStream allocates new memory each time, in case of large messages it goes to LOH. An example code: https://github.com/ntent-ad/kafka4net/blob/a2c96aa987ddf3aa73446b02bfc0c1635a1a09b2/src/Protocols/Serializer.cs#L152

Proposed solution is Microsoft.IO.RecyclableMemoryStream https://github.com/Microsoft/Microsoft.IO.RecyclableMemoryStream

WDYT?

thunderstumpges commented 7 years ago

We've sent and received quite a lot different size and rates of messages without noticing any measurable consequences. That being said, I can see the potential for benefit, and the RecyclableMemoryStream looks to be a drop-in replacement.

I don't have time to make this change myself right now, but if you want to submit a PR, I'll check it out and then integrate it.

-Thunder

vchekan commented 7 years ago

It is already optimized. MemoryStream does not do allocation, instead it just wraps byte array network buffer. https://github.com/ntent-ad/kafka4net/blob/48f702ef4e9162060a2f415d4008ad26b08b5288/src/Protocols/Serializer.cs#L361

In its turn, byte array is preallocated per tcp connection: https://github.com/ntent-ad/kafka4net/blob/master/src/Protocols/ResponseCorrelation.cs#L64

and can be reallocated if request is larger than current size: https://github.com/ntent-ad/kafka4net/blob/master/src/Protocols/ResponseCorrelation.cs#L86

All of this is possible because kafka protocol was designed with allocation overhead in mind. So all structures are prefixed with size fields, which allows to do only once allocation and avoid reallocations.

alexandrnikitin commented 7 years ago

Thank you for replies. @vchekan You pointed to deserialization logic. It's OK there. I meant serialization part, see link in the first post. Here's a couple of allocation profile results from a production application: perfview_2017-07-20_14-38-04 perfview_2017-07-20_14-38-37

vchekan commented 7 years ago

Ah, serialization. You are right. It does allocate for every message. Which makes me thinking, why it is not the same way compression buffers are allocated (thread-static).

Another thing to consider is caller's payload serialization. As driver accepts byte arrays only, caller will do their own memory allocation and then kafka4net driver will do copy from caller's payload buffer into its internal memory stream. This might be optimized too, but will require some callbacks and API changes/extensions. But as first step, I agree, pooled memory stream should be an improvement.