invertedtomato / lightweight-serialization

LightWeight is a serialization algorithm focusing on producing the smallest possible output.
MIT License

Pool MemoryStream #1

Open dzmitry-lahoda opened 5 years ago

dzmitry-lahoda commented 5 years ago

https://github.com/invertedtomato/lightweight-serialization/blob/33a995aa8ef8cb5dd747839c3fae1262cfb602ac/Library/LightWeightSerialization/UnsignedVlq.cs#L30

invertedtomato commented 5 years ago

Interesting idea, though @vpenades recently gave me another perspective on this and I've swapped the MemoryStream for a List. Have a look now: https://github.com/invertedtomato/lightweight-serialization/blob/master/Library/LightWeightSerialization/UnsignedVlq.cs#L30 List's CPU performance is better in this instance.

vpenades commented 5 years ago

I think he is referring to pooling the internal array itself.

Regarding MemoryStream, I did some additional research and found that, after all, MemoryStream is not that slow, at least on .NET Core.

It seems MemoryStream has been completely reworked in .NET Core; you can compare how different the implementations are in .NET Framework and .NET Core.

In essence, both List and MemoryStream work in the same way; internally they grow a Byte[] array as the user writes to it. The additional overhead of MemoryStream comes from overriding System.IO.Stream.

But backing the data with a single Byte[] array means the array is reallocated and copied whenever it grows, so the "old" array needs to be garbage collected, which adds some overhead too.
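To make the growth cost concrete, here is a minimal sketch that counts how often a List<byte> swaps its backing array while appending. The class and method names are made up for illustration, and the exact growth steps are a runtime implementation detail, so the sketch only counts changes in Capacity.

```csharp
using System;
using System.Collections.Generic;

public static class GrowthDemo
{
    // Hypothetical helper: appends `items` bytes to a list created with
    // `initialCapacity` and counts how many times Capacity changes, i.e.
    // how many times the backing array is reallocated and copied.
    public static int CountReallocations(int items, int initialCapacity)
    {
        var list = new List<byte>(initialCapacity);
        int reallocations = 0;
        int lastCapacity = list.Capacity;

        for (int i = 0; i < items; i++)
        {
            list.Add((byte)i);
            if (list.Capacity != lastCapacity)
            {
                // A new, larger array was allocated; the old one is now garbage.
                reallocations++;
                lastCapacity = list.Capacity;
            }
        }
        return reallocations;
    }
}
```

Calling `GrowthDemo.CountReallocations(1000, 0)` reports at least one reallocation, while `GrowthDemo.CountReallocations(1000, 1000)` reports none, which is why knowing the length in advance helps.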

Over time a number of exotic solutions to this issue have appeared, for example:

There's RecyclableMemoryStream from Microsoft, which is what I think @dzmitry-lahoda is referring to.

Then there's also System.Buffers.ArrayPool, which lets you rent and recycle Byte[] arrays so that old arrays don't need to be garbage collected; there's an explanation here.
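As a rough sketch of the ArrayPool approach (not code from this library; the class and method names are invented for the example), renting and returning a buffer looks like this. Note that Rent may hand back an array longer than requested, so the code tracks the used length itself.

```csharp
using System;
using System.Buffers;

public static class PoolDemo
{
    // Hypothetical helper: fills a rented buffer with 0..count-1 and sums it.
    public static int SumWithRentedBuffer(int count)
    {
        // The returned array has AT LEAST `count` elements, possibly more.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(count);
        try
        {
            for (int i = 0; i < count; i++)
            {
                buffer[i] = (byte)i;
            }

            // Wrap only the used part; the segment does not copy anything.
            var used = new ArraySegment<byte>(buffer, 0, count);
            int sum = 0;
            for (int i = 0; i < used.Count; i++)
            {
                sum += used.Array[used.Offset + i];
            }
            return sum;
        }
        finally
        {
            // Hand the array back so a later Rent can reuse it instead of allocating.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

The buffer must not be touched after Return, which is the constraint that matters for any code that keeps buffers alive for a long time.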

dzmitry-lahoda commented 5 years ago

Yep. To put it another way, people would consider using lightweight-serialization if it declared low GC pressure (and proved it with BenchmarkDotNet). The way to get there is to use pooling, even for a list, like https://github.com/jtmueller/Collections.Pooled

invertedtomato commented 5 years ago

I really appreciate your thoughts, guys. Realizing that array pooling isn't necessarily more efficient for small arrays has slowed my thinking. I've taken a few minor steps inspired by this discussion that have improved performance:

1) For VLQ encoding, using a Byte array as a buffer wrapped in an ArraySegment. The buffer isn't trimmed until the last moment, resulting in a single copy. https://github.com/invertedtomato/lightweight-serialization/blob/master/Library/LightWeightSerialization/UnsignedVlq.cs#L32

2) Converted Nodes to structs. https://github.com/invertedtomato/lightweight-serialization/blob/master/Library/LightWeightSerialization/Node.cs Interestingly, this resulted in a ballpark 14% speed improvement in my basic testing scenario. I take it that this is saving the GC quite a bit of work.
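The first step can be sketched roughly as follows. This is an illustration only: it uses a generic 7-bits-per-byte VLQ with the least significant group first, which is an assumption and may differ from the exact format in UnsignedVlq.cs, and the names VlqSketch, Encode and ToArray are made up.

```csharp
using System;

public static class VlqSketch
{
    // Encodes `value` into a fixed 10-byte buffer and returns a segment
    // covering only the bytes actually used. Nothing is trimmed or copied here.
    public static ArraySegment<byte> Encode(ulong value)
    {
        // 10 bytes is enough for any 64-bit value at 7 payload bits per byte.
        byte[] buffer = new byte[10];
        int count = 0;
        do
        {
            byte group = (byte)(value & 0x7F); // lowest 7 bits
            value >>= 7;
            if (value != 0)
            {
                group |= 0x80; // continuation bit: more bytes follow
            }
            buffer[count++] = group;
        } while (value != 0);

        return new ArraySegment<byte>(buffer, 0, count);
    }

    // Trims at the last moment: the only copy, made when an exact-length
    // array is really needed.
    public static byte[] ToArray(ArraySegment<byte> segment)
    {
        byte[] result = new byte[segment.Count];
        Buffer.BlockCopy(segment.Array, segment.Offset, result, 0, segment.Count);
        return result;
    }
}
```

Under this assumed format, `VlqSketch.Encode(300)` yields a two-byte segment (0xAC, 0x02) over the 10-byte buffer, and ToArray produces the trimmed copy only if a caller asks for it.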

I'll ponder this further and share more thoughts shortly.

vpenades commented 5 years ago

@invertedtomato You can probably replace

ArraySegment<Byte>[] EncodeCache = new ArraySegment<Byte>[255];

with

ArrayPool<Byte> EncodeCache = ArrayPool<Byte>.Create();

then you can use ArrayPool.Rent and ArrayPool.Return, which happen to work with ArraySegment too.

invertedtomato commented 5 years ago

In that particular case the buffers are used in multiple locations concurrently. For example, if the value 3 is VLQ encoded, the buffer containing the encoded equivalent is used anywhere the value 3 is used for the whole of the runtime session. So while Rent is possible, Return doesn't make sense. Also, the buffers are small (10 bytes), so pooling them would give a negligible performance advantage.
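A minimal sketch of why Return doesn't fit here, assuming a cache shaped like the 255-entry EncodeCache mentioned above (the class name, the helper and the generic 7-bit VLQ format are all assumptions for illustration): every caller asking for a cached value receives the same backing array, so no single caller can safely give it back to a pool.

```csharp
using System;

public static class EncodeCacheSketch
{
    // One pre-encoded, shared segment per small value, kept for the whole session.
    private static readonly ArraySegment<byte>[] Cache = BuildCache();

    private static ArraySegment<byte>[] BuildCache()
    {
        var cache = new ArraySegment<byte>[255];
        for (int v = 0; v < cache.Length; v++)
        {
            cache[v] = EncodeFresh((ulong)v);
        }
        return cache;
    }

    // Generic 7-bits-per-byte VLQ into a fresh 10-byte buffer (assumed format).
    private static ArraySegment<byte> EncodeFresh(ulong value)
    {
        byte[] buffer = new byte[10];
        int count = 0;
        do
        {
            byte group = (byte)(value & 0x7F);
            value >>= 7;
            if (value != 0) group |= 0x80; // more bytes follow
            buffer[count++] = group;
        } while (value != 0);
        return new ArraySegment<byte>(buffer, 0, count);
    }

    // Small values share a cached buffer; larger ones get a fresh one.
    public static ArraySegment<byte> Encode(ulong value)
    {
        return value < (ulong)Cache.Length ? Cache[(int)value] : EncodeFresh(value);
    }
}
```

Two calls to `EncodeCacheSketch.Encode(3)` return segments over the very same array, so returning that array to a pool would invalidate it for every other holder.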

I am pondering swapping the Streams throughout for ArraySegments because the lengths are now largely known in advance, and it would save a stack of double copying.
