msgpack / msgpack-cli

MessagePack implementation for Common Language Infrastructure / msgpack.org[C#]
http://msgpack.org
Apache License 2.0

Serializer.Unpack Continuously Stream of Data => High Memory Usage #356

Open rootfixxxer opened 1 year ago

rootfixxxer commented 1 year ago

I'm running some tests with this package, and if I have a connection/stream that is always sending data, the memory usage keeps rising until the system crashes.

Investigating the issue with a memory profiler, I can see that it keeps allocating these objects (there are others) and never disposes them:

Is there something I can do to be able to use this library in these cases? A workaround?

Current usage:

var serializer = MessagePackSerializer.Get<MyType>();
while (!cancellationToken.IsCancellationRequested)
{
   var result = serializer.Unpack(myOpenStream);
   //Do nothing with the results, and the memory increases anyway
   result = null; //JIC makes no difference
}
yfakariya commented 1 year ago

Thank you for reporting. It looks like a bug, as you suspect. I'd like to confirm a few things in order to reproduce and investigate your situation:

Could you give me the above info?

rootfixxxer commented 1 year ago

Hello

After some more tests I found out that the problem is related to the volume of data in the stream (a NetworkStream). I don't know whether the library can optimize how it handles the data, but this is what I have as MyType:

public class MsgPackArray
{
    [MessagePackMember(0)]
    public IList<List<MessagePackObject>> ListContents { get; set; }
    [MessagePackMember(1)]
    public ulong Location { get; set; }
}

By default, each inner List always has 3 items: the first is a 16-byte byte array, the second a UInt64, and the third a MessagePackObjectDictionary with 4 keys and 4 values, all strings...

When I receive objects with more than 100k items in ListContents, that's when the memory starts to rise, and it never drops...

Thanks

yfakariya commented 1 year ago

Sorry, I have been too busy to investigate this problem, but I would like to know 1) the average size of each string key and value in the dictionary and 2) the bitness of your process (32-bit or 64-bit). A MessagePackObject has to hold both the byte array (the undecoded string) and the decoded string, so strings take double the space, and a 100K-item dictionary with strings of a few kilobytes each needs over 2 GB of memory.

You can avoid this "oversized" behavior by using a POCO for the list, because it always has three items, like the following:
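To make the size argument concrete, here is a rough back-of-the-envelope sketch. The figures are hypothetical (100,000 items, 8 strings per item for 4 keys and 4 values, 1,000 ASCII characters per string), and `MemoryEstimate` and `DoubledStringBytes` are illustrative names, not part of the library. It only counts the raw UTF-8 bytes plus the UTF-16 characters of a .NET string and ignores per-object overhead.

```csharp
using System;

static class MemoryEstimate
{
    // Rough bytes needed when every string is kept both as its undecoded
    // UTF-8 bytes and as a decoded .NET string (UTF-16, 2 bytes per char).
    // Assumes ASCII text and ignores object headers and collection overhead.
    public static long DoubledStringBytes(long items, int stringsPerItem, int charsPerString)
    {
        long chars = items * stringsPerItem * charsPerString;
        long utf8Bytes = chars;      // 1 byte per ASCII character
        long utf16Bytes = chars * 2; // 2 bytes per character in a .NET string
        return utf8Bytes + utf16Bytes;
    }

    static void Main()
    {
        // Hypothetical workload, not measured from the reporter's data.
        long bytes = DoubledStringBytes(100_000, 8, 1_000);
        Console.WriteLine(bytes / (1024.0 * 1024 * 1024) + " GiB");
    }
}
```

With these assumed numbers the total is 2,400,000,000 bytes, which is already more than a 32-bit process can normally address in one heap.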

public class MsgPackArray
{
    [MessagePackMember(0)]
    public IList<MsgPackInnerArray> ListContents { get; set; }
    [MessagePackMember(1)]
    public ulong Location { get; set; }
}

public class MsgPackInnerArray
{
    [MessagePackMember(0)]
    public byte[] First { get; set; }
    [MessagePackMember(1)]
    public ulong Second { get; set; }
    [MessagePackMember(2)]
    public Dictionary<string, string> Third { get; set; }
}

This can deserialize the following structure (represented as JSON for explanation):

[
  [ // begin ListContents
    0123456789ABCDEF0123456789ABCDEF,
    123, 
    {"A": "1", "B": "2", ... },
  ], // end ListContents
  456, // Location
]
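A minimal sketch of how these POCO types could be plugged into the original loop. It reuses only `MessagePackSerializer.Get<T>()` and `Unpack(Stream)` from the code earlier in this thread; `PocoReader` and `ReadAll` are hypothetical names, and the stream and token are assumed to be set up as in the original report.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading;
using MsgPack.Serialization;

public class MsgPackArray
{
    [MessagePackMember(0)]
    public IList<MsgPackInnerArray> ListContents { get; set; }
    [MessagePackMember(1)]
    public ulong Location { get; set; }
}

public class MsgPackInnerArray
{
    [MessagePackMember(0)]
    public byte[] First { get; set; }
    [MessagePackMember(1)]
    public ulong Second { get; set; }
    [MessagePackMember(2)]
    public Dictionary<string, string> Third { get; set; }
}

public static class PocoReader
{
    // Hypothetical helper: reads messages until cancellation is requested.
    public static void ReadAll(Stream myOpenStream, CancellationToken cancellationToken)
    {
        var serializer = MessagePackSerializer.Get<MsgPackArray>();
        while (!cancellationToken.IsCancellationRequested)
        {
            // Strings are decoded straight into the dictionary, so no
            // MessagePackObject wrapper is kept for each key and value.
            MsgPackArray message = serializer.Unpack(myOpenStream);
            foreach (var item in message.ListContents)
            {
                foreach (var pair in item.Third)
                {
                    // Process pair.Key and pair.Value here...
                }
            }
        }
    }
}
```

Each message is still materialized in full before processing, so a very large ListContents keeps its whole contents in memory until the loop moves on.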

Note that you might still hit an OutOfMemoryException when running in a 32-bit process.

Or, if you want to reduce memory usage regardless, you can do streaming processing by using Unpacker directly.

using (var unpacker = Unpacker.Create(myStream))
{
    // Expects the layout shown above: a 2-item outer array whose first item is a 3-item array.
    if (!unpacker.Read() || unpacker.LastReadData != 2 || !unpacker.IsArrayHeader) throw new Exception("Invalid input");

    // Dispose the subtree unpacker as well, so the parent is released when done.
    using (var msgPackArrayUnpacker = unpacker.ReadSubtree())
    {
        if (!msgPackArrayUnpacker.Read() || msgPackArrayUnpacker.LastReadData != 3 || !msgPackArrayUnpacker.IsArrayHeader) throw new Exception("Invalid input");

        var first = msgPackArrayUnpacker.ReadItemData().AsBinary();
        var second = msgPackArrayUnpacker.ReadItemData().AsUInt64();

        if (!msgPackArrayUnpacker.Read() || !msgPackArrayUnpacker.IsMapHeader) throw new Exception("Invalid input");

        // Read the map entry by entry instead of materializing the whole dictionary.
        var mapSize = msgPackArrayUnpacker.LastReadData.AsInt32();
        using (var mapUnpacker = msgPackArrayUnpacker.ReadSubtree())
        {
            for (var i = 0; i < mapSize; i++)
            {
                var key = mapUnpacker.ReadItemData().AsString();
                var value = mapUnpacker.ReadItemData().AsString();
                // Process key and value here...
            }
        }

        var location = msgPackArrayUnpacker.ReadItemData().AsUInt64();
    }
}
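The streaming code above reads a single 3-item inner array. If ListContents actually arrives as a list of such arrays, as the IList<MsgPackInnerArray> property suggests, the same pattern can be nested one level deeper. The following is an untested sketch under that assumption; it uses only the Unpacker members that already appear above, and `StreamingReader` and `ReadOne` are hypothetical names.

```csharp
using System;
using System.IO;
using MsgPack;

public static class StreamingReader
{
    // Hypothetical helper: reads one [ListContents, Location] message item by item.
    public static void ReadOne(Stream myStream)
    {
        using (var unpacker = Unpacker.Create(myStream))
        {
            if (!unpacker.Read() || !unpacker.IsArrayHeader || unpacker.LastReadData.AsInt32() != 2)
                throw new Exception("Invalid input");

            using (var root = unpacker.ReadSubtree())
            {
                // First item of the outer array: the ListContents array header.
                if (!root.Read() || !root.IsArrayHeader) throw new Exception("Invalid input");
                var itemCount = root.LastReadData.AsInt32();

                using (var list = root.ReadSubtree())
                {
                    for (var n = 0; n < itemCount; n++)
                    {
                        if (!list.Read() || !list.IsArrayHeader || list.LastReadData.AsInt32() != 3)
                            throw new Exception("Invalid input");

                        using (var item = list.ReadSubtree())
                        {
                            var first = item.ReadItemData().AsBinary();
                            var second = item.ReadItemData().AsUInt64();

                            if (!item.Read() || !item.IsMapHeader) throw new Exception("Invalid input");
                            var mapSize = item.LastReadData.AsInt32();

                            using (var map = item.ReadSubtree())
                            {
                                for (var i = 0; i < mapSize; i++)
                                {
                                    var key = map.ReadItemData().AsString();
                                    var value = map.ReadItemData().AsString();
                                    // Process key and value here...
                                }
                            }
                        }
                    }
                }

                // Second item of the outer array: Location.
                var location = root.ReadItemData().AsUInt64();
            }
        }
    }
}
```

Only one inner array and one key/value pair are held at a time, so memory use should stay roughly constant regardless of how many items ListContents contains.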