grijjy / GrijjyFoundation

Foundation classes used by other Grijjy repositories

Raw bytes of serialized protobuffers are not valid #30

Closed. phillsonntag closed this issue 4 years ago.

phillsonntag commented 4 years ago

Hello, I'm currently trying to use the ProtocolBuffer classes to serialize objects as protobuffers in Delphi. When I serialize the same object in .NET Core and compare the results, the raw bytes are not the same.

Delphi

Person record

  TPerson = record
    [Serialize(1)] Id: Integer;
  end;

Raw bytes

SerializedData      (8, 242, 192, 1)
    [0] 8
    [1] 242
    [2] 192
    [3] 1

.NET Core 3.1 (C#)

Person class

    [ProtoContract]
    class Person
    {
        [ProtoMember(1)]
        public int Id { get; set; }
    }

Raw bytes

SerializedData      (8, 185, 96)
    [0] 8
    [1] 185
    [2] 96

Versions of the libraries used in .NET Core:

Conclusion

Because the raw bytes of the serialized object differ, it's not possible to serialize an object in Delphi and deserialize it correctly in .NET Core. It would be nice if you could tell me which version of the Protocol Buffers standard you implemented, so I can run tests myself. The best option would be an update to the newest version of the standard.

Thank you in advance,

Philipp

erikvanbilsen commented 4 years ago

Hi Philipp,

We implemented version 2 (proto2). The .NET library also supports this version.

The issue is how signed integers are encoded. Our version uses the efficient ZigZag encoding for signed integers (corresponding to the sint32 proto type). We do this because encoding negative integers with the int32 proto type is very inefficient: a negative int32 is sign-extended to 64 bits and therefore always takes 10 bytes. (See https://developers.google.com/protocol-buffers/docs/proto and https://developers.google.com/protocol-buffers/docs/encoding).
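The following minimal Python sketch implements the two encodings described above. The helper names (encode_varint, encode_int32, encode_sint32) are hypothetical and do not come from either library. The logic follows the varint and ZigZag rules in the linked encoding guide. Assuming the Id in the report was 12345 (the value both byte sequences decode to), the sketch reproduces both byte sequences from the issue and shows that -1 costs 10 bytes as int32.

```python
# Hypothetical helpers (not part of GrijjyFoundation or the .NET library) that
# encode a 32-bit signed value the way the int32 and sint32 proto types define it.

def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)         # last byte has the high bit clear
            return bytes(out)


def encode_int32(value):
    """proto int32: a negative value is sign-extended to 64 bits first."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def encode_sint32(value):
    """proto sint32: ZigZag-map the value (0, -1, 1, -2, ... -> 0, 1, 2, 3, ...)."""
    return encode_varint(((value << 1) ^ (value >> 31)) & 0xFFFFFFFF)


KEY = bytes([(1 << 3) | 0])  # field number 1, wire type 0 (varint) -> 0x08

print(list(KEY + encode_sint32(12345)))  # [8, 242, 192, 1]  (the Delphi bytes)
print(list(KEY + encode_int32(12345)))   # [8, 185, 96]      (the .NET bytes)
print(len(encode_int32(-1)), len(encode_sint32(-1)))  # 10 1
```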

It seems that the .NET library encodes regular (signed) integers using the (inefficient) int32 format instead of sint32.

There are two ways to make the code compatible between Delphi and .NET:

  1. If the value is always non-negative, use a Cardinal (or UInt32) instead of an Integer in the Delphi version:

       TPerson = record
         [Serialize(1)] Id: Cardinal;
       end;

  2. If the value can also be negative, use DataFormat on the C# side to select the more efficient ZigZag encoding:

       [ProtoContract]
       class Person
       {
           [ProtoMember(1, DataFormat = DataFormat.ZigZag)]
           public int Id { get; set; }
       }
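The two options can be checked with a small Python model. The mapping from each declaration to a proto type is an assumption taken from the explanation in this thread: Delphi Integer maps to sint32 and Cardinal to uint32, while C# int maps to int32 by default and to sint32 with DataFormat.ZigZag. All names in the sketch are hypothetical. The field key is left out because it is the same byte (8) on both sides.

```python
# Hypothetical model of which proto type each declaration is encoded as,
# following the explanation above. Not an API of either library.

def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7, n = n & 0x7F, n >> 7
        out.append(low7 | 0x80 if n else low7)  # set continuation bit if more follow
        if not n:
            return bytes(out)

def as_uint32(v):  # uint32: plain varint; a Cardinal cannot be negative
    if not 0 <= v <= 0xFFFFFFFF:
        raise ValueError("value out of range for Cardinal")
    return encode_varint(v)

def as_int32(v):   # int32: negatives are sign-extended to 64 bits
    return encode_varint(v & 0xFFFFFFFFFFFFFFFF)

def as_sint32(v):  # sint32: ZigZag, then varint
    return encode_varint(((v << 1) ^ (v >> 31)) & 0xFFFFFFFF)

DELPHI = {"Integer": as_sint32, "Cardinal": as_uint32}
CSHARP = {"int": as_int32, "int (DataFormat.ZigZag)": as_sint32}

def compatible(delphi_type, csharp_type, values):
    """True if both declarations produce identical bytes for every value."""
    return all(DELPHI[delphi_type](v) == CSHARP[csharp_type](v) for v in values)

print(compatible("Integer", "int", [12345]))                                     # False
print(compatible("Cardinal", "int", [0, 1, 12345, 2**31 - 1]))                   # True
print(compatible("Integer", "int (DataFormat.ZigZag)", [-12345, -1, 0, 12345]))  # True
```

The first line reproduces the original mismatch. Option 1 only covers 0 through 2**31 - 1, the non-negative range of a C# int. Option 2 also covers negative values.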

Note that our code is fully compliant with proto2 (as, I assume, is the .NET library). The two libraries simply chose different default encodings for signed integers. I believe our default encoding is the better one, because it results in (much) smaller data for negative integers.
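A short Python sketch can tabulate the size trade-off behind this choice. The helper names are hypothetical, and the rules are the varint and ZigZag definitions from the Protocol Buffers encoding guide. Every negative value costs 10 bytes as int32 but at most 5 as sint32. Some non-negative values, such as 64 and 12345, need one byte more as sint32, which matches the 3-byte versus 2-byte payloads in the report.

```python
# Hypothetical helpers: byte length of a value's payload under each encoding.

def varint_len(n):
    """Number of bytes a base-128 varint needs for a non-negative n."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length

def int32_len(v):   # int32: negatives are sign-extended to 64 bits
    return varint_len(v & 0xFFFFFFFFFFFFFFFF)

def sint32_len(v):  # sint32: ZigZag first, then varint
    return varint_len(((v << 1) ^ (v >> 31)) & 0xFFFFFFFF)

for v in (0, 63, 64, 12345, -1, -64, -12345, -2**31, 2**31 - 1):
    print(f"{v:>12}  int32: {int32_len(v):>2} bytes  sint32: {sint32_len(v)} bytes")
```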

So we have no plans to change our default encoding to match the .NET library. However, if this is a big problem, we may consider adding a parameter to the Serialize() attribute to indicate that signed integers should be encoded in int32 format instead of sint32.

Hope this helps.