rikimaru0345 / Ceras

Universal binary serializer for a wide variety of scenarios https://discord.gg/FGaCX4c
MIT License

Bug: Tolerance + PreserveReferences issue #89

Closed: Vilo176 closed this issue 3 years ago

Vilo176 commented 3 years ago

Hi Riki,

I have an issue with serializing/deserializing my object hierarchy. The hierarchy itself is not very complex, but there are many references between objects.

When Ceras serializes the hierarchy:

When deserializing, the same things happen in the same order, so when Ceras encounters ID = 68, that entry is guaranteed to already exist in the cache.

But if the hierarchy is modified between serialization and deserialization, even slightly, in a way that changes the order of the objects, then during deserialization Ceras could encounter ID = 68 before ID = -3, causing a cache error (cache[68] could be another object, or might not exist at all).
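To make the failure mode concrete, here is a minimal sketch of a sequential reference cache of the kind described above. It is a simplified model, not Ceras's actual wire format: `Token`, `RefWriter` and `RefReader` are hypothetical names, and the convention that each new object gets the next ID in encounter order is an assumption for illustration. The writer remembers each object the first time it sees it; the reader rebuilds the table as a `List<>`, so it can only resolve an ID it has already added. It uses `ReferenceEqualityComparer`, which requires .NET 5 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token: a full object the first time it is seen, afterwards only its cache ID.
public sealed record Token(bool IsNew, int Id, string Payload);

// Writer: assigns IDs in the order objects are first encountered.
public sealed class RefWriter
{
    // Reference identity, so distinct objects that compare equal still get separate IDs.
    private readonly Dictionary<object, int> _ids = new(ReferenceEqualityComparer.Instance);

    public List<Token> Output { get; } = new();

    public void Write(object obj, string payload)
    {
        if (_ids.TryGetValue(obj, out int id))
        {
            Output.Add(new Token(false, id, null)); // seen before: emit only the ID
            return;
        }
        id = _ids.Count;
        _ids.Add(obj, id);
        Output.Add(new Token(true, id, payload)); // first time: emit the whole object
    }
}

// Reader: rebuilds the cache as a plain list, so an ID is just an index into what was read so far.
public sealed class RefReader
{
    private readonly List<string> _cache = new();

    public string Read(Token token)
    {
        if (token.IsNew)
        {
            _cache.Add(token.Payload); // lands at index token.Id only if read order == write order
            return token.Payload;
        }
        // Nothing at that index yet: the back-reference arrived before its target.
        // (If a different object happened to sit there, this would silently return the wrong one.)
        if (token.Id >= _cache.Count)
            throw new InvalidOperationException($"Cache ID {token.Id} does not exist yet.");
        return _cache[token.Id];
    }
}
```

Reading the tokens in write order resolves the back-reference; handing the reader the back-reference first, which is the situation described above, throws.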

Of course, to work around this I could keep the old hierarchy in my code (with versioning) to deserialize old data, and then adjust the result afterwards to fit the new one... But do you think this practical use case could be handled by Ceras itself?

Thanks :-)

Vilo176 commented 3 years ago

To clarify a bit, let's take this very simple use case:

    class A { public B objB; public C objC; }
    class B { public C objC; }
    class C { public string tata; }

    var c = new C();
    var a = new A
    {
        objB = new B { objC = c },
        objC = c,
    };
    var ceras = new CerasSerializer();
    File.WriteAllBytes("file.dat", ceras.Serialize(a));

//-------------------------------------------------------

Inside "file.dat":
=> "a.b.c" contains the fully serialized "c", because it is encountered first
=> "a.c" contains only the cache ID of "c", because it is encountered second

Then the application changes, and class A becomes:

    class A { public C objC; public B objB; }

//-------------------------------------------------------

When deserializing, Ceras will first encounter "a.c", which is only an ID, but that ID leads nowhere because "a.b.c" has not been deserialized yet.

rikimaru0345 commented 3 years ago

Hi, sorry for the delayed response.

Changing the order of members in a class shouldn't cause any issues. The members (SchemaMember) are sorted before they are used for reading/writing here: https://github.com/rikimaru0345/Ceras/blob/master/src/Ceras/Versioning/SchemaMemberComparer.cs

But maybe I'm missing something...

Vilo176 commented 3 years ago

Hi Riki, no prb ;-)

The issue is not related to member order inside a given class, but objects order inside a given "tree".

The way Ceras deserializes such a tree, i.e. with a List<> rather than a Dictionary<>, implies that if you change the order of members inside a class, and that change alters the actual order in which objects are encountered in the tree, backward compatibility is broken most of the time (because of the sequential object cache built at deserialization time).

As far as I can tell, changing this inside Ceras is not easy, because you would have to save more information during serialization, and you could not easily keep the purely sequential serialization.

So the strategy I've taken is to:

  1. Flatten the object tree into 4 dictionaries (1 for types, 1 for objects, 1 for members, and 1 for primitive values).
  2. Serialize these dictionaries (inside a single class) with Ceras.

The resulting byte[] is 10% smaller than with Ceras alone (mostly because the fixed structure of the 4 dictionaries allows heavy use of KnownTypes). Speed is 10-30% slower (the Ceras part is faster because the structure is very simple, but the first stage takes some time). But resilience becomes almost 100%.
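The thread doesn't show Vilo's actual tables, so the following is only a sketch of the general idea under simplifying assumptions: `Node`, `FlatGraph`, `Flatten` and `Rebuild` are hypothetical names, primitive values are stored as strings, and the table layout is a guess rather than the real one. The property that matters is that every reference is stored as an object ID in a dictionary and resolved in a second pass, after all objects exist, so the order in which objects were visited no longer matters.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory object: a type name plus named fields whose values are
// either primitives (strings only, in this sketch) or references to other nodes.
public sealed class Node
{
    public string TypeName = "";
    public Dictionary<string, object> Fields = new();
}

// Hypothetical flattened form: references are object IDs looked up in dictionaries.
public sealed class FlatGraph
{
    public Dictionary<int, string> Types = new();                          // type ID -> type name
    public Dictionary<int, int> Objects = new();                           // object ID -> type ID
    public Dictionary<(int Obj, string Member), int> References = new();   // -> target object ID
    public Dictionary<(int Obj, string Member), string> Primitives = new(); // -> primitive value
    public int RootId;

    public static FlatGraph Flatten(Node root)
    {
        var graph = new FlatGraph();
        var objectIds = new Dictionary<Node, int>(); // Node doesn't override Equals: reference identity
        var typeIds = new Dictionary<string, int>();
        graph.RootId = Visit(root);
        return graph;

        int Visit(Node node)
        {
            if (objectIds.TryGetValue(node, out int id)) return id; // shared object: reuse its ID
            id = objectIds.Count;
            objectIds.Add(node, id); // registered before recursing, so cycles terminate
            if (!typeIds.TryGetValue(node.TypeName, out int typeId))
            {
                typeId = typeIds.Count;
                typeIds.Add(node.TypeName, typeId);
                graph.Types.Add(typeId, node.TypeName);
            }
            graph.Objects.Add(id, typeId);
            foreach (var (member, value) in node.Fields)
            {
                if (value is Node child) graph.References[(id, member)] = Visit(child);
                else graph.Primitives[(id, member)] = (string)value;
            }
            return id;
        }
    }

    public Node Rebuild()
    {
        // Pass 1: create every object up front.
        var nodes = new Dictionary<int, Node>();
        foreach (var (id, typeId) in Objects)
            nodes[id] = new Node { TypeName = Types[typeId] };
        // Pass 2: fill members by name; every referenced ID already has an object.
        foreach (var entry in Primitives)
            nodes[entry.Key.Obj].Fields[entry.Key.Member] = entry.Value;
        foreach (var entry in References)
            nodes[entry.Key.Obj].Fields[entry.Key.Member] = nodes[entry.Value];
        return nodes[RootId];
    }
}
```

The second step of the strategy, serializing the tables themselves with Ceras, is not shown here.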

Thank you for your time and for your awesome serializer, Rikimaru.

rikimaru0345 commented 3 years ago

Hmm, wait a second, something is not right here....

The issue is not related to member order inside a given class, but objects order inside a given "tree".

But the way the tree is serialized depends on the order of the members.

You said:

Then the application changes, class A becomes : class A { public C objC; public B objB; }

So the order of the members objC and objB changes inside class A, but Ceras should still serialize and deserialize them in the exact same order, because it orders the members based on their name (and some other things).

Even if you serialize the changed version of class A, it should result in the exact same byte array.

In other words, it should not matter at all whether your class is defined as

class A { public B objB; public C objC; }

or as

class A { public C objC; public B objB; }

the resulting byte array should be completely the same.

This is just the "basic" version tolerance that is built-in (when you don't set any version tolerance setting in Ceras). But even though it doesn't embed the names of the fields, it should still be stable against reordering of members (since they're sorted by their name).
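Why reordering is harmless can be sketched with reflection. This is an assumed simplification, not Ceras's comparer: it only sorts public instance fields by name with an ordinal comparison, while the real SchemaMemberComparer.cs also takes other criteria into account. `SchemaOrder`, `AOld` and `ANew` are illustrative names.

```csharp
using System;
using System.Linq;
using System.Reflection;

// The two declarations of class A from the thread, differing only in member order.
public class B { }
public class C { }
public class AOld { public B objB; public C objC; }
public class ANew { public C objC; public B objB; }

public static class SchemaOrder
{
    // Simplified stand-in for schema ordering: public instance fields sorted by name (ordinal).
    // Sorting makes the result independent of declaration order.
    public static string[] MemberOrder(Type type) =>
        type.GetFields(BindingFlags.Public | BindingFlags.Instance)
            .Select(f => f.Name)
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToArray();
}
```

Both declarations yield `objB, objC`, so the writer and the reader visit the members, and therefore the objects behind them, in the same order.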

If you do use version tolerance, then Ceras also embeds the names of the fields, so it can look them up and avoid any issues with members being added at the front or in the middle.
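A minimal sketch of why embedding names helps, assuming a simplified record that stores (name, value) pairs; Ceras's real version-tolerant encoding is more compact and is not reproduced here. `NamedFormat.Read` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

public static class NamedFormat
{
    // Members are matched by name, so their position in the data no longer matters.
    // Names the current schema doesn't know are ignored; schema members missing from
    // the data get a default value.
    public static Dictionary<string, object> Read(
        List<(string Name, object Value)> data,
        IEnumerable<string> currentSchema,
        object defaultValue = null)
    {
        var byName = new Dictionary<string, object>();
        foreach (var (name, value) in data)
            byName[name] = value;

        var result = new Dictionary<string, object>();
        foreach (var member in currentSchema)
            result[member] = byName.TryGetValue(member, out var v) ? v : defaultValue;
        return result;
    }
}
```

Data written by an older version with an extra member still loads into a schema that has dropped that member and gained a new one.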

Vilo176 commented 3 years ago

I agree with you, the sample I gave (and that you quoted) was not relevant. But what happens if I remove or add a member (one that is not of a primitive type)?

rikimaru0345 commented 3 years ago

Adding or removing members will always break the format if you don't use version tolerance.

It won't matter if it is a primitive type or not.

The best-case scenario would be adding a new member whose name happens to sort it to the end: Ceras would then at least read the previous data correctly, but it would still throw an exception because it would try to read the last member (which doesn't exist in the binary data).

If you don't use version tolerance, the only thing the format is robust against is reordering members. Renaming them can still cause issues, since that can change the order of the members.
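These failure modes of the non-tolerant format can be modeled with a positional sketch: values are written in name-sorted order without names, and the reader relies on its current schema to know what comes next. `PositionalFormat` is a hypothetical helper, and throwing on missing data is this sketch's choice; a real binary stream may instead read into whatever bytes follow.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PositionalFormat
{
    // Values only, in ordinal name order; no member names are stored.
    public static List<object> Write(Dictionary<string, object> members) =>
        members.OrderBy(kv => kv.Key, StringComparer.Ordinal)
               .Select(kv => kv.Value)
               .ToList();

    // The reader trusts its own sorted schema to say which member comes next.
    public static Dictionary<string, object> Read(List<object> data, IEnumerable<string> schema)
    {
        var result = new Dictionary<string, object>();
        int pos = 0;
        foreach (var name in schema.OrderBy(n => n, StringComparer.Ordinal))
        {
            if (pos >= data.Count)
                throw new InvalidOperationException($"No data left for member '{name}'.");
            result[name] = data[pos++];
        }
        return result;
    }
}
```

Adding a member named `zeta` (sorted last) reads `objB` and `objC` correctly and then fails; renaming `objB` to `x` moves it after `objC` in the sort, so the two values are silently swapped.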

Vilo176 commented 3 years ago

You are mentioning "version tolerance": do you mean Ceras's built-in one, or user-specific mechanisms?

rikimaru0345 commented 3 years ago

I was talking about Ceras' built in version tolerance configuration options. See here: https://github.com/rikimaru0345/Ceras/blob/master/src/Ceras/Config/SerializerConfig.cs#L114
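For reference, enabling the built-in mode together with reference preservation might look like the sketch below. It assumes the Ceras 4.x `SerializerConfig` API (`PreserveReferences`, `VersionTolerance.Mode`, `VersionToleranceMode.Standard`) and a reference to the Ceras NuGet package; check the linked SerializerConfig.cs for the exact options in your version. `SerializerFactory` is a hypothetical wrapper.

```csharp
using Ceras;

public static class SerializerFactory
{
    // Assumed Ceras 4.x API: reference preservation plus the built-in "Standard"
    // version tolerance, which embeds member names so they can be matched on load.
    public static CerasSerializer CreateTolerant()
    {
        var config = new SerializerConfig();
        config.PreserveReferences = true; // believed to be the default; set explicitly here
        config.VersionTolerance.Mode = VersionToleranceMode.Standard;
        return new CerasSerializer(config);
    }
}
```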

I'm not sure what mechanisms a user could add to Ceras (using only OnConfigNewType, ITypeBinder, or similar) that would make serialization as robust as the built-in version tolerance features. Hmm... maybe someone could at least handle renamed members somewhat (by fixing the names when Ceras is configuring the type and creating the schema for it...).

There is one alternative way I can think of though: When making any changes to a type, keep the old version of it around and put it into a different namespace, then when loading old data use those types to successfully deserialize the data, then instantiate the up-to-date versions of the classes and write some code to manually move the data into the new classes. Obviously that's a lot of work and pretty complicated 😂 but it is possible.
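That workaround could look roughly like this. Deserializing the old file into the `LegacyV1` types is assumed to happen with Ceras beforehand and is not shown; the namespaces, the rename of `tata` to `name`, the new `revision` member and `Migration.FromV1` are all hypothetical examples of schema changes.

```csharp
using System;
using System.Collections.Generic;

namespace LegacyV1
{
    // Frozen copy of the old schema, kept only to read old files.
    public class A { public B objB; public C objC; }
    public class B { public C objC; }
    public class C { public string tata; }
}

namespace Current
{
    public class A { public B objB; public C objC; public int revision; } // new member
    public class B { public C objC; }
    public class C { public string name; } // renamed from "tata"

    public static class Migration
    {
        // Manually copy data from the legacy graph into the up-to-date classes.
        public static A FromV1(LegacyV1.A old)
        {
            // One new C per old C (keyed by reference), so an object shared in the old graph stays shared.
            var map = new Dictionary<LegacyV1.C, C>();
            C Convert(LegacyV1.C oldC)
            {
                if (oldC == null) return null;
                if (!map.TryGetValue(oldC, out var newC))
                {
                    newC = new C { name = oldC.tata };
                    map.Add(oldC, newC);
                }
                return newC;
            }

            return new A
            {
                objB = old.objB == null ? null : new B { objC = Convert(old.objB.objC) },
                objC = Convert(old.objC),
                revision = 1, // hypothetical default for data that predates this member
            };
        }
    }
}
```

The dictionary keyed by the old object is what keeps the shared `C` shared after migration.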

But all of that would only matter if Ceras didn't have a built-in version-tolerant mode. 😄

Vilo176 commented 3 years ago

Yes, I'm aware of Ceras's version-tolerant mode, but for my specific usage it is not enough, because I'm mainly adding/removing members between versions.

By the way, thanks a lot for your answers, Rikimaru.