wc-duck / datalibrary

Open Source Data Library for data serialization.
Other
42 stars 8 forks source link

Generate typeid from "the entire struct" instead of just type-name. #139

Open wc-duck opened 2 years ago

wc-duck commented 2 years ago

Today the typeid of a type is only generated from the types name and nothing else (see https://github.com/wc-duck/datalibrary/blob/master/src/dl_typelib_read_txt.cpp#L1125)

This will lead to problems when, in the future, type upgrades/conversions need to be supported as 2 types would be considered the same even if their members is not the same!

Typeid should be generated from name + member-names + member-types to generate an unique typeid even if the name is the same.

Question: are 2 types the same if name mismatch but the actual struct is equal?

Tisten commented 2 years ago

Answer: Since dl_reflect_get_type_info() returns the type name among other things it ought to be a conflict to have two "identical" structs with mismatching names. So yes, I think the name should be part of the type hash.

wc-duck commented 1 year ago

One thing here, it might be an idea to add a "version" field to the type, just an integer that can be used to identify different versions of the same type in libs and they could all be stored in the same typelib. Only generating headers for "the latest" type and always defaulting to the latest when packing would keep the old behavior but still keep types for backwards compat around!

wc-duck commented 1 year ago

.... just need to have a think on how to handle versions for when subtypes upgrade their version!

wc-duck commented 1 year ago

Maybe it is as simple as being able to specify version/revision on members as well?

Tisten commented 1 year ago

Not sure I follow your thought here. If every type have a version, then that implies that every subtype have a version. So there is no need to have a version per member, since there already is one from the member's type, right? Or what is the purpose of these per-member versions?

Using versions to solve the "data upgrade path" is interesting though, since a "double upgrade" where at least one upgrade step is complex then need to run "both migrations" and in the right order to get correct data. And then each data upgrade might need to be ordered, instead of having a version number per type.

As an example, say we have 2 types: struct Rot{ float quat[4]; }; struct Entity{ float x; float y; float z; Rot rot; };

Now let's say we want to upgrade these two types by storing rotations as euler, and by keeping the rotation of an entity as quats, but embed the floats into the struct instead of having a substruct, so the new structs are: struct Rot{ float euler[3]; }; struct Entity{ float x; float y; float z; float qx; float qy; float qz; float qw; };

In this case it will matter which order the data upgrades are made in.

I remember double upgrades being handled by a save system we both worked on in the past, but a little differently than with per type version numbers. And it was also solved in the machinery which I used last, but not at all by manual version numbers there. They had a centralized type schema where all active types were known, and thus could order migrations of the types more explicitly.

wc-duck commented 1 year ago

I'm would guess that you would have to be able to specify what type a member uses with a version as well, reason being.

Typelibs are stored in text and have no notion of what type you use except for name, so if you have a member specified as having type "my_type", if "my_type" is updated there is no data in there on what type you actually mean.

If we only worked with binary typelibs the problem wouldn't be there as the version would be part of the typeid, but not in text.

IMHO a version number is a very low-tech solution... and low-tech is usually easy to get right and reason about. I might be a bit concerned about the ease of "getting it wrong"... but I think that i manageable.

If/when I get the time I'll write up a proposal.

Tisten commented 1 year ago

I like low tech when it is easy to verify with code. But I don't trust myself, so if you ask me the "set the right number" and I could set it off by one without a compile error or mandatory static code analysis saying that I'm wrong, then I find low tech which rely on humans to be hard to reason about. Anything which is both low tech enough to understand by explanation or reading code, and which uses validation or automation (minimum extra annotation/configuration by humans) gets my vote.

When you say If we only worked with binary typelibs do you then imply that, even with this newly proposed typeid from the entire struct, we could end up with an ambiguous interpretation of how the data was saved just because the type library is declared using text? Or is the version number meant to add "an extra layer of safety" when purely using text based reflection?

I think that "per member version number" could be said to exist in Cap'n'proto, see https://capnproto.org/language.html with the @0 etc. Have you seen anywhere else where this version number design is used sucessfully? Or do you have any experience from other data libraries where the lack of that extra version number have caused practical issues?

wc-duck commented 1 year ago

I see that version is part of the data that typeid is generated from, so data in binary form is always saved with a specific type.

Version here would be really close to what capnproto has, but only for the full struct. The problem I see with that implementation is that you can't change type of a member if I haven't missed anything. Also I don't see how that schema would support removing a member?

But it do lead to less code than what I had in mind.

I think I need to write down how I see all of these separate systems fit together.

wc-duck commented 1 year ago

... ie how I see the workflow for modifying a type and the steps that would involve!