vibe-d / vibe.d

Official vibe.d development
MIT License
1.15k stars 284 forks source link

(De)serialization of unions. #680

Open vuaru opened 10 years ago

vuaru commented 10 years ago

Is it really necessary to serialize ie. gfm's vec3ui(0, 1, 2) as follows?

{
    "_vec3ui": {
        "b": 2,
        "g": 1,
        "r": 0,
        "v": [
            0,
            1,
            2
        ],
        "x": 0,
        "y": 1,
        "z": 2
    }
}

And to require all those to be present in the json for deserialization as well?

s-ludwig commented 10 years ago

Definitely not what it should be. I don't think there is much that can be done from the serialization framework's POV, though. What I'd recommend is to add the following methods to Vector:

T[N] toRepresentation() const { return v; }
static Vector fromRepresentation(T[N] v) { return Vector(v); }

This should lead to this serialized result starting with vibe.d 0.7.20:

[
    0,
    1,
    2
]

This isn't yet really documented, but the trait that is checked is isCustomSerializable.

s-ludwig commented 10 years ago

Added documentation: 9ddd02329f403f815e68c29fbeed7b0cc48b69c7

vuaru commented 10 years ago

Ah, so it's as I feared. I had hoped not to have to do that.

Is it even possible to detect whether a variable is in a union via traits/__traits? But I guess that would still not solve the problem, since the vars in a union can be differently sized..

s-ludwig commented 10 years ago

Good question, at least for anonymous unions (or structs), I'm not really sure. A guess would be that is(Vector.tupleof[0] == union) can be used to detect it, but I didn't test that.

Yeah, but the main problem would still be that the serializer somehow would still have to choose between one of the several union fields. And since it's not really clear from the outside which of them is the "right" or "best" one, some form of manual guidance will always be required. BTW, Another alternative would be to use @ignore, but that would require including vibe.data.serialization:

// results in {"v": [0, 1, 2]}
struct Vector {
    union {
        T[N] v;
        struct { @ignore T r, g, b; }
    }
}

// results in {"r": 0, "g": 1, "b": 2}
struct Vector {
    union {
        @ignore T[N] v;
        struct { T r, g, b; }
    }
}
vuaru commented 10 years ago

Btw, the rgb are aliases for xyz. Why are they (de)serialized as well?

mihails-strasuns commented 10 years ago

Unions should actually always require custom serialization handler to be present or result in compile-time error. There is no type-safe way to serialize it without some intervention from the programmer by very definition of unions.

etcimon commented 10 years ago

There is no type-safe way to serialize it without some intervention from the programmer by very definition of unions.

There's usually an enum near that union to identify the chosen structure, preferably right before the union. That would be 0 for first union choice, in the order of the elements in the union.. I don't know if that could help generate a default behavior, with a @union(_name_) on top of the enum that helps identify it.

s-ludwig commented 10 years ago

@Dicebot: I tend to agree there, except maybe it would make sense to optionally allow disambiguation using an UDA or something. The only issue would be the actual detection of anonymous unions, I'll try the code above when I get some more time.

s-ludwig commented 10 years ago

@etcimon: Sounds doable, but in this particular context it seems like less overall complexity to just require toRepresentation or similar. But I could imagine that there are more uses for such an annotation. MIDL files for example have such tagged union annotations and are heavily used for the WinRT API to generate nice language specific API layers.

etcimon commented 10 years ago

MIDL files for example have such tagged union annotations and are heavily used for the WinRT API to generate nice language specific API layers.

I'm pulling this off the ASN.1 standard, the tags are also automatic, start from 0, placed before the union, and in the order of the choice elements. However, you can manually set them and they may skip from 3 to 5 so it's really an ordering constraint more than an accurate "tag # = position #"

mihails-strasuns commented 10 years ago

Yeah using some sort of UDA mapping to enum can work fine, as long as you don't try to serialize some other random unions :)

s-ludwig commented 10 years ago

Btw, the rgb are aliases for xyz. Why are they (de)serialized as well?

They are picked up by __traits(allMembers), which is used to iterate over the struct members. Not sure if it's possible to detect the difference.

mihails-strasuns commented 10 years ago

It makes sense to use .tupleof for serialization purposes instead of __traits(allMembers).

s-ludwig commented 10 years ago

Does that result in a tuple of aliases? I.e. can you do __traits(identifier, S.tupleof[0])? But apart from plain fields it also has to look at properties and some methods, so it's quite natural to just iterate over all members.

s-ludwig commented 10 years ago

Okay, after a quick test: Yes, it's a tuple of aliases, which is good.

s-ludwig commented 10 years ago

However, unfortunately:

struct S {
    union {
        int x;
        float y;
    }
}
pragma(msg, typeof(S.tupleof[0]));

result: "int"

etcimon commented 10 years ago

I'm in the middle of deciding how to handle this for ASN.1 serialization/deserialization, and found that it would be practical to have @when(id_field, value_match) UDA for union members - essentially, it's like a @ComponentRelationConstraint(identifierField, identifierValue) but that would be more verbose.

struct S {
    int id;
    union {
        @when("id", 0)
        int x;
        @when("id", 1)
        float y;
    }
}

Coincidentally, an UDA helps identify the members are part of a union.

pragma(msg, typeof(S.tupleof)); // (int, int, float)
pragma(msg, __traits(getAttributes, S.tupleof[1])); // tuple(when("id", 0))
s-ludwig commented 10 years ago

On the surface that looks good and flexible to me. The only thing I'd change (even if that unfortunately increases the verbosity a little bit) would be to name it @serializeIf to make it clear that it is serialization related (in retrospect, I'm not really happy with the existing UDAs, too, in this regard).

Implementation-wise, one thing to think about would be how deserialization is handled. The easiest solution would be to simply ignore this attribute and assume that only the correct field is present in the serialized data (so that the union doesn't get overwritten with wrong data). Otherwise, the process would get a lot more complicated, because it may require lookahead during the deserialization process.

etcimon commented 10 years ago

The easiest solution would be to simply ignore this attribute and assume that only the correct field is present in the serialized data

If there aren't any indirections this would work but if not, there needs to be a ctfe mixin with if and else ifs for each @when attribute in the union, and the associated unionized type must be recursed through in the scope of the if, so that the type info can be deserialized further.

I'm having problems thinking through unions of tuples with strings and ints right now. The DER encoding for X509 certificates is really a beast.

s-ludwig commented 10 years ago

Hm, can you give a concrete example of that? I currently don't see how CTFE and mixin would fit in here. Currently everything works with static if/tuple foreach and recursion, and if possible, I'd like to keep it that way.

etcimon commented 10 years ago

I currently don't see how CTFE and mixin would fit in here.

I know it's possible with static if and foreach as well. The idea is mainly to recurse through the union types during deserialization depending on the serialized data and which type was chosen in it.

s-ludwig commented 10 years ago

Okay, I see the problem now - I was assuming a type annotated serialization format, such as JSON/XML, but raw binary data would of course not allow this kind of approach. In this case, it may be a good idea to restrict the UDA to reference only preceding fields. That should be sufficient for ASN.1, right?

etcimon commented 10 years ago

it may be a good idea to restrict the UDA to reference only preceding fields. That should be sufficient for ASN.1, right?

This is as standard as length-value formats. ASN.1 works with type-length-value formats (in this respective order), and of course if you restrict the UDA it will make it possible to deserialize ASN.1 structs and re-serialize them to json.

s-ludwig commented 10 years ago

Okay, so this would be just used to represent dynamically typed fields? Then I guess that's okay for now. Anyway, this restriction looks like it is necessary anyway, so that it's not possible to accidentally rule out certain serialization formats (e.g. there would be no way to read a "value-type" ordered field from a raw binary format anyway).

etcimon commented 10 years ago

Okay, so this would be just used to represent dynamically typed fields?

Yes, making the serialization engine compatible with most POD structures is the intention I guess, this seems like the last missing piece of the puzzle =)