jakartaee / jsonb-api

Jakarta JSON Binding
https://eclipse-ee4j.github.io/jsonb-api/
Other
79 stars 39 forks source link

Polymorphic [de]serialization #147

Closed mdzaebel closed 2 years ago

mdzaebel commented 5 years ago

I'd like to repeat the feature proposal from https://github.com/eclipse-ee4j/yasson/issues/133 here. There should at least be a description of how to implement Serializer/Deserializer that maintain runtime types. As there are more possible strategies, they should be defined too.

amoscatelli commented 3 years ago

@rmannibucau

At the current state of things they are not duplicate. With the current state of serializer/deserializer I can't fallback to the usual serializing or deserializing. For example, a deserializer parsing the type information from the JSON must then handle the complete deserialization. If you fallback using DeserializationContext inside the deserialize method Jsonb could (and actual Yasson will) loop since the same deserializer will be called.

We don't want to turn the json into a bean, we want to determine the type for JsonB in an abstract way before the deserialization actual take place.

public interface JsonbDeserializer<T> {
    T deserialize(JsonParser parser, DeserializationContext ctx, Type rtType);
}

This isn't just a 'particular case'. Every java bean has a type.

rmannibucau commented 3 years ago

With the current state of serializer/deserializer I can't fallback to the usual serializing or deserializing.

Hmm the DeserializationContext is there for that with the advantage to prevent loops whereas your api enables infinite loops (since they are equivalent they should be treated the same I assume and have the same impl constraints?). I suspect that if Yasson does not support it it must be fixed since it makes deserializer unusable. But overall the API is a particular impl of a serializer (you just add the type name and optionally a wrapper) so I don't think it is worth adding it (adding a @JsonbTypeId enables to not implement any new class which can be worth comparatively to be clear).

Side note: yasson shows polymorphism example in jsonb doc so I guess it works: https://javaee.github.io/jsonb-spec/docs/user-guide.html#serializers-deserializers.

To precise more my thinking, if you think there are cases where the yasson deserialization behavior is right - we discussed it at johnzon and thought it was always wrong to enable nested loops so we disable it by default IIRC - we can enhance the deserialization context to enable or not the loop. I prefer to refine this API rather than adding another API which mimics it and will have the same pitfalls later.

Last 2 notes are that:

  1. readType/readValue has a pitfall: you must enforce readType to be called first and readValue after which does not enable to not wrap the value in an object whereas deserializer does until you pass the same JsonStructure which is then equivalent to readStructure(JsonStructure) and therefore a deserializer.
  2. it breaks streaming whereas polymorphism can be streaming friendly (ie you get a json of 2G and only load in mem 2 string attributes, it happens and must be polymorphism friendly in terms of API I guess).

Hope it makes sense.

amoscatelli commented 3 years ago

https://javaee.github.io/jsonb-spec/docs/user-guide.html#serializers-deserializers

This actually works only with this kind of JSON structure :

{
    $typeinformation: {
        "attribute1": ...
        ..
        "attributeN": ...
    }
}

I understand that what I proposed is not streaming friendly. But you CAN'T assure streaming-friendliness if you want to leave the programmer the freedom of representing the type as he wants.

amoscatelli commented 3 years ago

@rmannibucau

There are clients that may (and actually don't) support that kind of JSON structure but they accept instead this kind of structure (since they don't have an equivalent readValue() method when deserializing) :

 {
     "__type": $typeinformation
     "attribute1": ...
     ..
     "attributeN": ...
 }

How can I use the DeserializationContext with this kind of structure ? I need to read inside the actual value/object json to read the type information. So If I pass the jsonParser to the DeserializationContext it will be in en invalid state (since the JsonParser event won't be at the beginning of the JSON)

rmannibucau commented 3 years ago

But you CAN'T assure streaming-friendliness if you want to leave the programmer the freedom of representing the type as he wants.

This is not true, anything which is done without streaming can be done with streaming, the opposite is not true so.

How can I use the DeserializationContext with this kind of structure ?

A trivial impl will read all but __type in a jsonobject (JsonParser.getObject or alternative) and then handle it as needed - which is your proposal but with the existing API.

So If I pass the jsonParser to the DeserializationContext it will be in en invalid state (since the JsonParser event won't be at the beginning of the JSON)

Shouldn't, context should handle it - otherwise there is no point to have such an API.

Once again my point is that if current API is broken we should fix it but not add a new API which is 1-1 as a workaround.

amoscatelli commented 3 years ago

@rmannibucau

Please help me to understand, maybe I am missing something. I can JsonParser.readObject() and read the __type attribute to infer the actual type. Then how can I pass the JsonParser to the DeserializationContext.deserialize(Class clazz, JsonParser parser) ?

In the DeserializationContext source code I read :

 * Deserialize JSON stream into instance of provided class using {@link javax.json.stream.JsonParser}.
 * JsonParser cursor have to be at KEY_NAME before START_OBJECT / START_ARRAY, or at START_OBJECT / START_ARRAY
 * to call this method. After deserialization is complete JsonParser will be at END_OBJECT / END_ARRAY
 * for deserialized JSON structure.

So I need a JsonParser with a cursor in the proper place ... What am I missing ?

rmannibucau commented 3 years ago

@amoscatelli JsonParserFactory#createParser(JsonObject)? There is already an optimized parser API to read from an in memory model.

amoscatelli commented 3 years ago

I understood you correctly. So you suggest to create a new parser starting from the complete json object. I already tried that some years ago but Yasson has a problem with that :

https://github.com/eclipse-ee4j/yasson/issues/102

There is a casting of JsonParser to their specific implementation :

deserializeInternal((JsonbParser) parser, ctx);

@bravehorsie states it's correct this way, but I don't think so

"What you do is you substitute running JSONP parser which has a cursor at some point of the document with new parser instance reading new document but you call current runtime on it, that is wrong. If you want to read a new document you should create new runtime with your parser instance"

So I really dunno what to do. I can't find a clean way to handle this.

:(

amoscatelli commented 3 years ago

@rmannibucau maybe we could expose the current runtime (Jsonb) configuration to the JsonbDeserializer so that we can create a new runtime inside the Deserializer ?

What do you suggest ? Am I missing something ?

rmannibucau commented 3 years ago

@amoscatelli I'd say we should work to make this work:

public class PolyTest {
    @Test
    public void run() throws Exception {
        try (final Jsonb jsonb = JsonbBuilder.create(new JsonbConfig()
                .withDeserializers(new PolyDeserializer()))) {
            final Parent parent = jsonb.fromJson("{\"type\":1,\"name\":\"first\",\"uno\":true,\"duo\":true}", Parent.class);
            assertEquals("Child1{name='first', uno=true}", parent.toString());
        }
    }

    public static class Parent {
        public String name;

        @Override
        public String toString() {
            return "Parent{name='" + name + "'}";
        }
    }

    public static class Child1 extends Parent {
        public boolean uno;

        @Override
        public String toString() {
            return "Child1{name='" + name + "', uno=" + uno + '}';
        }
    }

    public static class Child2 extends Parent {
        public boolean duo;

        @Override
        public String toString() {
            return "Child2{name='" + name + "', duo=" + duo + '}';
        }
    }

    public static class PolyDeserializer implements JsonbDeserializer<Parent> {
        @Override
        public Parent deserialize(final JsonParser parser,
                                  final DeserializationContext ctx,
                                  final Type rtType) {
            moveToType(parser);
            final int type = parser.getInt();
            switch (type) {
                case 1:
                    return ctx.deserialize(Child1.class, parser);
                case 2:
                    return ctx.deserialize(Child2.class, parser);
                default:
                    throw new IllegalArgumentException(String.valueOf(type));
            }
        }

        private void moveToType(final JsonParser parser) {
            ensureNext(parser, START_OBJECT);
            ensureNext(parser, KEY_NAME);
            if (!"type".equals(parser.getString())) {
                throw new IllegalArgumentException("Expected 'type' but got '" + parser.getString() + "'");
            }
            ensureNext(parser, VALUE_NUMBER);
        }

        private void ensureNext(final JsonParser parser, final JsonParser.Event expected) {
            final JsonParser.Event next = parser.next();
            if (expected != next) {
                throw new IllegalArgumentException(next + " != " + expected);
            }
        }
    }
}

Would it work for you?

amoscatelli commented 3 years ago

Going to try this right now. If this works, it's simply great. I didn't think I could have a pointer positioned on the second attribute KEY_NAME when starting deserializing.

rmannibucau commented 3 years ago

@amoscatelli I'm not 100% sure it works as of today since deserializers miss some behavior in all impl ATM but we can clearly spec it supporting this case. For example, next johnzon release will support it normally (master should already BTW) and it is just the alignment of jsonb API on the pre-jsonb behavior so sounds natural to me. That said point is more "do we agree to move to that direction instead of a new programmatic API" (I keep the declarative API apart for now).

On deserialization I would put these constraints in the spec:

  1. deserializers see the whole object (implementation is responsible to auto handle the skip/providing of missed events), it is the only way it is portable between object (write(key, value)) and arrays (write(value)) IMHO
  2. deserializers are disabled recursively for the same types (ie if I have a child1 deserializer it wil lbe used with the previous snippet ctx.deserialize(Child1.class, parser); but Parent deserializer is disabled for the same call. Similarly in Child1 deserializer, ctx.deserialize(Child1.class, parser); disables itself (accumulatively, ie if parent ders -> child1 deser -> here child1 AND parent deserializers are disabled).

2 should be simple to validate since it actually never works as of today if not done.

amoscatelli commented 3 years ago

Still coding your example to test.

I do agree that we can move to this direction instead of a new programmatic API, yet we have to pay attention to this :

Child1 MAY have a property of type Child1 (or Parent). So the Deserializer should be disabled only for current instance. In others words, you can't call twice the same deserializer for the same instance, but you MAY call it later while deserializing the instance properties. So this will handle recursive topologies of classes.

amoscatelli commented 3 years ago

Of course this solution is still a problem for any implementations/clients where type attribute is not the FIRST attribute in the JSON. I do know that having the type attribute as the first attribute is a GOOD THING for performance, and this exactly what I would do, yet the world out there is full of existing technologies/people/libraries who may disagree.

I am a bit concerned about this, I would be more assured if the spec would have a costraint stating/stressing out that DeserializationContext implementations CAN NOT assume any underlying JsonP implementation or, even more, that DeserializationContext can be invoked with a different JsonParser instance, different from the source one.

rmannibucau commented 3 years ago

@amoscatelli my 2cts would b that assuming type is first is a bug since most serializers will not (in particular in javascript land), this is why i prefer a explicit modelisation in an array but this is never requested in java land so let's enable these common cases and keep explicit mapping for the rest - this works very well and avoids too much "magic" which tends to be worse than having to write a dedicated model IMHO.

DeserializationContext implementations CAN NOT assume any underlying JsonP implementation or, even more, that DeserializationContext can be invoked with a different JsonParser instance, different from the source one.

It must be the case, if you find it is not then it can be seen as a bug.

amoscatelli commented 3 years ago

@rmannibucau Ok I just verified that Yasson doesn't work with our solution. First problem is that it loops. If you call ctx.deserialize(Child1.class, parser) the same deserializer will trigger. :(

rmannibucau commented 3 years ago

@amoscatelli that's what I thought so let's explicit it in the spec then we should be cleaner :)

amoscatelli commented 3 years ago

Ok, can we also be more explicit regarding the JsonParser instance in the spec ? I want to have something more stating that's a bug.

Also, I would like to give Johnzon a try (I am using Wildfly). Does Johnzon support generics, type variables, parameterizedtypes at the current state ?

rmannibucau commented 3 years ago

@amoscatelli agree for jsonparser but can be worth another issue/PR.

about johnzon: it has some typevariable support even if not 100% complete and parameterized type support yes.

Verdent commented 3 years ago

@amoscatelli I am currently rewriting and improving Yasson serialization and deserialization more or less from scratch and I am also adding support for polymorphism (so far implementation specific, but it will change later possibly) You will be able to use config and also annotations to enable polymorphism and customize it as desired, so you do not need to write anything extra by yourself and Yasson will do that for you. Hopefully I will be able to create the first RC soon.

rmannibucau commented 3 years ago

@Verdent we still need a portable solution since org.yasson internal imports are rarely desired when you use a spec which is primarly intended to write vendor independent apps. Can you ensure yasson works with previous code?

Verdent commented 3 years ago

@rmannibucau By previous code you mean code which users created before my rework? Absolutely, only classes from internal packages are changed/removed. And I am currently even passing all of the tests (without the need to change them) and TCKs so I would say it will not break any previously created user code.

I completely agree here that implementation specific imports are usually not what users what to have and as soon as it is described in the spec, it will be changed to support annotations and interfaces from the API. If you want, I can describe here in more details how I am implementing it there and we can have a talk here if it is how we want to add it into spec or what needs to be changed.

rmannibucau commented 3 years ago

@Verdent this one https://github.com/eclipse-ee4j/jsonb-api/issues/147#issuecomment-791423062

Verdent commented 3 years ago

@rmannibucau Yes I have to test that, but I would say it will work as expected. I see no reason why it shouldn't with the new version. From my point of view it might make sense to even fix that in the current version, but I will have to go though the code first. Perhaps only line ensureNext(parser, START_OBJECT); would fail since the parser will already be on the desired event. I will test that when I am back home.

rmannibucau commented 3 years ago

@Verdent it is one of the challenge if deserializers, if used at root it must be at start_object but then in nested objects it is not (but it is another issue than this one).

Verdent commented 3 years ago

@rmannibucau But your deserializer is used at root object in this case so parser should be at START_OBJECT upon entering your deserializer. That is why I am mentioning that calling ensureNext with START_OBJECT is probably incorrect there in this particular case.

Anyway with that mentioned polymorphism support, we do support handling of the mentioned {\"type\":1,\"name\":\"first\",\"uno\":true,\"duo\":true} if you make proper configuration. In case of deserialization when custom deserializer is used to implement polymorphism, we do not support this kind of behavior yet.

amoscatelli commented 3 years ago

@Verdent thank you. Please let's just stick with official interfaces, please don't assume the use of any internal implementation. Frankly, I don't think Yasson should include its own JSONP implementation, let's not reinvent the wheel. Even if you have to, do not assume its utilization. Jsonb and JsonP should talk to each other just using the spec interfaces. Imho org.eclipse.yasson.internal.JsonbParser should never been there, if you have a state, it should have been kept in the Yasson DeserializationContext implementation.

DeserializationContext should accept ANY Parser created via JsonParserFactory#createParser(JsonObject).

This alone would be a step further.

Now, if you have in mind something to support Polymorphism please describe it here. I created the first topic about supporting polymorphism and it was three years ago. I believe it's important to bring it to the spec as soon as possible as we don't want to use any implementation specific API.

Johnzon has already its own Polymoprphism support so maybe we can extract a good abstraction from both of them and make it for the next spec release.

As I already stressed out, other specifications are going to use Jsonb also, and they'll need good polymorphism support. jakarta.nosql spec will probably (and the actual implementation actually do so) use jsonb to serialize java objects for the KeyValue abstraction.

Again, thank you.

amoscatelli commented 3 years ago

@Verdent also, is the 2.x branch of Yasson the rework you are talking about ?

Verdent commented 3 years ago

@amoscatelli I am not adding anything like JSON-P implementation. Why would I do that? :-)

Yeah I have removed org.eclipse.yasson.internal.JsonbParser, I could not get my head around why it was there in the first place.

Actually I have already described it here. But I changed it a bit in form of structure now, but the idea is the same. I guess I can make a new full description of how I feel it should look like :-)

No the reworked version will be 3.x. It is not there yet.

m0mus commented 3 years ago

Folks, first of all thanks for your feedback and interest in evolving JSONB spec. It's great and I appreciate it. Let me clarify what's going on. As @Verdent mentioned already, he is working on new version of Yasson. It's a huge rework. The new version has better internal design and it's much faster than the current version. This work is in progress, but he is close to present a draft. When it's ready, he will create a new branch in Yasson project and push his work there, so all of you can review it. The new version has deserialisation support and some other new features such as externalised configuration, etc. We are trying to use implementation first approach. If these features are well accepted by the community, we will add them to the next version of JSONB spec. We plan to do it for Jakarta EE 10.

rmannibucau commented 3 years ago

@Verdent how does a deserializer know it is in an array or object when set on a list then? Thisbis the main issue skipping events, deserializers handle the struct behind the equivalent struct.

Verdent commented 3 years ago

@rmannibucau What I am trying to say is that you are making the check which would fail in your example since parser should be on that event already. I am talking here about this particular JSON "{\"type\":1,\"name\":\"first\",\"uno\":true,\"duo\":true}" and the check in moveToType method ensureNext(parser, START_OBJECT);. You should be already on that first event. No matter what it is since implementation should check the first evet if it is not NULL_VALUE. In case of null value this custom deserializer will not be executed.

But I feel like we are getting out of the scope of this issue here.

rmannibucau commented 3 years ago

@Verdent I understood - and agree we can move it to another issue - but checking null does not mean not serving the full model to the deserializer (pushback logic), other impl lead to broken deserializers.

mkarg commented 3 years ago

Is there already a public proposal how to user's view on polymorphism is like, e. g. what annotations to set etc?

Verdent commented 3 years ago

Not yet. But I am currently making one :-)

mkarg commented 3 years ago

OpenJDK 17 comes with sealed classes, including an extension to the Reflection API which allows to compute a complete hierarchy of possible subclasses at runtime. It would be great if JSON-B follows the "feels-like-java" approach and automatically provides polymorphism for all sealed classes without the need for any additional annotations in that case (CoC). That would be terrific!

rmannibucau commented 3 years ago

-1, polymorphism is about modifying the serialization shape to add a discriminator with wrapping the payload or not. Sealed classes are not only about this use case - most other trivial one is about pattern matching only, no subattribute so no discriminator in terms of shape. I'd stick on the standard solution which enables the user to decide.

t1 commented 3 years ago

+1 Sealed classes lend themselves perfectly for polymorphic (de)serialization.

We'd need to answer a number of questions. How do we derive the discriminator from the class name? Maybe just the simple class name? How can I override it? Maybe with an annotation @PolymorphicTypeName. And what field do we use to discriminate? type is probably a good default; what happens if there is already a type field? How can I specify a different one? When deserializing, how can I ignore, e.g., messages with an unknown type? Catching the exception is probably too slow if this happens often, or extremely cumbersome when deserializing collections or streams. Maybe by returning null or by skipping elements if it's a collection or stream, if a config option failOnUnknownPolymorphicType is set to false? Many more questions will probably pop up.

And I think that users may wonder why it works when serializing some classes but not on others, and there's no error message that could explain this (only the type field is missing). So maybe we should make this explicit with an annotation like @Polymorphic?

If we later choose to add support also for non-sealed classes, we can reuse all the annotations and conventions defined for the sealed classes.

rmannibucau commented 3 years ago

@t1 the opposite will happen, polymorphism is under work, sealed classes will just work as others.

Javaexperter commented 2 years ago

Using the full class name from the (Serializer) JSON is a serious security issue. An attacker could make you deserialize an arbitrary class, and some classes can cause remote code execution. How I can have just the name for the class. like so?

[ { "className": "Square", "shape": { "side": 2.0 } }, { "className": "Circle", "shape": { "radius": 5.0 } } ]

I usesd switch but not working. can someone help me

bmarwell commented 2 years ago

Using the full class name from the (Serializer) JSON is a serious security issue. An attacker could make you deserialize an arbitrary class, and some classes can cause remote code execution. How I can have just the name for the class. like so?

[ { "className": "Square", "shape": { "side": 2.0 } }, { "className": "Circle", "shape": { "radius": 5.0 } } ]

I usesd switch but not working. can someone help me

Please ask on Gitter, stack overflow etc (or open a question or discussion). Hijacking issues with new topics is not the way to go. Thanks.