msgpack / msgpack-java

MessagePack serializer implementation for Java / msgpack.org[Java]
http://msgpack.org/
Apache License 2.0
1.4k stars 320 forks source link

Make the MessagePackParser#type field public #800

Open ArtDu opened 5 months ago

ArtDu commented 5 months ago

It's useful when you need to distinguishe bytes array from extension type

Closes #799

komamitsu commented 5 months ago

@ArtDu Can you elaborate your use case?

ArtDu commented 4 months ago

@komamitsu

@ArtDu Can you elaborate your use case?

We use extensions values and byte arrays as well to store data in Tarantool(https://en.wikipedia.org/wiki/Tarantool). And we can distinguishe those two types in jackson embedded object only by class type and it's no elegant way

        private boolean isExtensionValue(Object object) {
            return object instanceof MessagePackExtensionType;
        }

        @Override
        public Object deserialize(JsonParser p, DeserializationContext ctxt) throws IOException {
            if (p.currentTokenId() != JsonTokenId.ID_EMBEDDED_OBJECT || !isExtensionValue(p.getEmbeddedObject())) {
                return super.deserialize(p, ctxt);
            }
            MessagePackExtensionType ext = p.readValueAs(MessagePackExtensionType.class);
            byte type = ext.getType();
            ...
        }

Instead of it we can use type variable that is stored in MessagePackParser

 @Override
        public Object deserialize(JsonParser p, DeserializationContext ctxt) throws IOException {
            MessagePackParser mpParser = (MessagePackParser)p;
            if (mpParser.getType().equals(Type.EXT)) {
                return super.deserialize(p, ctxt);
            }
            ...
        }
komamitsu commented 4 months ago

@ArtDu Thanks for sharing the details.

jackson-dataformat-msgpack has a function to deserialize extention type values by registering the target ext-type ID and callback lambda using org.msgpack.jackson.dataformat.ExtensionTypeCustomDeserializers#addCustomDeser. Does it work for your use case?

BTW, as you use jackson-databind interface, I guess you serialize Java objects to MessagePack format data and/or deserialize MessagePack format data to Java object vise verse. So, I think you know all the type of values when deserializing MessagePack format data since you know how the data is serialized. For instance, let's say you serialize an instance of this class

public static class Foo {
  String name;
  byte[] data;
}

and then, you know the type of data is stored as a byte array and don't need to consider extension types in this case. What do you think?

ArtDu commented 4 months ago

@komamitsu

jackson-dataformat-msgpack has a function to deserialize extention type values by registering the target ext-type ID and callback lambda using org.msgpack.jackson.dataformat.ExtensionTypeCustomDeserializers#addCustomDeser. Does it work for your use case?

In some cases it works for me, but I also need a serializer. That's why I use ObjectMapper(...).registerModule() method.

Also addCustomDeser way can't set targetType. Sometimes I need to use a mapper in this way mapper.readValue(bytes, SpecialClass.class). For example, we can interpret ExtensionValue (Datetime) as ZonedDateTime, as Instant, or as LocalTime. In this way, we determine how to get the needed type.

BTW, as you use jackson-databind interface, I guess you serialize Java objects to MessagePack format data and/or deserialize MessagePack format data to Java object vise verse. So, I think you know all the type of values when deserializing MessagePack format data since you know how the data is serialized. For instance, let's say you serialize an instance of this class

public static class Foo {
  String name;
  byte[] data;
}

and then, you know the type of data is stored as a byte array and don't need to consider extension types in this case. What do you think?

We have two types of interaction:

  1. When we expect an explicit class that we'll read from the database (msgpack bytes). It's like I described above
  2. When we don't know what type we'll come and we need to read bytes by its msgpack own type. The type could be an extension or raw bytes (And at this point I want to determine by the Type)
komamitsu commented 4 months ago

@ArtDu I think understand your use case. Honestly speaking, the usage seems a bit too dynamic in terms of type-safety and to not mach jackson-databind interface. I don't think it's a common use case, and I'm hesitating to approve the change.

As for org.msgpack.jackson.dataformat.MessagePackParser#type, it's not supposed to be externally exposed and I might change it's behavior in the future. If it's exposed by this PR, it would be a possible maintenance concern.

Regarding the sample code you shared, I think you know the value is an embedded object at least in deserialize() and you can simplify it w/o checking currentTokenId like this.

        @Override
        public Object deserialize(JsonParser p, DeserializationContext ctxt) throws IOException {
            // At this point, the current value must be an embedded object, right?
            Object embedded = p.getEmbeddedObject();
            if (!(embedded instanceof MessagePackExtensionType)) {
                // This should be `byte[]`.
                handleByteArrayAsXxxxx(embedded);
                return;
            }
            MessagePackExtensionType ext = (MessagePackExtensionType) embedded;
            byte type = ext.getType();
            ...
        }