trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Allow custom deserialization for user defined types #24278

Open JulianGoede opened 2 days ago

JulianGoede commented 2 days ago

The trino-client module contains a hardcoded list of "StandardTypes" in ClientStandardTypes, which are mapped to TypeDecoders in JsonDecodingUtils#createTypeDecoder. For any other type, createTypeDecoder falls back to BASE_64_DECODER, so user defined types have to provide a base64-encoded representation, which is not human-readable.
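To make the fallback behaviour concrete, here is a minimal sketch of a decoder lookup that handles a couple of known type names and treats everything else as base64. It is not the trino-client implementation: the class name, the decode method, and the two known types are assumptions chosen for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch; none of these names are taken from trino-client.
public class DecoderFallbackSketch {
    // Decodes a wire value based on its type name.
    // Unknown type names fall through to base64 decoding and yield raw bytes.
    static Object decode(String typeName, String wireValue) {
        switch (typeName) {
            case "varchar":
                return wireValue;
            case "bigint":
                return Long.parseLong(wireValue);
            default:
                // Fallback: the client has no decoder for this type name
                return Base64.getDecoder().decode(wireValue);
        }
    }

    public static void main(String[] args) {
        String text = "[2024-01-01 20:00:00,2024-01-02 21:00:00)";
        String wire = Base64.getEncoder()
                .encodeToString(text.getBytes(StandardCharsets.UTF_8));

        // A known type comes back as a readable value
        System.out.println(decode("varchar", text));

        // An unknown type such as tsrange comes back as a byte array
        Object unknown = decode("tsrange", wire);
        System.out.println(unknown instanceof byte[]);
    }
}
```

In this sketch the client only ever sees bytes for a type name it does not know, so it has no way to tell that they encode text.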

Example (Trino v465): I have created a plugin with a UDT called tsrange, which defines the following method:

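import com.fasterxml.jackson.annotation.JsonValue;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

...
    // Jackson serializes the value as this base64 string
    @JsonValue
    public String toBase64() {
        return Base64.getEncoder().encodeToString(toString().getBytes(StandardCharsets.UTF_8));
    }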

This is how the tsrange value is rendered in the trino-cli, compared with its varchar cast:

trino> SELECT
    -> tsrange '[2024-01-01 20:00:00,2024-01-02 21:00:00)' as actual_representation,
    -> cast(tsrange '[2024-01-01 20:00:00,2024-01-02 21:00:00)' as varchar) varchar_representation;
              actual_representation              |          varchar_representation           
-------------------------------------------------+-------------------------------------------
 5b 32 30 32 34 2d 30 31 2d 30 31 20 32 30 3a 30 | [2024-01-01 20:00:00,2024-01-02 21:00:00) 
 30 3a 30 30 2c 32 30 32 34 2d 30 31 2d 30 32 20 |                                           
 32 31 3a 30 30 3a 30 30 29                      |                                           
(1 row)
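The hex pairs in the first column are the bytes of the varchar text itself. The following sketch reproduces that effect outside Trino: it base64-encodes the string, decodes it back to bytes, and prints the bytes as space-separated hex. The class and method names are made up for this illustration, and the hex formatting only approximates what the CLI prints.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch; not part of trino-cli.
public class HexRenderingSketch {
    // Formats bytes as space-separated lowercase hex pairs.
    static String toSpacedHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String text = "[2024-01-01 20:00:00,2024-01-02 21:00:00)";

        // What the plugin sends: a base64 string
        String wire = Base64.getEncoder()
                .encodeToString(text.getBytes(StandardCharsets.UTF_8));

        // What a base64 fallback decoder recovers: raw bytes
        byte[] decoded = Base64.getDecoder().decode(wire);

        // Bytes rendered as hex, then the same bytes read as UTF-8 text
        System.out.println(toSpacedHex(decoded));
        System.out.println(new String(decoded, StandardCharsets.UTF_8));
    }
}
```

The first printed line starts with 5b 32 30 32 34, matching the start of the actual_representation column, and the second line is the original text.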

For obvious reasons I would prefer my type to be displayed just like the varchar representation (without having to cast it to varchar), but currently there doesn't seem to be a mechanism in Trino's SPI that would allow this.

Ideally, I think Trino's SPI should allow you to create UDTs along with custom encoder/decoder implementations, so that a plugin developer (in this case me) can decide how the type is displayed.

wendigo commented 2 days ago

Custom deserialization won't be possible, since you'd need to modify client code. We don't want to rely on Jackson serialization either: it won't work in the future when we introduce new encoding formats. Instead, we are thinking about allowing a custom type to "describe itself" using Trino built-in types.

As an example, if we have a GeoType represented as a tuple of (lat, lon), it can be represented on the wire as row(lat, lon) and rendered this way in the CLI/JDBC (and we could even make it possible to access the components directly).
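One way to picture the "describe itself" idea is a value that exposes its components as named fields of built-in types. The sketch below is purely illustrative: RowDescribable, GeoPoint, and toRowFields are invented names, not an existing or planned Trino SPI, and the coordinates are sample data.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; no such interface exists in the Trino SPI.
public class SelfDescribingTypeSketch {
    // A custom value that describes itself as ordered, named fields
    // whose values are plain built-in types (here: doubles).
    interface RowDescribable {
        Map<String, Object> toRowFields();
    }

    static final class GeoPoint implements RowDescribable {
        private final double lat;
        private final double lon;

        GeoPoint(double lat, double lon) {
            this.lat = lat;
            this.lon = lon;
        }

        @Override
        public Map<String, Object> toRowFields() {
            // LinkedHashMap keeps the declared field order (lat, then lon)
            Map<String, Object> fields = new LinkedHashMap<>();
            fields.put("lat", lat);
            fields.put("lon", lon);
            return Collections.unmodifiableMap(fields);
        }
    }

    public static void main(String[] args) {
        RowDescribable point = new GeoPoint(52.52, 13.405);

        // A client that only understands built-in types can render the fields
        System.out.println(point.toRowFields());

        // Components are individually accessible by name
        System.out.println(point.toRowFields().get("lat"));
    }
}
```

The point of this shape is that a client needs no type-specific decoder: it only has to understand rows of built-in types.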

JulianGoede commented 2 days ago

Cool, I think this would work well.