tchiotludo / akhq

Kafka GUI for Apache Kafka to manage topics, topics data, consumers group, schema registry, connect and more...
https://akhq.io/
Apache License 2.0

Logical type values are not displayed properly on deserialization when using AWS Glue as schema registry #1907

Open NoahStolk opened 3 months ago

NoahStolk commented 3 months ago

We have the following schema:

{
    "name": "LogicalTypeTest",
    "type": "record",
    "fields": [
        {
            "name": "DateOnly",
            "type": {
                "type": "int",
                "logicalType": "date"
            }
        },
        {
            "name": "DateTime",
            "type": {
                "type": "long",
                "logicalType": "timestamp-millis"
            }
        },
        {
            "name": "Decimal",
            "type": {
                "type": "bytes",
                "logicalType": "decimal",
                "precision": 9,
                "scale": 3
            }
        },
        {
            "name": "TimeOnly",
            "type": {
                "type": "int",
                "logicalType": "time-millis"
            }
        }
    ]
}

When producing the following data:

AKHQ displays the message as follows:

{
  "DateOnly": 19948,
  "DateTime": 1723542120000,
  "Decimal": "\u0001â@",
  "TimeOnly": 42120000
}


It appears that the raw underlying data is being displayed when using AWS Glue as schema registry (in this case int, long, and bytes). We previously used Confluent with the exact same schema and all data showed up fine. I'd expect the data to show up as follows (or in a similar format):

{
  "DateOnly": "2024-08-13",
  "DateTime": "2024-08-13 9:42:00",
  "Decimal": 123.456,
  "TimeOnly": "11:42"
}
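For illustration, the mapping from the raw values to the expected logical values can be sketched with the Python standard library. This is not AKHQ code; the helper names are hypothetical, and the conversions follow the Avro logical type definitions (days since the epoch, milliseconds since the epoch, milliseconds after midnight, and a big-endian two's-complement unscaled integer). It assumes the garbled `Decimal` string is the three raw bytes rendered one character per byte (ISO-8859-1).

```python
from datetime import date, datetime, time, timedelta, timezone
from decimal import Decimal

# Hypothetical helpers, one per logical type used in the schema above.

def to_date(days):
    """logicalType "date": days since 1970-01-01."""
    return date(1970, 1, 1) + timedelta(days=days)

def to_timestamp_millis(millis):
    """logicalType "timestamp-millis": milliseconds since the epoch, UTC."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(milliseconds=millis)

def to_time_millis(millis):
    """logicalType "time-millis": milliseconds after midnight."""
    return (datetime.min + timedelta(milliseconds=millis)).time()

def to_decimal(raw, scale):
    """logicalType "decimal": big-endian two's-complement unscaled value."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)

# Values exactly as AKHQ displayed them.
shown = {
    "DateOnly": 19948,
    "DateTime": 1723542120000,
    "Decimal": "\u0001\u00e2@",  # assumption: bytes shown one char per byte
    "TimeOnly": 42120000,
}

converted = {
    "DateOnly": to_date(shown["DateOnly"]),
    "DateTime": to_timestamp_millis(shown["DateTime"]),
    "Decimal": to_decimal(shown["Decimal"].encode("latin-1"), scale=3),
    "TimeOnly": to_time_millis(shown["TimeOnly"]),
}

for name, value in converted.items():
    print(name, value)
```

Note that the timestamp comes out as 09:42:00 in UTC; how it is rendered in a local time zone is a display choice.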

We are using AKHQ version 0.25.0.

Additional info

The example message is serialized as follows (hex):

03 00 44 82 60 1F BE 56 45 93 B9 97 94 D8 A2 C0 9B 6E D8 B7 02 80 99 ED B1 A9 64 06 01 E2 40 80 CD 95 28

Translated:

| Field          | Bytes                                           | Data type                      | Translated                                                    |
|----------------|-------------------------------------------------|--------------------------------|---------------------------------------------------------------|
| Version        | 03                                              | -                              | 3                                                             |
| Compression    | 00                                              | -                              | No compression                                                |
| Glue schema ID | 44 82 60 1F BE 56 45 93 B9 97 94 D8 A2 C0 9B 6E | uuid                           | 4482601f-be56-4593-b997-94d8a2c09b6e                          |
| DateOnly       | D8 B7 02                                        | int (zigzag, variable length)  | 19948                                                         |
| DateTime       | 80 99 ED B1 A9 64                               | long (zigzag, variable length) | 1723542120000                                                 |
| Decimal        | 06 01 E2 40                                     | bytes                          | Byte array of 3 (01 E2 40) whose contents translate to 123456 |
| TimeOnly       | 80 CD 95 28                                     | int (zigzag, variable length)  | 42120000                                                      |
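The breakdown in the table can be reproduced with a short script. This is a minimal sketch in plain Python, assuming the header layout shown in the table (1 version byte, 1 compression byte, 16-byte schema UUID) followed by the Avro binary body in schema field order. The function names are hypothetical and do not come from AKHQ or aws-glue-schema-registry.

```python
import uuid

def read_zigzag_varint(buf, pos):
    """Read one Avro int/long (variable-length, zigzag) starting at pos.

    Returns (value, next_position).
    """
    acc = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        acc |= (b & 0x7F) << shift   # low 7 bits carry data
        if not (b & 0x80):           # high bit clear: last byte
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1), pos  # undo zigzag encoding

def decode_message(buf):
    """Split the example message into header fields and raw Avro values."""
    version = buf[0]
    compression = buf[1]
    schema_id = uuid.UUID(bytes=buf[2:18])
    pos = 18
    date_only, pos = read_zigzag_varint(buf, pos)
    date_time, pos = read_zigzag_varint(buf, pos)
    length, pos = read_zigzag_varint(buf, pos)   # bytes are length-prefixed
    decimal_raw = buf[pos:pos + length]
    pos += length
    time_only, pos = read_zigzag_varint(buf, pos)
    return {
        "version": version,
        "compression": compression,
        "schema_id": str(schema_id),
        "DateOnly": date_only,
        "DateTime": date_time,
        "Decimal": int.from_bytes(decimal_raw, "big", signed=True),
        "TimeOnly": time_only,
        "bytes_consumed": pos,
    }

message = bytes.fromhex(
    "03 00 44 82 60 1F BE 56 45 93 B9 97 94 D8 A2 C0 9B 6E "
    "D8 B7 02 80 99 ED B1 A9 64 06 01 E2 40 80 CD 95 28"
)
decoded = decode_message(message)
for key, value in decoded.items():
    print(key, value)
```

The decoded values are the raw underlying types only; applying the logical types (for example, scale 3 on the decimal's unscaled value 123456) is a separate step.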

All the data is serialized according to the implementation provided by aws-glue-schema-registry, as well as the official Avro specification.

Furthermore, conduktor.io (an alternative tool) is able to deserialize and display this data correctly. This leads me to believe that our message is correct and that AKHQ is simply displaying each value as its underlying type instead of its logical type.

NoahStolk commented 3 months ago

Hi @arindampatra33, sorry for the ping. I'm seeing your contributions to include support for AWS Glue (thanks btw!) and I was wondering if you were running into these issues as well, or if there is something I'm doing wrong here. I'm curious if this is happening for anyone else that's using Glue.

arindampatra33 commented 3 months ago

@NoahStolk Sorry, but I didn't get a chance to test with logical types. It could be a bug in the AWS Avro deserializer. Could you create a Kafka consumer app, deserialize the data to a generic record, and see what you get?

NoahStolk commented 3 months ago

@arindampatra33 We're using .NET, so we can't use the AWS Avro deserializer because it doesn't provide a .NET implementation (the multi-language support issue is still open: https://github.com/awslabs/aws-glue-schema-registry/issues/43).

However, I'm certain our (de)serializers work correctly. I've analyzed the binary output of various messages and they're all according to the official Avro specification. We've also done some tests where we produce serialized messages to Kafka and deserialize them in our consumers. I've tested all data types and all the data stays intact when it reaches the consumer. The only thing that doesn't work is the visualization of data using logical types in AKHQ.

NoahStolk commented 3 months ago

We've tried the same tests in conduktor.io. All logical types show up correctly on deserialization in Conduktor. I think it's safe to say that this confirms our implementations are correct and that this is indeed a bug in AKHQ.

arindampatra33 commented 3 months ago

I will look into this in 2 weeks
