tollercode closed this issue 4 months ago.
It may take some time to reach a consensus on whether it is the desired behavior for RisingWave.
Just as a reference:
KSQL automatically creates the table schema below from this Avro schema:
{
  "namespace": "com.***.***.mqtt",
  "name": "als.DataMessage",
  "type": "record",
  "fields": [
    {
      "name": "metrics",
      "type": {
        "type": "array",
        "items": {
          "name": "als_data_metric",
          "type": "record",
          "fields": [
            {
              "name": "id",
              "type": "string"
            },
            {
              "name": "name",
              "type": "string"
            },
            {
              "name": "norm_name",
              "type": [
                "null",
                "string"
              ],
              "default": null
            },
            {
              "name": "uom",
              "type": [
                "null",
                "string"
              ],
              "default": null
            },
            {
              "name": "data",
              "type": {
                "type": "array",
                "items": {
                  "name": "dataItem",
                  "type": "record",
                  "fields": [
                    {
                      "name": "ts",
                      "type": "string",
                      "doc": "Timestamp of the metric."
                    },
                    {
                      "name": "value",
                      "type": [
                        "null",
                        "boolean",
                        "double",
                        "string"
                      ],
                      "doc": "Value of the metric."
                    }
                  ]
                }
              },
              "doc": "The data message"
            }
          ],
          "doc": "A metric object"
        }
      },
      "doc": "A list of metrics."
    }
  ]
}
Resulting KSQL table (column | type):
METRICS | ARRAY<STRUCT<ID VARCHAR(STRING), NAME VARCHAR(STRING), NORM_NAME VARCHAR(STRING), UOM VARCHAR(STRING), DATA ARRAY<STRUCT<TS VARCHAR(STRING), VALUE STRUCT<BOOLEAN BOOLEAN, DOUBLE DOUBLE, STRING VARCHAR(STRING)>>>>>
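To make the mapping above concrete, here is a minimal Python sketch that walks a parsed Avro schema and renders a KSQL-style type string. It only imitates the naming visible in the output above: field names are uppercased, a `null` branch only makes a column nullable, a union with a single non-null branch collapses to that type, and a union with several non-null branches becomes a STRUCT with one field per branch, named after the branch type. The helper name `ksql_type` and the BIGINT/INTEGER names for `long`/`int` are assumptions for illustration; this is not ksqlDB's actual implementation.

```python
from typing import Any

# Primitive names as they appear in the KSQL output above (BIGINT/INTEGER are assumed).
KSQL_PRIMITIVES = {
    "string": "VARCHAR(STRING)",
    "boolean": "BOOLEAN",
    "double": "DOUBLE",
    "long": "BIGINT",
    "int": "INTEGER",
}


def ksql_type(avro_type: Any) -> str:
    """Render a parsed Avro type as a KSQL-style type string (hypothetical helper)."""
    if isinstance(avro_type, str):
        return KSQL_PRIMITIVES[avro_type]
    if isinstance(avro_type, list):
        # Union: the "null" branch only makes the column nullable.
        branches = [b for b in avro_type if b != "null"]
        if len(branches) == 1:
            return ksql_type(branches[0])
        if not all(isinstance(b, str) for b in branches):
            raise ValueError("this sketch only names primitive union branches")
        # Several non-null branches: one STRUCT field per branch, named after its type.
        fields = ", ".join(f"{b.upper()} {ksql_type(b)}" for b in branches)
        return f"STRUCT<{fields}>"
    kind = avro_type["type"]
    if kind in KSQL_PRIMITIVES:
        return KSQL_PRIMITIVES[kind]
    if kind == "array":
        return f"ARRAY<{ksql_type(avro_type['items'])}>"
    if kind == "record":
        fields = ", ".join(
            f"{f['name'].upper()} {ksql_type(f['type'])}" for f in avro_type["fields"]
        )
        return f"STRUCT<{fields}>"
    raise ValueError(f"unsupported Avro type: {kind!r}")


# The "value" field of dataItem from the schema above.
print(ksql_type(["null", "boolean", "double", "string"]))
# -> STRUCT<BOOLEAN BOOLEAN, DOUBLE DOUBLE, STRING VARCHAR(STRING)>
```

Applied to the `type` of the `metrics` field, the sketch yields the METRICS column type shown above.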
Any updates on this? This is the only thing blocking us from migrating away from KSQL.
Also, IIUC, a union with null in an Avro schema basically means optional (i.e. nullable). For example:
- union[null, string] -> RW: varchar, rather than a struct
- union[null, string, long] -> RW: struct<string, long>
Additionally, as a result, union might be more frequently used in Avro compared with Protobuf's oneof.
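The nullable-union rule described in this comment can be sketched in Python as follows. This is an illustration, not RisingWave's decoder: the helper name `rw_union_type` and the primitive-to-RW type table are assumptions, and naming each struct field after its branch type is also assumed (the comment only writes struct<string, long>).

```python
# Assumed Avro primitive -> RisingWave type names (illustrative only).
RW_TYPES = {
    "string": "varchar",
    "long": "bigint",
    "int": "int",
    "boolean": "boolean",
    "double": "double precision",
    "float": "real",
}


def rw_union_type(branches: list) -> str:
    """Map a union of primitive Avro branches to a RisingWave column type (hypothetical)."""
    # "null" only marks the column as nullable; it never becomes a struct field.
    non_null = [b for b in branches if b != "null"]
    if not non_null:
        raise ValueError("union needs at least one non-null branch")
    if len(non_null) == 1:
        # union[null, T] is simply an optional T.
        return RW_TYPES[non_null[0]]
    # Genuine multi-type union: one struct field per non-null branch.
    fields = ", ".join(f"{b} {RW_TYPES[b]}" for b in non_null)
    return f"struct<{fields}>"


print(rw_union_type(["null", "string"]))          # varchar
print(rw_union_type(["null", "string", "long"]))  # struct<string varchar, long bigint>
```

The point of the rule is that the common `[null, T]` pattern never needs a struct at all; only unions with two or more non-null branches do.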
Regarding "as a result, union might be more frequently used in Avro compared with Protobuf's oneof": this might be wrong. Posting the comments from @xiangjinwu here:
Disagree with this. It was controversial (2015, 2018) until Confluent started using it to represent multiple schemas in a single topic in 2020, which was then added to ksql in 2022 (two years later). My experience here may be limited, though.
It is already mid-2024, and I am not against supporting it. But the design space is still quite open compared to other, more common data types. I will try to list the details I am concerned about before the end of next week.
Similar to map (#13387), we considered several ways to support it:
- a union data type
- jsonb
- a struct with one field for each non-null member, as proposed above
- a struct with 1+N fields, with an additional explicit tag
However, the interface and semantics of a native union are not as universal as those of map across databases and programming languages. To avoid committing to a premature design, we will not do it right now. Of the workarounds:
- jsonb is weakly typed and lacks the O(log n) lookup advantage of the map workaround
- a struct without a tag field will be treated as an ordinary struct regarding input / output / ordering / casting
- a struct with a tag field can get union-specific treatment, but that makes it an awkward, non-native union-like type
So we will follow the original struct design without a tag for the initial version. To elaborate on its abilities and restrictions:
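- The active member can be found by checking each field with IS NOT NULL, thanks to the fact that Avro does not allow an inner member to be active while its value equals null. (This does not hold for Protobuf oneof.)
- For example, a value whose third member is active is represented as row(null, null, variant_c, null).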
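The tag-less struct design above can be sketched in Python, assuming a union with four non-null members named `variant_a` to `variant_d` (hypothetical names, as are the helpers `encode` and `active_member`). Encoding places the value at the active member's position and null everywhere else; decoding recovers the active member as the only field that IS NOT NULL. The last line shows why the design leans on Avro's guarantee: if an active member could carry a null value, as the comment notes can happen with Protobuf oneof, the row no longer reveals which member was active.

```python
from typing import Any, Optional, Sequence, Tuple

# Hypothetical union members, in schema order (the "null" branch is not a member).
MEMBERS = ("variant_a", "variant_b", "variant_c", "variant_d")


def encode(active: Optional[str], value: Any) -> Optional[Tuple[Any, ...]]:
    """Encode a union value as a tag-less struct: one field per non-null member."""
    if active is None:
        # The Avro "null" branch is active: the whole struct is NULL.
        return None
    return tuple(value if m == active else None for m in MEMBERS)


def active_member(row: Optional[Sequence[Any]]) -> Optional[str]:
    """Find the active member as a query would: the one field that IS NOT NULL."""
    if row is None:
        return None
    present = [m for m, v in zip(MEMBERS, row) if v is not None]
    if len(present) > 1:
        raise ValueError("at most one member may be non-null")
    # Empty when the active member's own value was null: the information is lost.
    return present[0] if present else None


row = encode("variant_c", 42)
print(row)                 # (None, None, 42, None), i.e. row(null, null, variant_c, null)
print(active_member(row))  # variant_c
# Cannot happen in Avro, but shows the ambiguity if a member value could be null.
print(active_member(encode("variant_c", None)))  # None
```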
Is your feature request related to a problem? Please describe.
When decoding messages that use a union type within the Avro schema, RW fails to decode them, because currently only one type is supported per field. This forces schemas to be simplified, e.g. to use only string types, which weakens the ability to use strongly typed schemas.
Describe the solution you'd like
ksql introduced support for union types inside Avro schemas by creating a 'struct' for such a field that can potentially hold all of the different types. E.g. a union type of [null, boolean, double, string] becomes --> STRUCT<BOOLEAN BOOLEAN, DOUBLE DOUBLE, STRING VARCHAR(STRING)>
In addition, a wildcard struct operator was introduced to access the struct without knowing the exact fields inside.
See: https://www.confluent.io/blog/announcing-ksqldb-0-27-1/#multi-schema-protobuf-avro-topics
This solution would also work for 'oneOf' types in Protobuf and JSON schemas.
Describe alternatives you've considered
A possible alternative is to simply cast union types to strings. This is probably easier to implement, but it again loosens the strong typing approach.
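A short Python illustration of what the string-cast alternative gives up, for a [null, boolean, double, string] union. Rendering non-string branches as JSON text is an assumption (the issue does not specify a casting format), and `cast_union_to_string` is a hypothetical helper: values from different branches can produce identical text, so the original branch cannot be recovered downstream.

```python
import json
from typing import Any, Optional


def cast_union_to_string(value: Any) -> Optional[str]:
    """Hypothetical string-cast decoding of a [null, boolean, double, string] union value."""
    if value is None:
        # The null branch stays SQL NULL.
        return None
    # Strings are kept verbatim; other branches are rendered as JSON text.
    return value if isinstance(value, str) else json.dumps(value)


# Values from different branches collapse to identical text, so the branch is lost.
print(cast_union_to_string(True), cast_union_to_string("true"))  # true true
print(cast_union_to_string(1.5), cast_union_to_string("1.5"))    # 1.5 1.5
```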