Log lines often contain KV fields such as `k1=v1 k2=v2 k3=v3`, and the keys may differ from line to line.
The ETL converts each log line to JSON and puts it into a Kafka topic, so clickhouse_sinker needs to store every field of the JSON message in a ClickHouse table, where new fields may appear and existing fields may go missing from time to time.
Therefore clickhouse_sinker needs to dynamically add fields to the ClickHouse table, never delete them. Each field's type can be deduced as one of `Nullable(Int64)`, `Nullable(Float64)`, or `Nullable(String)`.
This is doable with the fastjson parser, since `Object.Visit()` iterates over all fields; gjson, however, has no such API.
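For concreteness, here is a minimal Go sketch of how this could look with fastjson: `Object.Visit()` walks every field, a hypothetical `deduceType` helper picks one of the three nullable types, and the result could drive an `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` statement. The helper name and the table name `logs` are placeholders, not clickhouse_sinker's actual API.

```go
package main

import (
	"fmt"

	"github.com/valyala/fastjson"
)

// deduceType maps a JSON value onto one of the three nullable
// ClickHouse types; anything that is not a number falls back to
// Nullable(String). (Hypothetical helper for illustration only.)
func deduceType(v *fastjson.Value) string {
	switch v.Type() {
	case fastjson.TypeNumber:
		// Int64() fails on fractional numbers, so try it first.
		if _, err := v.Int64(); err == nil {
			return "Nullable(Int64)"
		}
		return "Nullable(Float64)"
	default:
		return "Nullable(String)"
	}
}

func main() {
	var p fastjson.Parser
	v, err := p.Parse(`{"k1":"v1","k2":42,"k3":3.14}`)
	if err != nil {
		panic(err)
	}
	obj, err := v.Object()
	if err != nil {
		panic(err)
	}
	// Object.Visit iterates every field of the message, which is what
	// makes schema-change detection possible with fastjson.
	obj.Visit(func(key []byte, v *fastjson.Value) {
		fmt.Printf("ALTER TABLE logs ADD COLUMN IF NOT EXISTS `%s` %s\n",
			key, deduceType(v))
	})
}
```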
https://github.com/ClickHouse/ClickHouse/pull/17829 introduced the `Map` column type as an alternative. Its drawbacks: (1) it is somewhat slow, since it is implemented on top of arrays; (2) SQL users have to be aware of that special column, e.g. reading a key as `kv['k1']` rather than as a plain column `k1`.
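To illustrate drawback (2), here is a hedged sketch using the clickhouse-go v2 client. The table name `logs`, column name `kv`, and connection address are placeholders; on ClickHouse versions contemporary with that PR, the Map type additionally required `allow_experimental_map_type = 1`.

```go
package main

import (
	"context"
	"fmt"

	"github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
	ctx := context.Background()
	// Placeholder address for a local ClickHouse server.
	conn, err := clickhouse.Open(&clickhouse.Options{
		Addr: []string{"127.0.0.1:9000"},
	})
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// All dynamic KV fields land in a single Map column instead of
	// individual table columns.
	if err := conn.Exec(ctx, `
		CREATE TABLE IF NOT EXISTS logs (
			ts DateTime,
			kv Map(String, String)
		) ENGINE = MergeTree ORDER BY ts`); err != nil {
		panic(err)
	}

	// Drawback (2): queries must use the kv['key'] subscript syntax
	// rather than a plain column name.
	rows, err := conn.Query(ctx, `SELECT kv['k1'] FROM logs`)
	if err != nil {
		panic(err)
	}
	defer rows.Close()
	for rows.Next() {
		var v string
		if err := rows.Scan(&v); err != nil {
			panic(err)
		}
		fmt.Println(v)
	}
}
```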