housepower / clickhouse_sinker

Easily load data from kafka to ClickHouse
https://housepower.github.io/clickhouse_sinker
Apache License 2.0

dynamic schema support #94

Closed yuzhichang closed 3 years ago

yuzhichang commented 3 years ago

Log lines often contain key-value fields such as k1=v1 k2=v2 k3=v3, and the keys may differ from line to line. The ETL converts each log line to a JSON message and puts it on a Kafka topic. clickhouse_sinker therefore needs to store all fields of each JSON message in a ClickHouse table, even though new fields may appear and existing fields may be missing from time to time. To handle this, clickhouse_sinker needs to add columns to the ClickHouse table dynamically and never delete them. Each field type can be deduced as one of Nullable(Int64), Nullable(Float64), Nullable(String). This is doable with the fastjson parser, since Object.Visit() iterates over all fields. However, gjson doesn't have such an API.

https://github.com/ClickHouse/ClickHouse/pull/17829 introduced the Map column type. Its drawbacks: (1) It's somewhat slow, since it's based on arrays. (2) SQL users need to be aware of that special column.