itinycheng / flink-connector-clickhouse

Flink SQL connector for ClickHouse. Supports ClickHouseCatalog and reading/writing primitive data types, maps, and arrays to ClickHouse.
Apache License 2.0

upsert by updating row if it exists and inserting otherwise #22

Closed: wangqinghuan closed this issue 2 years ago

wangqinghuan commented 2 years ago

When the same INSERT statement is executed multiple times in upsert mode, flink-connector-clickhouse does not update the existing rows; it appends them to ClickHouse again instead.

wangqinghuan commented 2 years ago

We are using Flink CDC to ETL data into ClickHouse. Each time the Flink CDC job is restarted, all data from the CDC source is treated as INSERT rows, which causes data duplication in ClickHouse. It would be reasonable to update a row if it already exists and insert it otherwise.
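The duplication described above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the connector's code: `append_only_sink` and `upsert_sink` are hypothetical names, the "table" is a plain Python list or dict, and `id`-style integer primary keys are assumed.

```python
from typing import Dict, List, Tuple

# Hypothetical change record: (row_kind, primary_key, payload).
Change = Tuple[str, int, str]


def append_only_sink(changes: List[Change], table: List[Tuple[int, str]]) -> None:
    """Append every INSERT as a new row, as an append-only writer would."""
    for kind, key, payload in changes:
        if kind == "INSERT":
            table.append((key, payload))


def upsert_sink(changes: List[Change], table: Dict[int, str]) -> None:
    """Update the row if its key exists, insert it otherwise."""
    for kind, key, payload in changes:
        if kind in ("INSERT", "UPDATE_AFTER"):
            table[key] = payload  # overwrite an existing row or create a new one
        elif kind == "DELETE":
            table.pop(key, None)


# A CDC snapshot re-read after a job restart: every row arrives as INSERT again.
snapshot: List[Change] = [("INSERT", 1, "alice"), ("INSERT", 2, "bob")]

appended: List[Tuple[int, str]] = []
upserted: Dict[int, str] = {}
for _ in range(2):  # first run, then the restart replays the same rows
    append_only_sink(snapshot, appended)
    upsert_sink(snapshot, upserted)

print(len(appended))  # 4: every row is duplicated
print(len(upserted))  # 2: one row per primary key
```

Keying the sink on the primary key is what makes replaying the same INSERT rows idempotent.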

itinycheng commented 2 years ago


OK, I understand what you need.

  1. Some table engines, such as Log and File, don't support update or delete operations.
  2. If your tables use a MergeTree-family engine such as ReplacingMergeTree, wouldn't it be more reasonable to let ClickHouse deduplicate the data during its own background merges, or to use the FINAL keyword to force that deduplication at query time? For example: SELECT * FROM table FINAL.
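What FINAL returns for a ReplacingMergeTree table can be approximated in a few lines: among rows sharing the same sorting key, the row with the highest version column wins, and without a version column (or on a version tie) the most recently inserted row wins. The sketch below assumes a table ordered by a hypothetical `id` column with an optional hypothetical `version` column; it models only the deduplicated result, not ClickHouse's part merging.

```python
from typing import Dict, List, Optional


def replacing_final(rows: List[dict], key_col: str = "id",
                    version_col: Optional[str] = None) -> List[dict]:
    """Approximate SELECT ... FINAL on a ReplacingMergeTree table.

    `rows` must be in insertion order. For each key, keep the row with the
    largest version; on ties, or without a version column, keep the most
    recently inserted row.
    """
    survivors: Dict[object, dict] = {}
    for row in rows:
        k = row[key_col]
        current = survivors.get(k)
        if current is None or version_col is None or row[version_col] >= current[version_col]:
            survivors[k] = row
    # Sorted only to make the output deterministic for this example.
    return sorted(survivors.values(), key=lambda r: r[key_col])


rows = [
    {"id": 1, "name": "alice", "version": 1},
    {"id": 2, "name": "bob", "version": 1},
    {"id": 1, "name": "alice-v2", "version": 2},
    {"id": 1, "name": "alice-replayed", "version": 1},  # late replay of an old version
]

print([r["name"] for r in replacing_final(rows, version_col="version")])  # ['alice-v2', 'bob']
print([r["name"] for r in replacing_final(rows)])  # ['alice-replayed', 'bob']
```

With a version column, a replayed old row cannot overwrite a newer one; without it, whichever row was inserted last survives.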

If the above isn't enough: how about we check whether the row exists, and update it, only when the row kind is UPDATE_AFTER? Would that meet your needs?
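The rule proposed above can be sketched as follows. This is a minimal illustration under assumptions, not the connector's implementation: `apply_proposed_rule` is a hypothetical name, a Python list stands in for the ClickHouse table, and how the connector would actually issue the update is not modelled. UPDATE_BEFORE and DELETE are ignored here.

```python
from typing import List, Tuple

Change = Tuple[str, int, str]  # hypothetical (row_kind, primary_key, payload)


def apply_proposed_rule(changes: List[Change], table: List[Tuple[int, str]]) -> None:
    """Only UPDATE_AFTER rows trigger an existence check; INSERT rows are appended."""
    for kind, key, payload in changes:
        if kind == "UPDATE_AFTER":
            index = next((i for i, (k, _) in enumerate(table) if k == key), None)
            if index is None:
                table.append((key, payload))  # row missing: insert it
            else:
                table[index] = (key, payload)  # row exists: update it in place
        elif kind == "INSERT":
            table.append((key, payload))  # no existence check for INSERT rows


table: List[Tuple[int, str]] = []
apply_proposed_rule(
    [("INSERT", 1, "alice"), ("UPDATE_AFTER", 1, "alice-v2"), ("UPDATE_AFTER", 2, "bob")],
    table,
)
print(table)  # [(1, 'alice-v2'), (2, 'bob')]

apply_proposed_rule([("INSERT", 1, "alice-v2")], table)  # INSERT replayed after a restart
print(len(table))  # 3
```

As the last call shows, under this rule an INSERT row replayed after a restart is still appended as a new row.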