Closed: wangqinghuan closed this issue 2 years ago.
We are using Flink CDC to ETL data into ClickHouse. Each time the Flink CDC job is restarted, all data from the CDC source is treated as INSERT rows, which causes data duplication in ClickHouse. It would be reasonable to update a row if it already exists and insert it otherwise.
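The duplication described above can be reproduced with a small in-memory simulation. This is a minimal sketch, not Flink or ClickHouse code: the `snapshot` rows, the `id`-style integer key, and the two sink functions are hypothetical. An append-only sink writes every replayed INSERT again, while a sink keyed on the primary key updates a row that exists and inserts one that does not.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot emitted by the CDC source on every job (re)start:
# (primary key, payload) pairs that all arrive as INSERT rows.
Row = Tuple[int, str]
snapshot: List[Row] = [(1, "alice"), (2, "bob")]


def append_sink(table: List[Row], rows: List[Row]) -> None:
    """Append every incoming row, as a plain INSERT would."""
    table.extend(rows)


def upsert_sink(table: Dict[int, str], rows: List[Row]) -> None:
    """Update the row if its key exists, otherwise insert it."""
    for key, payload in rows:
        table[key] = payload


appended: List[Row] = []
upserted: Dict[int, str] = {}

# Simulate the initial run plus one restart that replays the snapshot.
for _ in range(2):
    append_sink(appended, snapshot)
    upsert_sink(upserted, snapshot)

print(len(appended))  # 4: every row is duplicated
print(len(upserted))  # 2: one row per primary key
```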
OK, I understand what you need.
`select * from table final`
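Assuming the target table uses ClickHouse's ReplacingMergeTree engine (the thread does not state which engine is used), `FINAL` makes the query merge rows that share the same sorting key before returning them, so replayed duplicates collapse into one row. The sketch below only models that collapsing step in Python; `final_view` and the sample rows are hypothetical, and it assumes no version column, in which case ReplacingMergeTree keeps the last inserted row for each key.

```python
from typing import Dict, List, Tuple

# Rows in insertion order: (sorting key, payload). Key 1 was written twice,
# e.g. once by the first run and once after a restart.
Row = Tuple[int, str]
inserted: List[Row] = [(1, "alice"), (2, "bob"), (1, "alice-v2")]


def final_view(rows: List[Row]) -> List[Row]:
    """Keep only the last inserted row per sorting key, as a
    ReplacingMergeTree without a version column does when merging."""
    latest: Dict[int, str] = {}
    for key, payload in rows:
        latest[key] = payload  # later inserts replace earlier ones
    # Sorted only to make the output deterministic for this sketch.
    return sorted(latest.items())


print(len(inserted))         # 3 rows without FINAL
print(final_view(inserted))  # [(1, 'alice-v2'), (2, 'bob')]
```

Without `FINAL`, duplicates stay visible until a background merge runs, and ClickHouse decides on its own when that happens.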
If the above isn't enough: we only check for existence and update the data when the row kind is `UPDATE_AFTER`. Is this OK, and does it meet your needs?
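The row-kind policy mentioned above can be modeled as follows. This is a hypothetical Python sketch, not the connector's code: the `RowKind` enum only mirrors the change kinds of Flink's `org.apache.flink.types.RowKind`, and `apply_change` is an illustrative name. INSERT rows are appended without any check, while `UPDATE_AFTER` rows update an existing key or insert a new one; how `UPDATE_BEFORE` and `DELETE` should be handled is not discussed in the thread, so they are ignored here.

```python
from enum import Enum
from typing import List, Tuple


class RowKind(Enum):
    # Mirrors the change kinds of Flink's org.apache.flink.types.RowKind.
    INSERT = "+I"
    UPDATE_BEFORE = "-U"
    UPDATE_AFTER = "+U"
    DELETE = "-D"


Row = Tuple[int, str]  # (primary key, payload)


def apply_change(table: List[Row], kind: RowKind, row: Row) -> None:
    """Hypothetical sink policy: only UPDATE_AFTER checks for an existing key."""
    key = row[0]
    if kind is RowKind.INSERT:
        table.append(row)  # no existence check: plain append
    elif kind is RowKind.UPDATE_AFTER:
        for i, (existing_key, _) in enumerate(table):
            if existing_key == key:
                table[i] = row  # row exists: update it in place
                return
        table.append(row)  # row missing: insert it
    # UPDATE_BEFORE and DELETE are not covered by the proposal; ignored here.


table: List[Row] = []
apply_change(table, RowKind.INSERT, (1, "alice"))
apply_change(table, RowKind.INSERT, (1, "alice"))  # replayed after a restart
apply_change(table, RowKind.UPDATE_AFTER, (2, "bob"))
apply_change(table, RowKind.UPDATE_AFTER, (2, "bob-v2"))  # updates key 2
print(table)  # [(1, 'alice'), (1, 'alice'), (2, 'bob-v2')]
```

In this model, a restart that replays snapshot rows as INSERT still duplicates them, because only `UPDATE_AFTER` triggers the existence check.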
When the same INSERT statement is executed multiple times in upsert mode, flink-connector-clickhouse does not update the existing documents; it appends them to ClickHouse instead.