microsoft / kafka-connect-cosmosdb

Kafka Connect connectors for Azure Cosmos DB
MIT License

Doesn't allow writing duplicates in bulk writer and corresponding tests #515

Closed: tvaron3 closed this 1 year ago

tvaron3 commented 1 year ago

Type of PR

Purpose of PR

When a batch contains more than one item with the same id and partition key, the bulk upsert operation sometimes writes the data incorrectly. This change prevents duplicate items from being sent to the bulk upsert operation: only the latest item for each id and partition key is sent. The behavior is controlled by a new config flag, "connect.cosmos.sink.bulk.compression.enabled", which is set to true by default.
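The PR description does not include the implementation, so the following is only a minimal Java sketch of the idea, assuming that "latest" means the last occurrence of an (id, partition key) pair within a batch and that the flag is read from a string-valued config map. `Item`, `ItemKey`, `dedupeLatest`, and `compressionEnabled` are hypothetical names used for illustration, not the connector's actual classes or methods.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BulkDedupSketch {

    // Hypothetical item type; the connector's real record type is not shown in the PR.
    public record Item(String id, String partitionKey, String payload) {}

    // Composite key: an item is identified by its id together with its partition key.
    public record ItemKey(String id, String partitionKey) {}

    // Flag name taken from the PR description.
    public static final String COMPRESSION_FLAG = "connect.cosmos.sink.bulk.compression.enabled";

    // Assumed config access: a missing flag defaults to true, as the PR states.
    public static boolean compressionEnabled(Map<String, String> config) {
        return Boolean.parseBoolean(config.getOrDefault(COMPRESSION_FLAG, "true"));
    }

    // Keeps only the last item seen for each (id, partition key).
    // LinkedHashMap.put on an existing key replaces the value but keeps the key's
    // original position, so output follows first-occurrence order.
    public static List<Item> dedupeLatest(List<Item> batch) {
        Map<ItemKey, Item> latest = new LinkedHashMap<>();
        for (Item item : batch) {
            latest.put(new ItemKey(item.id(), item.partitionKey()), item);
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<Item> batch = List.of(
                new Item("1", "pk-a", "v1"),
                new Item("2", "pk-a", "v1"),
                new Item("1", "pk-a", "v2"),  // duplicate of the first item; this one wins
                new Item("1", "pk-b", "v1")); // same id, different partition key: kept
        Map<String, String> config = Map.of(); // flag absent, so dedup is on by default
        List<Item> toSend = compressionEnabled(config) ? dedupeLatest(batch) : batch;
        toSend.forEach(System.out::println);
    }
}
```

Note that items sharing an id but not a partition key are still distinct, so only exact (id, partition key) duplicates are collapsed. Setting the flag to "false" in the config map sends the batch through unchanged.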

Observability + Testing

Review notes

Issues Closed or Referenced

TheovanKraay commented 1 year ago

LGTM - nit: can we please add the actual flag in the PR description?

TheovanKraay commented 1 year ago


I saw you updated it, but I meant showing the setting that customers will need to update in their connector config file. I think it's connect.cosmos.sink.bulk.duplicates.enabled?

tvaron3 commented 1 year ago

Theo, I changed it again. The setting they would need to change is "connect.cosmos.sink.bulk.compression.enabled".