scylladb / scylla-cdc-source-connector

A Kafka source connector capturing Scylla CDC changes
Apache License 2.0

Could an event be lost if the tasks.max setting conditions written in the README.md are not met? #6

Closed pkgonan closed 3 years ago

pkgonan commented 3 years ago

Hi, could an event be lost if the tasks.max conditions described in README.md are not met?

In general, the tasks.max property should be greater or equal the number of nodes in Kafka Connect cluster, to allow the connector to start on each node. tasks.max property should also be greater or equal the number of nodes in your Scylla cluster.

[Infra]

Kafka Connect cluster nodes: 3
ScyllaDB cluster nodes (multi-DC, total): 12
ScyllaDB nodes per DC: 6 (AWS EU_CENTRAL_1), 6 (AWS US_WEST_1)

[Scylla DB Source Connector's configuration]

"connector.class": "com.scylladb.cdc.debezium.connector.ScyllaConnector",
"tasks.max": "3",
"scylla.cluster.ip.addresses": "'${database_url}'",
"scylla.user": "'${database_user}'",
"scylla.password": "'${database_password}'",
"scylla.name": "cdc-data.test",
"scylla.table.names": "'${table_include_list}'",
"scylla.query.time.window.size": "5000",
"scylla.confidence.window.size": "5000",
"producer.override.acks": "-1",
"producer.override.max.in.flight.requests.per.connection": "1",
"producer.override.compression.type": "snappy",
"producer.override.linger.ms": "50",
"producer.override.batch.size": "327680",
"errors.tolerance": "all",
"errors.log.enable": "true",
"errors.log.include.messages": "true",
"topic.creation.default.replication.factor": "'${replication_factor}'",
"topic.creation.default.partitions": "11"

avelanarius commented 3 years ago

No. tasks.max only affects parallelism, that is, how many Kafka Connect nodes the connector can be scaled across. The connector works correctly regardless of this setting, even with tasks.max=1.
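To illustrate the point, here is a conceptual sketch in Python. It is not the connector's actual code: the `assign_work` helper and the `node-N` work units are hypothetical, and the round-robin split is only an assumed stand-in for however the connector really divides its work. It shows why a smaller tasks.max would reduce parallelism without dropping anything: every unit of work still ends up assigned to some task.

```python
def assign_work(units, tasks_max):
    """Spread work units over at most tasks_max tasks, round-robin.

    Hypothetical helper for illustration only; the real connector's
    assignment logic may differ.
    """
    if tasks_max < 1:
        raise ValueError("tasks_max must be at least 1")
    # Never create more tasks than there are units of work.
    task_count = min(tasks_max, len(units))
    tasks = [[] for _ in range(task_count)]
    for i, unit in enumerate(units):
        tasks[i % task_count].append(unit)
    return tasks


# Hypothetical work units, one per node of the 12-node cluster in the question.
units = [f"node-{n}" for n in range(12)]

for tasks_max in (1, 3, 12, 20):
    tasks = assign_work(units, tasks_max)
    covered = sorted(u for task in tasks for u in task)
    # Every unit is covered exactly once, whatever tasks_max is.
    assert covered == sorted(units)
    print(tasks_max, [len(task) for task in tasks])
```

With tasks.max=3, as in the configuration above, each task in this sketch simply handles more units than it would with tasks.max=12.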

pkgonan commented 3 years ago

@avelanarius Thanks!