scylladb / scylla-cdc-source-connector

A Kafka source connector capturing Scylla CDC changes
Apache License 2.0

Kafka Connect Scylla Connector tasks getting deleted from status topic #35

Closed shantanu-sharechat closed 1 month ago

shantanu-sharechat commented 1 year ago

While testing Scylla CDC with Kafka Connect, the connector's tasks are being removed automatically. As a result, no CDC events are streamed even though there are write ops on the table where CDC is enabled.

Connector Config

{
  "name": "cdc-platform-scylla-load-test-3",
  "config": {
    "connector.class": "com.scylladb.cdc.debezium.connector.ScyllaConnector",
    "scylla.user": "***",
    "auto.create.topics.enable": "true",
    "scylla.table.names": "scyllacdcloadtest.livestream",
    "tasks.max": "50",
    "scylla.cluster.ip.addresses": "172.19.0.103:19042,172.19.0.104:19042",
    "scylla.password": "***",
    "key.converter.schemas.enable": "false",
    "value.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "scylla.name": "livestream-cdc-load-testing",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter"
  }
}

Table details:

CREATE TABLE scyllacdcloadtest.livestream (
    livestream_id text PRIMARY KEY,
    createreceivetime bigint,
    createtime bigint,
    endreceivetime bigint,
    endtime bigint,
    status text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
    AND comment = ''
    AND compaction = {'class': 'IncrementalCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Image showing write ops on the table where CDC is enabled.

Screenshot 2023-04-24 at 6 12 53 PM

Image showing cdc kafka topic messages/sec.

Screenshot 2023-04-24 at 6 15 14 PM

Listing the Kafka Connect connectors returns:

curl --location --request GET '100.98.4.123:8083/connectors'

[ "cdc-platform-scylla-load-test-3" ]

Describing the connector returns:

curl --location --request GET '100.98.1.157:8083/connectors/cdc-platform-scylla-load-test-3'

{
  "name": "cdc-platform-scylla-load-test-3",
  "config": {
    "connector.class": "com.scylladb.cdc.debezium.connector.ScyllaConnector",
    "scylla.user": "***",
    "auto.create.topics.enable": "true",
    "scylla.table.names": "scyllacdcloadtest.livestream",
    "tasks.max": "50",
    "scylla.cluster.ip.addresses": "172.19.0.103:19042,172.19.0.104:19042",
    "scylla.password": "***",
    "key.converter.schemas.enable": "false",
    "value.converter.schemas.enable": "false",
    "name": "cdc-platform-scylla-load-test-3",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "scylla.name": "livestream-cdc-load-testing",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter"
  },
  "tasks": [],
  "type": "source"
}

The empty "tasks" array shows that no tasks are running.
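For anyone trying to confirm the same symptom, the Kafka Connect REST API also exposes `GET /connectors/<name>/status`, which reports the connector state and per-task states separately. The following is a minimal sketch, not something taken from this issue: the function names `fetch_status` and `summarize_status` are made up for illustration, and the payload shape (`connector.state`, `tasks[].id`, `tasks[].state`) is assumed to match the standard status response.

```python
import json
import urllib.parse
import urllib.request


def fetch_status(base_url: str, connector: str) -> dict:
    """GET /connectors/<name>/status from a Kafka Connect worker."""
    name = urllib.parse.quote(connector, safe="")
    url = f"{base_url.rstrip('/')}/connectors/{name}/status"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize_status(status: dict) -> dict:
    """Reduce a status payload to the facts relevant to this issue."""
    tasks = status.get("tasks", [])
    connector_state = status.get("connector", {}).get("state")
    return {
        "connector_state": connector_state,
        "task_count": len(tasks),
        # IDs of tasks that exist but are not in the RUNNING state.
        "not_running": [t["id"] for t in tasks if t.get("state") != "RUNNING"],
        # The symptom described here: connector RUNNING, but zero tasks.
        "running_without_tasks": connector_state == "RUNNING" and not tasks,
    }
```

A possible call, using the worker address from the curl command above, would be `summarize_status(fetch_status("http://100.98.1.157:8083", "cdc-platform-scylla-load-test-3"))`.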

The Kafka Connect status topic also contains null values for the task keys:

Screenshot 2023-04-24 at 6 22 23 PM
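Since the status topic is compacted, a null value for a key acts as a tombstone for that entry. A rough way to list which task IDs were tombstoned, given records already dumped from the status topic as `(key, value)` pairs in offset order, is sketched below. The key layout `status-task-<connector>-<task id>` is an assumption about how Kafka Connect names these keys, and `deleted_tasks` is a hypothetical helper, not part of any Kafka tooling.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed prefix of task status keys in the Connect status topic.
TASK_PREFIX = "status-task-"


def deleted_tasks(
    records: Iterable[Tuple[str, Optional[str]]], connector: str
) -> List[int]:
    """Return IDs of tasks whose most recent status record is a tombstone."""
    prefix = f"{TASK_PREFIX}{connector}-"
    latest = {}
    for key, value in records:
        if key.startswith(prefix):
            # Records are in offset order, so later ones override earlier ones.
            latest[key] = value

    deleted = []
    for key, value in latest.items():
        task_id = key[len(prefix):]
        # Skip keys of other connectors whose names merely share the prefix.
        if value is None and task_id.isdigit():
            deleted.append(int(task_id))
    return sorted(deleted)
```

Only the latest record per key is considered, so a task that was tombstoned and later reported a new status is not listed.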
mykaul commented 1 year ago

@avelanarius - please take a look or assign someone.

doctorg-ml commented 1 year ago

Hello @avelanarius / @mykaul ,

Do you have any update?

avelanarius commented 1 year ago

@shantanu-sharechat Could you share with us the logs of the connector? (especially at the moment it stopped producing new messages)

shantanu-sharechat commented 1 year ago

@avelanarius these are the logs https://pastebin.com/Jaed8es2

Bouncheck commented 1 year ago

Hi, I've tried a lot of different things, and recently I've been able to get similar results by doing the following:

1. Starting the Kafka cluster and connector (3 worker nodes, 1 Scylla node, almost identical table and connector configuration)
2. Pausing the connector (pausing seems to be necessary here)
3. Increasing the topic partitions significantly
4. Resuming the connector

This led me to a state where the connector is running but no tasks are reported to exist at all. The logs were also quite similar. However, simply restarting the connector seems to fix this immediately. Do you know whether anything on the Kafka cluster side was modified when the connector tasks stopped working? I'll continue to look into this.
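The pause, resume, and restart steps mentioned above map onto the Kafka Connect REST API (`PUT /connectors/<name>/pause`, `PUT /connectors/<name>/resume`, `POST /connectors/<name>/restart`). A small sketch of building those requests follows; `connector_request` is a hypothetical helper, the worker URL in the usage note is a placeholder, and increasing topic partitions is a separate step done with Kafka's own tooling rather than through this API.

```python
import urllib.parse
import urllib.request

# HTTP method used by the Connect REST API for each lifecycle action.
_METHODS = {"pause": "PUT", "resume": "PUT", "restart": "POST"}


def connector_request(
    base_url: str, connector: str, action: str
) -> urllib.request.Request:
    """Build (but do not send) a pause/resume/restart request."""
    if action not in _METHODS:
        raise ValueError(f"unsupported action: {action!r}")
    name = urllib.parse.quote(connector, safe="")
    url = f"{base_url.rstrip('/')}/connectors/{name}/{action}"
    return urllib.request.Request(
        url, method=_METHODS[action], headers={"Accept": "application/json"}
    )
```

To actually send one, pass the result to `urllib.request.urlopen`, for example `urllib.request.urlopen(connector_request("http://localhost:8083", "cdc-platform-scylla-load-test-3", "restart"))`. Keeping request construction separate from sending makes it easy to inspect what would be issued before touching a live cluster.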

shantanu-sharechat commented 1 year ago

@Bouncheck there is this line in the log:

[cdc-platform-stable-b76f9b5f8-kkhjk] [2023-05-16 04:58:39,921] INFO [Worker clientId=connect-1, groupId=cdc-platform-stable] Tasks [] configs updated (org.apache.kafka.connect.runtime.distributed.DistributedHerder)

It is the 120th line of the logs shared earlier in this thread. After it, the tasks start getting stopped. We need to debug what leads to this.

Also, the Kafka version is 2.8. Connect Dockerfile:

```dockerfile
FROM confluentinc/cp-kafka-connect

ENV CONNECT_PLUGIN_PATH="/usr/share/java,/usr/share/confluent-hub-components"

RUN confluent-hub install --no-prompt scylladb/scylla-cdc-source-connector:1.0.3

WORKDIR /home

EXPOSE 8083
EXPOSE 8081

CMD ["/etc/confluent/docker/run"]
```
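To find every occurrence of the `Tasks [] configs updated` line quoted above in a larger worker log, something like the following could be used. It is only a sketch: `find_empty_task_updates` is a made-up name, and the timestamp pattern assumes the bracketed `YYYY-MM-DD HH:MM:SS,mmm` format seen in that one log line.

```python
import re

# Matches the herder message when the task list inside the brackets is empty.
EMPTY_TASKS = re.compile(r"Tasks \[\s*\] configs updated")
# Assumed timestamp format, taken from the single log line quoted in this thread.
TIMESTAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")


def find_empty_task_updates(lines):
    """Return (line_number, timestamp) for each empty task-config update."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if EMPTY_TASKS.search(line):
            ts = TIMESTAMP.search(line)
            hits.append((number, ts.group(1) if ts else None))
    return hits
```

Reading a saved log with `find_empty_task_updates(open("connect.log"))` (the file name is a placeholder) gives the line numbers to inspect, which helps correlate the moment tasks disappear with whatever happened just before it.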

roydahan commented 1 month ago

Closing, AFAIU from https://github.com/scylladb/scylla-enterprise/issues/2906 the issue doesn't reproduce anymore.