The ScyllaDB Sink Connector is a high-speed mechanism for reading records from Kafka and writing to ScyllaDB.
Clone the connector from its GitHub repository and refer to this link for a quickstart.
The following are required to run the ScyllaDB Sink Connector:
The ScyllaDB Sink Connector accepts two data formats from Kafka:
Note: for JSON without a schema, the table must already exist in the keyspace.
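As a rough illustration, the sketch below shows a record in JSON-with-schema form, using the envelope that Kafka Connect's JsonConverter emits when schemas.enable=true; the record fields are hypothetical.

```json
{
  "schema": {
    "type": "struct",
    "name": "temperature.reading",
    "fields": [
      { "field": "sensor_id", "type": "string", "optional": false },
      { "field": "reading", "type": "double", "optional": true }
    ],
    "optional": false
  },
  "payload": { "sensor_id": "s-42", "reading": 21.5 }
}
```

Without a schema, only the bare payload object is sent, which is why the connector cannot create the table and it must already exist.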
This connector uses the topic name to determine the name of the table to write to. You can change this dynamically by using a transform like Regex Router to change the topic name.
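For example, a minimal sketch of such a transform fragment in the connector configuration, using the standard Kafka Connect RegexRouter SMT (the transform alias, regex, and replacement values are illustrative), so that records from a topic named kafka_readings are written to a table named readings:

```json
{
  "transforms": "renameTopic",
  "transforms.renameTopic.type": "org.apache.kafka.connect.transforms.RegexRouter",
  "transforms.renameTopic.regex": "kafka_(.*)",
  "transforms.renameTopic.replacement": "$1"
}
```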
To run this connector you can use a dockerized ScyllaDB instance. Follow this link for usage instructions.
You can configure this connector to manage the schema on the ScyllaDB cluster. When altering an existing table, the key is ignored; this avoids the potential issues around changing the primary key of an existing table. The key schema is used to generate the primary key when the table is created, and the key's fields must also be present in the value schema. Data written to the table is always read from the value of the Apache Kafka record. As described above, the connector uses the topic name to determine the table to write to, and this can be changed on the fly with a transform that renames the topic.
This connector supports TTL (time to live), which lets data expire automatically after a specified period. The TTL value is the time to live for the data: after that amount of time, the data is automatically deleted. For example, with a TTL of 100 seconds, data is deleted 100 seconds after it is written. To use this feature, set the scylladb.ttl config to the time (in seconds) for which you want to retain the data. If you don't specify this property, records are inserted with the default TTL value null, meaning that written data will not expire.
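For instance, a connector configuration containing the fragment below (only the TTL-related property is shown; the value is illustrative) would write rows that expire 100 seconds after insertion:

```json
{
  "scylladb.ttl": "100"
}
```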
This connector supports two types of offset tracking, but offsets are always stored at least in Kafka. They appear in the internal __consumer_offsets topic and can be tracked by inspecting the connector's consumer group (named connect-<connector name> by default) with the kafka-consumer-groups tool.
Offset stored in ScyllaDB Table
This is the default behaviour of the connector. Offsets are additionally stored in the table defined by the scylladb.offset.storage.table property. This is useful when all offsets need to be accessible in Scylla.
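A minimal sketch of this mode (only the offset-related property is shown; the table name is illustrative):

```json
{
  "scylladb.offset.storage.table": "kafka_connect_offsets"
}
```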
Offset stored in Kafka
For offsets to be managed only in Kafka, you must set scylladb.offset.storage.table.enable=false. This results in fewer total writes and is the recommended option.
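A corresponding sketch of the Kafka-only mode:

```json
{
  "scylladb.offset.storage.table.enable": "false"
}
```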
This connector has at-least-once semantics. In case of a crash or restart, an INSERT operation for some rows might be performed multiple times (at least once). However, INSERT operations are idempotent in Scylla, meaning there won't be any duplicate rows in the destination table. The only time you could see the effect of duplicate INSERT operations is if your destination table has Scylla CDC turned on: in the CDC log table you would see the duplicate INSERT operations as separate CDC log rows.
Refer to the following Confluent documentation to access Kafka-related metrics.