Open hartmut-co-uk opened 3 years ago
In a single query, the connector queries all the streams that belong to a given vnode. That's why the offset is tracked by vnode_id.
Does that answer your question @hartmut-co-uk ?
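To illustrate the answer above, here is a minimal sketch of per-vnode offset tracking. It is not the connector's actual code: `Stream`, `group_by_vnode`, and `advance_window` are hypothetical names, and the sketch assumes an offset is simply the end timestamp of the last window queried for a vnode.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    # Hypothetical model: a CDC stream and the vnode it belongs to.
    stream_id: str
    vnode_id: int


def group_by_vnode(streams):
    """Group stream ids by vnode, so one query can cover a whole vnode."""
    groups = defaultdict(list)
    for s in streams:
        groups[s.vnode_id].append(s.stream_id)
    return dict(groups)


def advance_window(offsets, vnode_id, window_end_ms):
    """Record the end of the last queried window as the vnode's offset."""
    offsets[vnode_id] = max(offsets.get(vnode_id, 0), window_end_ms)
    return offsets


streams = [Stream("s1", 0), Stream("s2", 0), Stream("s3", 1)]
groups = group_by_vnode(streams)

offsets = {}
for vnode_id in groups:
    # One query per vnode reads all of its streams, so one offset per vnode suffices.
    advance_window(offsets, vnode_id, 2000)
```

Since every stream of a vnode is read by the same query, all of them share the same window position, and a single offset entry per `vnode_id` is enough.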
thanks! How is this topic consumed upon connector (re)start / task/consumer rebalancing? From beginning?
How do 'generation_start' and streams relate? Are there Scylla system topics where all of this is maintained?
@avelanarius Could you please answer with details here?
I've been playing with the `TableBackedProgressManager` of scylla-cdc-go and I think it might be a good alternative candidate for how to persist the current CDC log stream state (cdc$time)...
Are there plans to add similar functionality to either this repo or scylla-cdc-java?
Hi, when looking at the data published to the `connect-offsets` topic I noticed the latest window state is tracked by `vnode_id`.
Why is this at the `vnode_id` level and where does this information come from? When querying the table the `vnode_id` is not used as a query condition, right?
Further implication (maybe?): The topic `connect-offsets` is created by Kafka Connect (not the Scylla connector) and is not a compacted topic. Running a simple test (`scylla.query.time.window.size: 2000`) for 1 connector, 1 task, 1 table resulted in ~1M messages on the `docker-connect-offsets` topic.
@pkgonan may I ask if you've got numbers to confirm this for a more comprehensive setup?
@haaawk how is this topic consumed upon connector (re)start / task/consumer rebalancing? From beginning?
Update 2021-12-15:
ℹ️ For reference: the part on `connect-offsets` has already been well described and addressed in a section in the repo README: https://github.com/scylladb/scylla-cdc-source-connector/blob/ecbeb1d3f643a20b6387d1e54b4aa6837f171738/README.md?plain=1#L601-L605
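On the restart question: as far as I understand, a Kafka Connect worker replays the offsets topic from the beginning on startup and keeps only the latest value per key, so the number of messages affects startup time but not which offset is resumed from. A minimal sketch of that replay, with no Kafka client involved; `replay_offsets` and the key/value shapes are hypothetical and only mirror the per-`vnode_id` windows discussed here:

```python
def replay_offsets(records):
    """Replay an offsets log in order, keeping the last value seen per key."""
    latest = {}
    for key, value in records:
        latest[key] = value  # later records overwrite earlier ones
    return latest


# Hypothetical log: key = (table, vnode_id), value = last committed window.
log = [
    (("ks.t", 0), {"window_start": 0, "window_end": 2000}),
    (("ks.t", 1), {"window_start": 0, "window_end": 2000}),
    (("ks.t", 0), {"window_start": 2000, "window_end": 4000}),
    (("ks.t", 0), {"window_start": 4000, "window_end": 6000}),
]

state = replay_offsets(log)
```

After the replay, `state` holds one entry per `(table, vnode_id)` key, regardless of how many records were written for it. On a compacted topic the older records per key would also eventually be removed from the log itself.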