Open Lorak-mmk opened 2 years ago
Now that I think of it, maybe it's redundant to have "REMOVE" mode and removals should be represented as setting to null (as is currently the case for UDT)? Then we would have 2 modes, let's say "UPDATE" and "OVERWRITE", the difference between them would be whether the collection is cleared before operation. @avelanarius
I don't see how it would work for sets?
Opinion: Hi, for me it would be great if we could also have the option (configurable?) to just emit FROZEN collections 'as-is' (i.e. always the full latest value), so without the extra `ELEMENTS_VALUE`, `REMOVED_ELEMENTS_VALUE`, and `MODE_VALUE`.
That would make the output record look cleaner and more like what you'd get by querying Scylla directly.
I pushed a new version with a slightly different representation.
It had to be changed because the previous one didn't work well with queries performing more than one modification on a given collection, e.g.: `UPDATE ks.t_list SET v = v - [6, 7], v = v + [4, 5] WHERE pk = 1;`
Now there are only 2 modes, `OVERWRITE` and `MODIFY`, and the collection struct always has 2 fields: `mode` and `elements`.
For lists/maps, `elements` is a map; an element is added/overwritten if its value is not null, and removed otherwise.
For sets, `elements` is a map with a `boolean` value: `true` means the value was added to the set, `false` means it was removed.
UDTs didn't change.
I also renamed `SIMPLE` mode to `DELTA`, to better reflect what it actually is.
@avelanarius @haaawk
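To illustrate how a consumer might interpret the `mode`/`elements` struct described above, here is a minimal Python sketch. It is not the connector's code: the function names and the use of plain dicts for the record are hypothetical, and it assumes that `OVERWRITE` means the collection is cleared before `elements` are applied, while `MODIFY` applies them on top of the existing state.

```python
from typing import Any, Dict, Optional, Set


def apply_list_or_map_delta(
    current: Dict[Any, Any],
    mode: str,
    elements: Dict[Any, Optional[Any]],
) -> Dict[Any, Any]:
    """Apply a list/map change (hypothetical helper).

    A non-null value adds or overwrites the entry; a null value removes it.
    """
    # Assumption: OVERWRITE starts from an empty collection.
    result = {} if mode == "OVERWRITE" else dict(current)
    for key, value in elements.items():
        if value is None:
            result.pop(key, None)  # null value: remove the element
        else:
            result[key] = value  # non-null value: add or overwrite
    return result


def apply_set_delta(
    current: Set[Any],
    mode: str,
    elements: Dict[Any, bool],
) -> Set[Any]:
    """Apply a set change (hypothetical helper): True adds, False removes."""
    result = set() if mode == "OVERWRITE" else set(current)
    for item, added in elements.items():
        if added:
            result.add(item)
        else:
            result.discard(item)
    return result


# A single MODIFY record can carry removals and additions together,
# which is the case a query like `v = v - [6, 7], v = v + [4, 5]` needs.
# The keys "t1".."t5" are placeholders standing in for list element keys.
state = {"t1": 6, "t2": 7, "t3": 8}
delta = {"t1": None, "t2": None, "t4": 4, "t5": 5}
new_state = apply_list_or_map_delta(state, "MODIFY", delta)
```

Because removals and additions share one `elements` map, one record is enough for a statement that modifies the same collection more than once.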
> Opinion: Hi, for me it would be great if we could also have the option (configurable?) to just emit FROZEN collections 'as-is' (i.e. always the full latest value), so without the extra `ELEMENTS_VALUE`, `REMOVED_ELEMENTS_VALUE`, and `MODE_VALUE`. That would make the output record look cleaner and more like what you'd get by querying Scylla directly.
Yes, that would of course be better, but it is harder (as it requires preimage/postimage usage) and will be added in the future. That's why I added a config option to select the mode for non-frozen collections.
Based on https://github.com/scylladb/scylla-cdc-source-connector/pull/12 and includes changes from it, so review should be done on a per-commit basis.
This PR adds simple support for non-frozen collections.
There is a new config option, `scylla.collections.mode`; currently the only possible value is "simple". It selects the format for non-frozen collections (in the future we could add preimage etc.).
The simple mode collections format is described in README.md (along with the frozen collections format). Just to give a very brief description: non-frozen collections are represented as structs with 2 fields, "mode" and "elements". "mode" marks the type of operation (add elements, remove elements, overwrite collection); "elements" are the actual elements used in the operation.
For `Set`, "elements" is simply a `Set`.
`List` is a map with `timeuuid` key type. When removing elements, values are null.
`Map` is simply a `Map`. When removing elements, values are null.
`UDT` is the most complicated. It is represented as a struct, but each field is a `Cell`, and the semantics are the same as with a column's "Cell": null means no change, non-null with a null `value` field means removal, non-null with a non-null `value` field means new value.
I didn't yet test it with Avro.
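The three-state `Cell` semantics for UDT fields can be sketched as follows. This is only an illustration in Python, not the connector's implementation: the function name, the dict-based record shape, and the choice to model a removed field by dropping its key are all assumptions made for the example.

```python
from typing import Any, Dict, Optional


def apply_udt_delta(
    current: Dict[str, Any],
    udt_struct: Dict[str, Optional[Dict[str, Any]]],
) -> Dict[str, Any]:
    """Apply a UDT change record to a known UDT value (hypothetical helper).

    Each field of `udt_struct` is either None (no change) or a cell,
    modeled here as a dict with a "value" key.
    """
    result = dict(current)
    for field, cell in udt_struct.items():
        if cell is None:
            continue  # null cell: field was not touched
        if cell.get("value") is None:
            result.pop(field, None)  # non-null cell, null value: removal
        else:
            result[field] = cell["value"]  # non-null cell and value: new value
    return result


# Field "a" is unchanged, "b" is removed, "c" gets a new value.
before = {"a": 1, "b": 2, "c": 3}
change = {"a": None, "b": {"value": None}, "c": {"value": 30}}
after = apply_udt_delta(before, change)
```

Wrapping each field in a cell is what lets a record distinguish "not modified" from "set to null", which a bare nullable field could not express.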