CDC typing, execution and Events

elcallio commented 3 years ago

Two main aspects:

Add "Event" concept to, at relatively low cost (hopefully), group RawChange:s into events, with easy access preimage, delta etc.
Move execution model to a wholly executor based operation mode

The former is to ensure user does not need complete concept of how to reconstruct events from change rows, and more importantly, introduce the concept of not updating a stream (task) state until at end-or-event. This because when we (later) implement proper persistence of position states and allow to restore them properly, we don't restart in the middle of an event (if we crashed there), but instead re-do the full event.

The latter is god because a.) Threads are lame and b.) It allows for more user control w.r. to concurrency and over-provisioning concurrency. Right now, in the D3 world, we can't really share executors with driver in a useful manner. But the goal should be to be able to run wholly reactor-like, all layers of execution.

Minor changes:

Type encapsulation
Raw bytes access of cells
Consumer overloading

elcallio commented 3 years ago

Seems my eclipse somehow decided to ignore/reset my git line endings settings. Thus several changes here turned into whole-file-replace. Rebased and should be fixed now.

elcallio commented 3 years ago

ping?

elcallio commented 3 years ago

Issues addressed.

avelanarius commented 3 years ago

I have no further objections to first "half" of commits:

999894b Add concept of "frozen" and "atomic" to DataType
d70d3d5 Add more info to ColumnDefinition
7630ddf Make Cell and Field opaque interfaces
1bb880b Add cell accessors to get "meta" cell for deletion
fea5c5a Add "raw bytes" accessor to cells/change
9d70420 Add isDeleted and deprecate getIsDeleted
66cc58e Reduce implementation class visibility
d062eab Reduce implementation type visibility
c256642 Make worker/taskaction execution (almost) fully Executor+future
05dcbab Add opaque wrapper for consumer interfaces
c709557 Simplify OperationType resolution
47afcae Add sequence number and end-of-batch accessors to RawChange

I am thinking about merging this half right now (and continue discussion on the second half).

I have few issues with Event abstraction. First, wouldn't BatchRawChange name better describe it? (and later I show how you still have to think in terms of RawChange when using Event)

Maybe a bit outside of scope of this PR, but I thought about having even higher-level abstraction. This could be built on top of Event (and it would make it much easier to do), but I'm worried it would add too much confusion - 3 ways of consuming: RawChange, Event and this even higher-level way. And there could be some performance problems regarding transforming the output between those 3 representations.

If this is not a problem and/or we go ahead with Event, those are my only objections:

(Maybe) rename Event to BatchRawChange
Update README.md in Printer and project top-level README - those are meant as a introduction to the library (but if the third abstraction is later introduced wouldn't it confuse people a suggested way of consuming was changed 3 times?)
Remove/revamp Event::getPreImage() and Event::getPostImage() - in the current form they are only useful when the batch contains only 1 delta (see below).

My proposed higher-level abstraction would solve two problems which are still not fixed by Event (although made easier by using Event):

Pair delta rows with pre-image and post-image rows. Note that it's not simple - in the "attached example" there was only a one 1 pre-image row (but 4 post-image rows) and you have to match them with proper delta row by looking at primary keys. Event::getPreImage() and Event::getPostImage() don't help much as they only return a single RawChange for entire Event and in this example those images are of different rows - could confuse the user of library and make them miss data in BATCH operations.
Pairing row range delete CDC log rows. Operations such as DELETE FROM ks.t WHERE pk = 1 AND ck > 0 AND ck < 2; generate two CDC rows (in one Event), but you have to do the pairing yourself.

Here's a rough sketch of my proposed higher-level abstraction:

class UserProvidedConsumer implements HigherLevelConsumer { // dummy name
    CompletableFuture<Void> consumeInsert(InsertChange change) {
        change.getPartitonKey();
        change.getCell("column1");

        // 1. Pairing of pre/post-image rows
        change.getPreImage();
        change.hasPostImage();
        change.getPostImage();
    }
    // similar for update, (single row) delete

     // 2. Pairing row-range deletes
     CompletableFuture<Void> consumeRowRangeDelete(RowRangeDeleteChange change) {
         change.getPartitionKey();
         change.getClusteringKeyStart();
         change.getClusteringKeyEnd();
     }
     // similar for partition delete

}

Attached example

For example, consider such a table with CDC enabled with pre and post-images:

CREATE TABLE ks.t(pk int, ck int, v int, PRIMARY KEY (pk, ck)) WITH cdc = {'enabled': true, 'preimage': 'full', 'postimage': true};

Consider such a batch (note: there was a (pk=1, ck=1, v=4) row before):

BEGIN BATCH 
UPDATE ks.t SET v = 11 WHERE pk = 1 AND ck = 1;
UPDATE ks.t SET v = 12 WHERE pk = 1 AND ck = 2;
UPDATE ks.t SET v = 13 WHERE pk = 1 AND ck = 3;
INSERT INTO ks.t(pk, ck, v) VALUES (1, 7, 7);
UPDATE ks.t SET v = 14 WHERE pk = 1 AND ck = 4;
UPDATE ks.t SET v = 15 WHERE pk = 1 AND ck = 5;
DELETE FROM ks.t WHERE pk = 1 AND ck > 4;
APPLY BATCH ;

And this is generated CDC log table:

 cdc$stream_id                      | cdc$time                             | cdc$batch_seq_no | cdc$deleted_v | cdc$end_of_batch | cdc$operation | cdc$ttl | ck   | pk | v
------------------------------------+--------------------------------------+------------------+---------------+------------------+---------------+---------+------+----+------
 0xc574000000000000b3501463a8000461 | 3d282330-985d-11eb-b079-2d2b3c882bdf |                0 |          null |             null |             0 |    null |    1 |  1 |    4
 0xc574000000000000b3501463a8000461 | 3d282330-985d-11eb-b079-2d2b3c882bdf |                1 |          null |             null |             2 |    null |    7 |  1 |    7
 0xc574000000000000b3501463a8000461 | 3d282330-985d-11eb-b079-2d2b3c882bdf |                2 |          null |             null |             1 |    null |    1 |  1 |   11
 0xc574000000000000b3501463a8000461 | 3d282330-985d-11eb-b079-2d2b3c882bdf |                3 |          null |             null |             1 |    null |    2 |  1 |   12
 0xc574000000000000b3501463a8000461 | 3d282330-985d-11eb-b079-2d2b3c882bdf |                4 |          null |             null |             1 |    null |    3 |  1 |   13
 0xc574000000000000b3501463a8000461 | 3d282330-985d-11eb-b079-2d2b3c882bdf |                5 |          null |             null |             1 |    null |    4 |  1 |   14
 0xc574000000000000b3501463a8000461 | 3d282330-985d-11eb-b079-2d2b3c882bdf |                6 |          null |             null |             1 |    null |    5 |  1 |   15
 0xc574000000000000b3501463a8000461 | 3d282330-985d-11eb-b079-2d2b3c882bdf |                7 |          null |             null |             6 |    null |    4 |  1 | null
 0xc574000000000000b3501463a8000461 | 3d282330-985d-11eb-b079-2d2b3c882bdf |                8 |          null |             null |             7 |    null | null |  1 | null
 0xc574000000000000b3501463a8000461 | 3d282330-985d-11eb-b079-2d2b3c882bdf |                9 |          null |             null |             9 |    null |    1 |  1 |   11
 0xc574000000000000b3501463a8000461 | 3d282330-985d-11eb-b079-2d2b3c882bdf |               10 |          null |             null |             9 |    null |    2 |  1 |   12
 0xc574000000000000b3501463a8000461 | 3d282330-985d-11eb-b079-2d2b3c882bdf |               11 |          null |             null |             9 |    null |    3 |  1 |   13
 0xc574000000000000b3501463a8000461 | 3d282330-985d-11eb-b079-2d2b3c882bdf |               12 |          null |             True |             9 |    null |    4 |  1 |   14

/cc @haaawk @elcallio

elcallio commented 3 years ago

Naming aside (I honestly dislike RawChange, since it is imho even less descriptive - a simple Row or ChangeRow would be better - but I am not married to either approach). You are right in that the Event definition is incomplete. The pre-image/post-image should be at least Collections. As for a higher abstraction, it can really just be a post-processing listener adaptor that breaks up an Event (or ChangeSet?) using pk/ck grouping and build operation specific structures. This is a good idea. I like it. But I would suggest three types of groupings: Original RawChange (ChangeRow!), BatchRawChange (RawChangeSet/RawChangeBatch/RawChangeSet?) and the actual Event on top of it, that is actually exactly one "type" of operation (let's treat row delete start/end as a category - i.e. partition operation types by generative association), with opt single pre and post image. Then a user can still subscribe to the last "high" level object type using a single lambda! :-)

elcallio commented 3 years ago

Renamed, changed pre/post image definition. Changed patch order so it all builds up to the mighty batch sequence. ;-)

haaawk commented 3 years ago

It seems that you're doing multiple unrelated things here @elcallio Could you please split the change into smaller PRs that can be easier reviewed?

haaawk commented 3 years ago

@elcallio I think that we should split this PR into smaller PRs. Not only for the sake of easier reviews but also for the sake of keeping the development history clean. That would also mean merging parts of this PR sooner as they are less disputable than other while still discussing the remaining parts. For example switch to executors deserves its own PR. So does introduction of events.

elcallio commented 3 years ago

It seems that you're doing multiple unrelated things here @elcallio Could you please split the change into smaller PRs that can be easier reviewed?

I could, but it would be akin to busy-work. You would instead have X PR:s that all depend on each other and potentially require even more convoluted rebases eventually. While I get your sentiment, I would honestly prefer not to. This is still a fairly fluid library.

haaawk commented 3 years ago

It seems that you're doing multiple unrelated things here @elcallio Could you please split the change into smaller PRs that can be easier reviewed?

I could, but it would be akin to busy-work. You would instead have X PR:s that all depend on each other and potentially require even more convoluted rebases eventually. While I get your sentiment, I would honestly prefer not to. This is still a fairly fluid library.

My biggest concern is that it's borderline impossible to properly review a big PR like this in a reasonable timeframe. The library is not so fluid any more because @avelanarius has made a lot of effort to test it thoroughly. New changes should now go through a rigid review so that we don't waste 3 weeks @avelanarius spent on testing. Unless of course you have time and desire to repeat this 3 weeks long testing cycle @elcallio :)? I do realize this is kind of busy-work but it is what's require. We want this project to keep high Scylla standards and you would be asked to split the change if it was to Scylla, wouldn't you?

If you don't want to do this busy-work then I can suggest that we will do it but it will have to wait until we find time for that. Many commits in this PR are no-brainers which could be just committed. That's why I'd rather got them in and focus on small parts that are more complex.

elcallio commented 3 years ago

What I don't want is waiting for X PR:s to be merged, whist continuously updating+rebasing said X branches over and over again. But sure, I'll split it into three PR:s - interface/opaque/api richness (first section), executors (second) and batch sequence consumer (last).

scylladb / scylla-cdc-java

CDC typing, execution and Events #30

Attached example