Locustv2 opened this issue 2 years ago.
https://github.com/wal-g/wal-g is for Postgres.
Speaking of Postgres, a PG logical replication slot input would be fantastic.
Here is Conduit's Vitess/MySQL connector. Being a Go tool that is very similar to benthos, it may provide some good inspiration for this.
MySQL is a very common and reliable RDBMS. However, listening to CDC (change data capture) events from MySQL still has room to grow, as there are not many solutions readily available.
In order to create an add-on for benthos, we need to understand how to handle the different events from MySQL: INSERT, UPDATE and DELETE.
To use the binlog from MySQL, it has to be enabled on the server side. Once it is enabled, the mysqlbinlog utility can read the binlog files, which are basically the historical log of queries that can be executed on a new database to restore its state.
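A minimal Go sketch of turning that output into CDC events, assuming row-based logging (binlog_format=ROW) and that the files are decoded with mysqlbinlog --verbose --base64-output=DECODE-ROWS. In that mode each row change is printed as ###-prefixed pseudo-SQL with positional column names (@1, @2, ...). The sketch groups those lines into INSERT, UPDATE and DELETE events with a before image (WHERE) and an after image (SET). ParseVerbose and RowEvent are hypothetical names; values stay raw strings, unquoting is naive, and the extra type comments printed by a second -v are not handled.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// EventType is one of the three CDC operations.
type EventType string

const (
	Insert EventType = "INSERT"
	Update EventType = "UPDATE"
	Delete EventType = "DELETE"
)

// RowEvent is one changed row. Columns are positional (@1, @2, ...)
// because that is how the decoded output names them.
type RowEvent struct {
	Type   EventType
	Table  string
	Before map[string]string // WHERE image: UPDATE and DELETE
	After  map[string]string // SET image: INSERT and UPDATE
}

// headers are the first lines of the three kinds of decoded row event.
var headers = []struct {
	prefix string
	typ    EventType
}{
	{"INSERT INTO ", Insert},
	{"UPDATE ", Update},
	{"DELETE FROM ", Delete},
}

func header(body string) (EventType, string, bool) {
	for _, h := range headers {
		if strings.HasPrefix(body, h.prefix) {
			return h.typ, strings.TrimPrefix(body, h.prefix), true
		}
	}
	return "", "", false
}

// ParseVerbose groups the ###-prefixed lines into RowEvents and ignores the rest.
func ParseVerbose(text string) ([]RowEvent, error) {
	var events []RowEvent
	var image map[string]string // WHERE or SET image currently being filled
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "###") {
			continue // positions, BEGIN/COMMIT, statement events, etc.
		}
		body := strings.TrimSpace(strings.TrimPrefix(line, "###"))
		if typ, table, ok := header(body); ok {
			events = append(events, RowEvent{Type: typ, Table: table})
			image = nil
			continue
		}
		if len(events) == 0 {
			continue
		}
		last := &events[len(events)-1]
		switch {
		case body == "WHERE":
			image = map[string]string{}
			last.Before = image
		case body == "SET":
			image = map[string]string{}
			last.After = image
		case strings.HasPrefix(body, "@") && image != nil:
			// e.g. @2='Tom'; naive unquoting, escaped quotes are not handled.
			if col, val, ok := strings.Cut(body, "="); ok {
				image[col] = strings.Trim(val, "'")
			}
		}
	}
	return events, sc.Err()
}

func main() {
	sample := "### INSERT INTO `shop`.`PERSON`\n### SET\n###   @1=100\n###   @2='Tom'\n###   @3='Some Street'\n"
	events, err := ParseVerbose(sample)
	if err != nil {
		panic(err)
	}
	for _, ev := range events {
		fmt.Println(ev.Type, ev.Table, ev.After)
	}
}
```

Mapping the positional @N columns back to real column names would need the table schema, which this sketch does not look up.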
Idea:
With the binlog file, we have several options to move forward.
However, regardless of which solution we choose, we will still need an output that handles the 3 CDC events.
If we take the common MySQL-to-Kafka use case, there are only 2 events: an upsert (a key with a value) and a delete (a key with a null value, i.e. a tombstone).
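A minimal Go sketch of collapsing the 3 CDC events into those 2 Kafka shapes. Record is a hypothetical stand-in for a Kafka message, not a type from any Kafka client library: INSERT and UPDATE both become a keyed upsert, while DELETE becomes a tombstone (nil value), which Kafka log compaction uses to drop the key.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Record is a hypothetical Kafka message: a key and a value.
// A nil Value is a tombstone.
type Record struct {
	Key   []byte
	Value []byte
}

// Op is the CDC operation that produced a row change.
type Op int

const (
	OpInsert Op = iota
	OpUpdate
	OpDelete
)

// ToRecord maps INSERT and UPDATE to an upsert and DELETE to a tombstone.
func ToRecord(op Op, key, row map[string]any) (Record, error) {
	k, err := json.Marshal(key)
	if err != nil {
		return Record{}, err
	}
	if op == OpDelete {
		return Record{Key: k, Value: nil}, nil // tombstone
	}
	v, err := json.Marshal(row)
	if err != nil {
		return Record{}, err
	}
	return Record{Key: k, Value: v}, nil
}

func main() {
	key := map[string]any{"person_id": 100}
	up, _ := ToRecord(OpUpdate, key, map[string]any{"person_id": 100, "name": "Tom"})
	del, _ := ToRecord(OpDelete, key, nil)
	fmt.Printf("upsert %s => %s\n", up.Key, up.Value)
	fmt.Printf("delete %s => tombstone=%t\n", del.Key, del.Value == nil)
}
```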
So we should always be able to know, at any point within a benthos pipeline, what the key of the current message is. If we don't have it, we need to handle it separately, as in this example of a CSV to Kafka benthos config:
In this example you can see that I had to handle the key and the tombstone in the Bloblang mapping. However, if we always have 2 objects to work with (i.e. 1 for the key and 1 for the value), this can be much simpler. (Of course, there is no way to determine a delete from a CSV file; this only applies to inputs that actually support deletes.)
The MySQL Binlog perspective
The new input add-on should handle the 3 events mentioned above. Let's assume that at each step of a benthos pipeline a message has both a key and a value. The add-on could then process the events as follows (taking a basic person object with a few fields as an example):
INSERT
INSERT INTO PERSON (person_id, name, address) VALUES (100, "Tom", "Some Street")
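A minimal Go sketch of splitting that INSERT into the two objects, assuming the add-on knows the primary-key columns (here person_id) from the table schema. KeyValue and FromRow are hypothetical names, not part of benthos.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// KeyValue is a hypothetical message shape carrying both objects.
type KeyValue struct {
	Key   map[string]any `json:"key"`
	Value map[string]any `json:"value"`
}

// FromRow copies the primary-key columns of the inserted row into Key
// and keeps the whole row as Value.
func FromRow(row map[string]any, pkCols []string) KeyValue {
	key := make(map[string]any, len(pkCols))
	for _, c := range pkCols {
		key[c] = row[c]
	}
	return KeyValue{Key: key, Value: row}
}

// Encode renders the message as JSON; encoding/json sorts map keys.
func (kv KeyValue) Encode() (string, error) {
	b, err := json.Marshal(kv)
	return string(b), err
}

func main() {
	// Row image of the INSERT above.
	row := map[string]any{"person_id": 100, "name": "Tom", "address": "Some Street"}
	out, err := FromRow(row, []string{"person_id"}).Encode()
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	// prints {"key":{"person_id":100},"value":{"address":"Some Street","name":"Tom","person_id":100}}
}
```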
Example of a benthos pipeline:
UPDATE
- any update could be treated the same way as the INSERT above. However, another base attribute can be added for the old values, so we would then have key, value and old_value.
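A minimal Go sketch of that envelope, adding old_value next to key and value. Change, UpdateChange and ChangedColumns are hypothetical names, and the new address "Other Street" is made-up illustration data. Because a nil map encodes as JSON null and omitempty drops old_value when there is no before-image, the same struct could also carry a delete whose value is null.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"sort"
)

// Change is a hypothetical envelope: key, value, and for UPDATEs old_value.
type Change struct {
	Key      map[string]any `json:"key"`
	Value    map[string]any `json:"value"`               // nil encodes as null
	OldValue map[string]any `json:"old_value,omitempty"` // omitted when nil
}

// UpdateChange wraps the before and after row images of one UPDATE.
func UpdateChange(key, before, after map[string]any) Change {
	return Change{Key: key, Value: after, OldValue: before}
}

// ChangedColumns lists, sorted, the columns of value that are missing
// from old_value or hold a different value there.
func (c Change) ChangedColumns() []string {
	var cols []string
	for col, v := range c.Value {
		if old, ok := c.OldValue[col]; !ok || !reflect.DeepEqual(old, v) {
			cols = append(cols, col)
		}
	}
	sort.Strings(cols)
	return cols
}

// Encode renders the envelope as JSON.
func (c Change) Encode() (string, error) {
	b, err := json.Marshal(c)
	return string(b), err
}

func main() {
	key := map[string]any{"person_id": 100}
	before := map[string]any{"person_id": 100, "address": "Some Street"}
	after := map[string]any{"person_id": 100, "address": "Other Street"} // hypothetical new value
	c := UpdateChange(key, before, after)
	out, err := c.Encode()
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	fmt.Println("changed:", c.ChangedColumns())
}
```

One consequence of this shape: if key is taken from the after-image and an UPDATE changes the primary key itself, the old key only survives inside old_value, so a consumer that compacts by key would not see the old key removed.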
DELETE
- for deletes, we would still have key and value. However, the value will be set to null. One could argue for providing an object with all attributes set to null instead, but I highly doubt this would make much sense.
Examples of binlog JSON parsers:
I'll add more updates later. If you have more ideas or questions, feel free to add them.