Open imor opened 6 months ago
Hey @imor, Im working on a project for which I want to use pg_replicate
with a custom sink. However, my source DB contains a lot of types that are currently unsupported (most notably the numeric
type).
I took a look around the codebase as well as the diesel
codebase (the latter does support all built-in types) to see how we could extend the supported types.
Diesel essentially wraps the raw pgoutput bytes in a PgValue
type, and implements the FromSql
trait to convert the PgValue
raw bytes to Rust types (BigDecimal
in the case of the numeric
postgres type).
Since those trait implementations parse the raw bytes, it seems like most of the FromSql
implementations could be reused (if not outright copied) in your CdcEventConverter::from_tuple_data
to support additional types but I wanted to check in with you to see what you thought of the idea.
Love the project, keep up the good work!
Your ideas are directionally correct. Just want to point out that apart from CdcEventConverter::from_tuple_data
there's also TableRowConverter::get_cell_value
which is called when initial table is copied before CdcEvent
start arriving.
Apart from the conversions themselves you might face another hurdle if you are implementing a custom sink. Methods in the BatchSink
(and the older Sink
) traits hard code the error type as SinkError
which might not work for a custom sink impl outside the pg_replicate
crate. There's an issue for this but haven't found the time to tackle it yet.
Yes I noticed pretty early there were two converters to handle :sweat_smile:
Quick question about this, is there a reason why the input of both converters is different? As in, why are the bytes in CdcEventConverter::from_tuple_data
are parseable with a simple from_utf8
(even the for the numeric
type) while the bytes in TableRowConverter::get_cell_value
require more elaborate decoding? (sorry if this is obvious, pretty new to this low-level postgres stuff).
On the SinkError
issue, I noticed that there is already a fork out there which does implement the error as an associated type (I inspired myself from that fork for my own). Is that something you would consider merging if I opened a PR for it?
is there a reason why the input of both converters is different?
The reason for two separate code paths for decoding tuple values is because they are coming in from two different operations. When we want to replicate a table we want two things: the data in the table as it exists right now and any changes to the table data as they happen in realtime. The TableRowConverter::get_cell_value
code path is for the initial table copy (data as it exists in the table right now) and CdcEventConverter::from_tuple_data
is for the realtime changes to the table data. If you dig into the types of the input for these two operations (BinaryCopyOutRow
and TupleData
) you'd notice that the first one is a binary format and the second one is a text format (although clients can request a binary format but it's less portable as mentioned in the Postgres docs). That's another reason for the difference.
Is that something you would consider merging if I opened a PR for it?
PRs are always welcome to make the code better :)
Support built-in types from the following table: https://www.postgresql.org/docs/current/datatype.html#DATATYPE-TABLE