Open espadolini opened 1 year ago
cc @hugoShaka for the IBM Postgres setup
We have a customer who has specifically requested support for pgoutput
Documentation says that wal2json is available by default on Azure. https://goteleport.com/docs/reference/backends/#postgresql
This is only the case for the single-server: https://learn.microsoft.com/en-us/azure/postgresql/single-server/concepts-logical
The single server is deprecated and will be unavailable starting march 2025: https://learn.microsoft.com/en-us/azure/postgresql/single-server/whats-happening-to-postgresql-single-server
The flexible-server does not provide this extension anymore: https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-extensions
Edit: Thanks Edoardo. Seems like it really works. Sorry for not checking in reality. Thought I could trust the docs :D
I'm not sure if wal2json was ever in that extension list - maybe because logical decoding plugins are loaded on the fly if necessary, and wal2json doesn't have any initialization that needs to run on server start; I just tested an azure postgres 16 (flexible server) and wal2json works as expected.
We're currently relying on positions in a list to parse the wal2json output in the pgbk change feed; this is fragile, and when it breaks it leads to obscure errors (see #29911). We should make the parsing more robust by actually checking name and type of the columns we are getting info about.This was addressed in #31426.~~In addition, after #29965 we will be updating values during KeepAlive even if there's no need to, just to ensure that update events have all the information we need to emit the
OpPut
event. It might be better to change thekv
table to haveREPLICA IDENTITY FULL
, and, as part of the smarter parsing mentioned above, backfill the missing new data from the old/identity data. This has the drawback of increasing the data that ends up in the WAL (and that must thus be at the very least skipped over by anything that reads the WAL), but doesn't unnecessarily overwrite actual table data - this is especially relevant when doing aKeepAlive
on a value that's big enough to be TOASTed, as that's the exact situation that would benefit from not having to overwrite data unnecessarily.~~ This was fixed in #31358.An improvement in ease of setup would come from switching from
wal2json
topgoutput
, which is always available, being part of Postgres - unfortunately, the SQL interface is only usable withpgoutput
with PostgreSQL 15.3+ and 14.8+ (released in May 2023), so we'd need to do a version check (with aSHOW server_version_num
) for that, or support actual replication connections, which require some more effort on the client side but offer significantly better performance overall. It doesn't seem to be useful to add support forwal2json
over replication connections or any additional plugins other thanpgoutput
.We could use a separate connection string for the change feed connection, which would also elegantly allow the user to specify the use of a replication connection rather than the SQL interface, and would allow for using different users to pull the change feed and to execute backend operations (letting administrators specify per-user options differently for the two). This can almost be considered a requirement in certain managed environments (such as IBM Cloud Databases for PostgreSQL) where the user enabled to do replication must or should be different from the user with read-write access to the database.Implemented in #31918