sraoss / pg_ivm

IVM (Incremental View Maintenance) implementation as a PostgreSQL extension
Other
994 stars 27 forks source link

Logical Replication and IVM #36

Open koureasstavros opened 2 years ago

koureasstavros commented 2 years ago

Is there any particular reason why this nice extension is not supporting tables which are synced by logical replication? I suppose that has to do with the fact that replication is transferring byte data and not replicating the INSERT/UPDATE/DELETE statements so the triggers took an effect? Something like this?

I think this extension if will work with logical replication it would be the perfect combination for near to real time analytics with lightweight modifications. I have opened also some ideas regarding logical replication into PostgeSQL team which I find that all these components together will bring new audience into PostgreSQL.

yugo-n commented 2 years ago

As you mentioned, pg_ivm is implemented using statement-level BEFORE/AFTER trigger in order, but the logical replication apply process only fires row triggers. This is the reason why current pg_ivm doesn't work with logical replication.

However, I agree with that it is nice if pg_ivm could work with logical replication, so I would like consider it as one of future works. Maybe it is possible by using row-level triggers instead of statement-level.

koureasstavros commented 1 year ago

Thank you @yugo-n for confirming the above. I also want to ask you if you see any problem using the pg_ivm as Datawarehouse, meaning that data from production systems will be replicated into staging tables with Debezium for now, so the Views will be automatically refreshes and queries from front end app will consume view's data. I have asked some providers why the do not include pg_ivm into their managed instances and they responded that this extensions is good to use only for underlying tables which are not refreshed frequently. While using Debezium, data changes will be streamed into those underlying tables. Do you see any problem?

koureasstavros commented 1 year ago

@yugo-n, lately I noticed the extension "PGLogical" which work similar to Native Logical Replication, do you know if the case is the same? Does this PGLogical spills the data using row triggers and again cannot be used with pg_ivm?

rotten commented 1 year ago

OTOH, the immv materialized views look like tables to postgres. Can they they be replicated? (Normally we can't replicate materialized views.)

rotten commented 1 year ago

Another approach might be to use SymmetricDS instead of logical replication for the PG change data capture. I believe it will fire all triggers on the pg_ivm side when it propagates the changes. The main disadvantage is you need to spin up extra containers for the management and control software instead of feeding directly off of the parent databases.


FWIW, being able to do incremental materialized view updates from logical replication sources gives us a "poor man's streaming database" (materialize db, rising wave, apache pinot), at least for postgresql change data capture and real time processing of those specific events.

koureasstavros commented 1 year ago

Hi rotten, thanks for your comment and ideas. Currently I have already deployed another way of transferring data using Debezium which is Open Source. But like you said this requires a few containers and therefore vms for kubernetes which are increasing the cost. The most proper, elegant solution and with minimal cost is only using logical Replication where no other service is required. Changing this only to user pg_ivm is not option for me as logical Replication is also able to handle data sequence. In addition using another software for Datawarehouse is also not proper idea if is not capable to be connected with PowerBI service which is the most common and powerful front end.

koureasstavros commented 1 year ago

Hi @yugo-n, any update on this? Are you considering row triggers as events into pg_ivm to make it work with Logical Replication?

bjne commented 3 months ago

@yugo-n are there any updates you can share?

byronferguson commented 2 months ago

bump

any known alternatives that support logical replication?

yugo-n commented 1 month ago

I am sorry for late response. I am not familiar with Datawarehouse area, but if there is any use cases, it may make sense to support "row trigger mode" which allows to maintain IMMVs whose base tables are updated by logical replication. I've not investigate it deeply, it seems not so difficult work to implement it in an naive way. However, one disadvantage is that the performance of maintenance would be mush worse as the IMMV is updated repeatedly for each row if it is naively implemented. Instead, maybe, it is better to collect rows that are modified in a transaction, and update the view only once per a transaction.