sraoss / pg_ivm

IVM (Incremental View Maintenance) implementation as a PostgreSQL extension
Other
990 stars 27 forks source link

ivm_visible_in_prestate usage #73

Closed yjhjstz closed 1 year ago

yjhjstz commented 1 year ago

The function table_tuple_fetch_row_version is designed within ivm_visible_in_prestate to retrieve a specific version of a tuple. Why not use the snapshot obtained during the IVM_immediate_before phase and directly utilize this snapshot in refresh_matview_datafill?

/* Create a QueryDesc, redirecting output to our tuple receiver */
    queryDesc = CreateQueryDesc(plan, queryString,
                                GetActiveSnapshot(), InvalidSnapshot,
                                dest, NULL, queryEnv ? queryEnv: NULL, 0);

    /* call ExecutorStart to prepare the plan for execution */
    ExecutorStart(queryDesc, 0);

    /* run the plan */
    ExecutorRun(queryDesc, ForwardScanDirection, 0L, true);

    processed = queryDesc->estate->es_processed;

    if (resultTupleDesc)
        *resultTupleDesc = CreateTupleDescCopy(queryDesc->tupDesc);

    /* and clean up */
    ExecutorFinish(queryDesc);
    ExecutorEnd(queryDesc);

    FreeQueryDesc(queryDesc);

GetActiveSnapshot replace as entry->snapshot ?

yugo-n commented 1 year ago

During incremental maintenance, we need the table state before the table is updated. By specifying the snapshot taken in IVM_immediate_before to CreateQueryDesc, we can get the query results in the past state, but this is about the entire query, not about the specific table in a query. What we need is the result of the query in which a specific table's state is past, and AFAIK there is no simple way to do this in the current PostgreSQL.

yjhjstz commented 1 year ago

0001-try-use-snapshot.patch

You can take a try, IVM got update due to snapshot curcid field plus one by executor.

yugo-n commented 1 year ago

I suppose this patch is for the other repository (pgsq-ivm), so I tested this using the code in it. (I could not apply by git command, so applied manually.)

Your patch works for some limited cases where only one table is modified in a statement. However, it doesn't work when a statement modifies multiple tables, or a view contains self-join, which are the exact situations that "pre-update state" of table is required. See the result of regression tests mentioning "self-join" and "simultaneous table changes" in the comments.

Here is an example of self-join view after applying your patch.

-- create a self-join view
CREATE TABLE base_t (i int, v int);
INSERT INTO base_t VALUES (1, 10), (2, 20), (3, 30); 
CREATE INCREMENTAL MATERIALIZED VIEW mv_self(v1, v2) AS SELECT t1.v, t2.v FROM base_t AS t1 JOIN base_t AS t2 ON t1.i = t2.i;

-- insert a new row
INSERT INTO base_t VALUES (4,40);

-- expected results
SELECT t1.v, t2.v FROM base_t AS t1 JOIN base_t AS t2 ON t1.i = t2.i;
 v  | v  
----+----
 10 | 10
 20 | 20
 30 | 30
 40 | 40
(4 rows)

-- actual results
SELECT * FROM mv_self ;
 v1 | v2 
----+----
 10 | 10
 20 | 20
 30 | 30
(3 rows)
yjhjstz commented 1 year ago

yes, you are right . many thanks.