In the previous implementation, multiplicity of each tuple was stored
in ivm_count column in views. When SELECT was issued for views with
duplicate, the view was replaced with a subquery in which each tuple
was joined with generate_series function in order to output tuples
of the number of ivm_count.
This was problematic for following reasons:
The overhead was huge. When almost of tuples in a view were selected,
it took much longer time than the original query. This lost the meaning
of materialized views.
Optimizer could not estimate row numbers correctly because this had to
know ivm_count values stored in tuples.
System columns of materialized views like cmin, xmin, xmax could not
be used because a view was replaced with a subquery.
To resolve this, the new implementation doen't store multiplicities
for views with tuple duplicates, and doesn't use generate_series
when SELECT query is issued for such views.
Note that we still have to use ivm_count for supporting DISTINCT and
aggregates.
Design:
Alothough views doesn't have ivm_count, multiplicities for old delta
and new delta are calculated and the count is contained in each table.
The old delta is applied using ctid and row_number function, like:
DELETE FROM matviewname WHERE ctid IN (
SELECT tid FROM (
SELECT row_number() over (partition by c1, c2, ...) AS ivm_row_number,
mv.ctid AS tid,
diff.ivm_count__
FROM matviewname AS mv, old_delta AS diff "
WHERE mv.c1 = diff.c1 AND mv.c2 = diff.c2 AND ... ) v
WHERE v.ivm_row_number <= v.ivm_count__
The new delta is applied using generate_seriese, like:
INSERT INTO matviewname (c1, c2, ...)
SELECT c1,c2,... FROM (
SELECT diff.*, generate_series(1, diff.__ivm_count__)
FROM new_delta AS diff) AS v
In the previous implementation, multiplicity of each tuple was stored in ivm_count column in views. When SELECT was issued for views with duplicate, the view was replaced with a subquery in which each tuple was joined with generate_series function in order to output tuples of the number of ivm_count.
This was problematic for following reasons:
The overhead was huge. When almost of tuples in a view were selected, it took much longer time than the original query. This lost the meaning of materialized views.
Optimizer could not estimate row numbers correctly because this had to know ivm_count values stored in tuples.
System columns of materialized views like cmin, xmin, xmax could not be used because a view was replaced with a subquery.
To resolve this, the new implementation doen't store multiplicities for views with tuple duplicates, and doesn't use generate_series when SELECT query is issued for such views.
Note that we still have to use ivm_count for supporting DISTINCT and aggregates.
Design:
Alothough views doesn't have ivm_count, multiplicities for old delta and new delta are calculated and the count is contained in each table.
The old delta is applied using ctid and row_number function, like:
DELETE FROM matviewname WHERE ctid IN ( SELECT tid FROM ( SELECT row_number() over (partition by c1, c2, ...) AS ivm_row_number, mv.ctid AS tid, diff.ivm_count__ FROM matviewname AS mv, old_delta AS diff " WHERE mv.c1 = diff.c1 AND mv.c2 = diff.c2 AND ... ) v WHERE v.ivm_row_number <= v.ivm_count__
The new delta is applied using generate_seriese, like:
INSERT INTO matviewname (c1, c2, ...) SELECT c1,c2,... FROM ( SELECT diff.*, generate_series(1, diff.__ivm_count__) FROM new_delta AS diff) AS v
Github Issue #69