Support Aggregation pushdown

jopoly commented 1 year ago

Hi Laurenz,

Sorry for creating an issue ticket during vacation.

I would like to share the source code to support Aggregation pushdown, which is below:

Support pushdown GROUP BY (var/expression)
Support pushdown GROUP BY + HAVING
Support pushdown ORDER BY (var/expression)
Support pushdown LIMIT + OFFSET
Support pushdown the basic aggregate functions (sum, count, min, max, avg, stddev, variance)
Aggregates in subquery are pushed down
Grouping Sets (not pushdown)

The whole source code has ~3300 lines of code (add) and ~560 lines of code (delete). Are you interested?

laurenz commented 1 year ago

I am confused. oracle_fdw already supports pushing down LIMIT and OFFSET and ORDER BY.

But I am potentially interested in aggregate pushdown.

However, i am not interested in a single commit that changes 4000 lines. That would be nigh impossible to review. Please keep a pull request to a single functionality and don't modify unrelated things. It can be a good idea to split it into multiple commits for readability.

jopoly commented 1 year ago

I am confused. oracle_fdw already supports pushing down LIMIT and OFFSET and ORDER BY.

I meant we pushdown Aggregation with the above clauses. As my investigation on postgres_fdw's code, we need to follow other code flow. You can refer to postgresGetForeignUpperPaths function. For example, we execute the query SELECT d, count(*) FROM typetest1 GROUP BY d ORDER BY 1; on postgresql Pushed down query to oracle should be:

  Oracle query: SELECT /*84c0105d8f73afbea6de52ca10fae7ab*/ "D", count(*) FROM  "TYPETEST1" GROUP BY "D" ORDER BY "D" ASC NULLS LAST

But I am potentially interested in aggregate pushdown.

Thank you for your interest.

However, i am not interested in a single commit that changes 4000 lines. That would be nigh impossible to review. Please keep a pull request to a single functionality and don't modify unrelated things. It can be a good idea to split it into multiple commits for readability.

Yes, I agree with you. I will try to separate to single commit for each functionality. I'd like to port test cases of postgres_fdw to oracle_fdw, so we don't need to make new test for Aggregation pushdown. Could you please confirm this point?. I will support the tests below. Please let me know if you have any comment.

1, Support aggregation pushdown with the basic aggregate functions. Currently, I can support the following functions: sum, avg, max, min, stddev, count, variance, corr, covar_pop, covar_samp, cume_dist, dense_rank, percent_rank, stddev_pop, stddev_samp, var_pop, var_samp, percentile_cont, percentile_disc,

2, Support aggregation pushdown with grouping (section Aggregate and grouping queries in postgresql_fdw.sql) 2.1, Simple aggregates

explain (costs off)
select count(c6), sum(c1), avg(c1), min(c2), max(c1), stddev(c2), sum(c1) * (random() <= 1)::int as sum2 from ft1 where c2 < 5 group by c2 order by 1, 2;
                                                                                                                      QUERY PLAN                                                                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan
   Oracle query: SELECT /*99914bc7fbec710b637d4aa744a8f83e*/ count("C6"), sum("C 1"), avg("C 1"), min("C2"), max("C 1"), stddev("C2"), "C2" FROM  "T 1" WHERE ("C2" < 5) GROUP BY "C2" ORDER BY count("C6") ASC NULLS LAST, sum("C 1") ASC NULLS LAST
(2 rows)

2.2. Aggregates in subquery are pushed down.

explain (costs off)
select count(x.a), sum(x.a) from (select c2 a, sum(c1) b from ft1 group by c2, sqrt(c1) order by 1, 2) x;
                                                                                            QUERY PLAN                                                                                            
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate
   ->  Foreign Scan
         Oracle query: SELECT /*dcc391c9580f4de1e9bb9fc28229222d*/ "C2", sum("C 1"), sqrt("C 1") FROM  "T 1" GROUP BY "C2", (sqrt("C 1")) ORDER BY "C2" ASC NULLS LAST, sum("C 1") ASC NULLS LAST
(3 rows)

2.3. GROUP BY clause in various forms, cardinal, alias and constant expression

explain (costs off)
select count(c2) w, c2 x, 5 y, 7.0 z from ft1 group by 2, y, 9.0::int order by 2;
                                                                      QUERY PLAN                                                                       
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan
   Oracle query: SELECT /*2eae2d902b6851d128a93d5818be1851*/ count("C2"), "C2", 5, 7.0, 9 FROM  "T 1" GROUP BY "C2", 5, 9 ORDER BY "C2" ASC NULLS LAST
(2 rows)

2.4. GROUP BY clause referring to same column multiple times

explain (costs off)
select c2, c2 from ft1 where c2 > 6 group by 1, 2 order by sum(c1);
                                                                               QUERY PLAN                                                                               
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan
   Oracle query: SELECT /*39d3eb479f963b88b63e0e00bdb29d3a*/ "C2", "C2", sum("C 1") FROM  "T 1" WHERE ("C2" > 6) GROUP BY "C2", "C2" ORDER BY sum("C 1") ASC NULLS LAST
(2 rows)

2.5. Testing HAVING clause shippability

explain (costs off)
select c2, sum(c1) from ft2 group by c2 having avg(c1) < 500 and sum(c1) < 49800 order by c2;
                                                                                       QUERY PLAN                                                                                       
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan
   Oracle query: SELECT /*007f188f5832febc5bc87ccaeae8785f*/ "C2", sum("C 1") FROM  "T 1" GROUP BY "C2" HAVING (avg("C 1") < 500) AND (sum("C 1") < 49800) ORDER BY "C2" ASC NULLS LAST
(2 rows)

2.6. FULL join with IS NULL check in HAVING

explain (costs off)
select avg(t1.c1), sum(t2.c1) from ft4 t1 full join ft5 t2 on (t1.c1 = t2.c1) group by t2.c1 having (avg(t1.c1) is null and sum(t2.c1) < 10) or sum(t2.c1) is null order by 1 nulls last, 2;
                                                                                                                                                             QUERY PLAN                                                                                                                                                             
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan
   Oracle query: SELECT /*adfb9afbf9c47b7f2fc662c311ae69a4*/ avg(r1."C1"), sum(r2."C1"), r2."C1" FROM ( "T 3" r1 FULL JOIN  "T 4" r2 ON (r1."C1" = r2."C1")) GROUP BY r2."C1" HAVING (((avg(r1."C1") IS NULL) AND (sum(r2."C1") < 10)) OR (sum(r2."C1") IS NULL)) ORDER BY avg(r1."C1") ASC NULLS LAST, sum(r2."C1") ASC NULLS LAST
(2 rows)

laurenz commented 1 year ago

I understand. Getting inspiration from postgres_fdw's test cases is a good idea.

As my investigation on postgres_fdw's code, we need to follow other code flow.

If you mean that part of the existing code has to be refactored, that refactoring would for example be a good candidate for a separate patch.

hslightdb commented 1 year ago

is there any plan to support this feature. or any one can submit a patch?

laurenz commented 1 year ago

@hslightdb I have no plans to implement that, but patches are welcome, as long as they are written in a way that I can review.

jopoly commented 1 year ago

@hslightdb @laurenz It's under development now, please wait for my patch.

jopoly commented 1 year ago

@laurenz As we discussed, I need to separate to small patches so you can review it. I'd like to share patches in the following order. You can review them one by one. After you finished reviewing one patch, I will send you the next patch.

Patch 1: Refactor code following PostgreSQL FDW code style.

Refactor ORDER BY pushdown (following PostgreSQL FDW code style)
Refactor some deparse functions (following PostgreSQL FDW code style)
Refactor building oratable for scanning query and fix some small issues regarding new oratable.
Other parts are not changed. The advantages of this refactoring is to help us to add more feature easily (e.g Aggregation pushdown). It is easy for maintainability in the future.

Patch 2: Support Aggregate function pushdown

Some more aggregate functions like AVG, SUM, MIN, MAX will be pushed down. More functions can be added easily if we want (for improvement/enhancement)
Add more test cases

Patch 3: Support GROUP BY pushdown

Add code and more test cases

Patch 4: Support GROUP BY HAVING pushdown

Add code and more test cases

Patch 5: Support Aggregation and ORDER BY pushdown

Add code and more test cases

For LIMIT+OFFSET, it is not pushed down with Aggregation as mentioned in the code. Do you agree with my proposal above?

laurenz commented 1 year ago

That sounds OK. If the patches are completely independent, you can propose them one at a time. If some change makes sense in the light of a later patch, it is helpful to see them all at once.

jopoly commented 1 year ago

@laurenz Thanks for your feedback. I'd like to share all patches in the attachment. aggregation_pushdown.zip

If you have any queries, please let me know.

jopoly commented 1 year ago

I created patches based on the commit 7c65f589edeae1923e804b4315ddc1fa8894b1cf I developed on the following environment.

Centos 7
OracleXE 21c
PostgreSQL 15.0

jopoly commented 1 year ago

I'd like to explain more about the patches. 1) Patch1

Refactor ORDER BY pushdown
Refactor deparse functions
Introduce a new way to create oratable for scanning table
Fix bug that oracle api does not handle binary type in expression
Fix bug that oracle api does not handle comparison with CLOB
Fix bug that oracle api does not handle NULL column without casting to a data type

2) Patch2

Support some basic aggregate functions AVG, SUM, COUNT, MIN, MAX,...
Add more test cases

3) Patch3

GROUP BY clause pushdown
Add more test cases

4) Patch4

HAVING clause pushdown
Add more test cases

5) Patch5

ORDER BY pushdown with Aggregation
Add more test cases

laurenz commented 1 year ago

Thanks. I'll have a look as soon as I get time. Since you say you wrote the patch for PostgreSQL v15: you are aware that the patch has to support at least the supported versions, that is v11 to v16, right?

jopoly commented 1 year ago

Sorry, I have not built and tested on other versions yet. I worked on PostgreSQL 15.0 because it is the latest stable version.

laurenz commented 1 year ago

I have had a brief look at patch 1. It does not apply to current master. Can yo fix that?

I must say that this is so much code that I find it hard to read. You have several things lumped together; according to your description:

refactor ORDER BY pushdown

that confuses me, because a later patch introduces ORDER BY pushdown.
refactor deparse functions
fix three bugs

Perhaps it would be better to have separate patches for that, in particular for the bug fixes (which bugs?)

Then I have a change to understand what is going on.

It would also be great if you didn't modify unrelated things like the Makefile.

jopoly commented 1 year ago

@laurenz

It does not apply to current master. Can yo fix that?

Yes, I'll do that.

that confuses me, because a later patch introduces ORDER BY pushdown.

The later patch introduces ORDER BY pushdown with Aggregation. According to postgres_fdw's code, it extends a simple foreign path for the underlying grouping relation to perform the final sort remotely.

Perhaps it would be better to have separate patches for that, in particular for the bug fixes (which bugs?)

Can I separate Patch1 to 5 smaller patches?

Patch1.1: Refactor ORDER BY pushdown (following postgres_fdw's code style for maintainability)
Patch1.2: Refactor Deparse functions + New oratable for scanning query including simple, join (and aggregation query later patches)
Patch1.3: Fix bug 1
Patch1.4: Fix bug 2
Patch1.5: Fix bug 3

It would also be great if you didn't modify unrelated things like the Makefile.

I will revert it.

jopoly commented 1 year ago

@laurenz PostgreSQL 16beta1 was released. Do you want these patches working on 16beta1?

laurenz commented 1 year ago

There is already beta 2. Yes, the patches should work on all versions between v11 and v17 (current HEAD). I know that that is annoying.

jopoly commented 12 months ago

@laurenz Thanks for your information. I will use PostgreSQL 16beta2

Could you please confirm the separated patches at https://github.com/laurenz/oracle_fdw/issues/569#issuecomment-1628562120? If you do not agree, how small do you need patches?

laurenz commented 12 months ago

Sorry for the late reply. Yes, that is fine. Thanks.

I cannot promise fast work on this. While I can work on oracle_fdw during my working hours, it is non-profit community work, and other things have to take precedence.

I'll do my best to understand and absorb your code, but I won't make any promises. I won't merge anything that I don't thoroughly understand. After all, I'll be the one who has to deal with bugs arising from the code. If I can't cope with it, you may have to fork oracle_fdw.

AmebaBrain commented 1 month ago

@laurenz , @jopoly any updates on this? I'm interested on basic aggregates pushdown: count, sum, min, max, etc..

today was trying to compare row counts between tables and figured out that basic select count(*) looks to transfer 1 for every source row and summing them up on the postgres side.

which makes such basic but important queries almost unusable on even mid size tables.

laurenz commented 1 month ago

I'll close the issue as "inactive".

laurenz / oracle_fdw

Support Aggregation pushdown #569