pgsql-io / multicorn2

http://multicorn2.org
Other
83 stars 17 forks source link

AlchemyFDW: Clause pushdown not working with OR boolop #8

Open LanceRadioactive opened 2 years ago

LanceRadioactive commented 2 years ago

Another issue in AlchemyFDW: using OR in Where clauses involving foreign tables causes Multicorn to respond with the following warning:

unsupported expression for extractClauseFrom along with refusing to perform pushdown and instead performing a scan over the entire table.

It seems the error happens explicitly with the boolop OR regardless of any operators under it. AND and other boolean operators are supported and pushdown as normal.

luss commented 2 years ago

Can you provide a simple test that shows it on a simple table? I think I know where in the code this is likely happening.

LanceRadioactive commented 2 years ago

Apologies for the late response again! Here are the reproduction steps for this:

  1. Refer to #7 for the basic setup; create the second database, install Multicorn on the first one.
  2. At postgres_physical:
    create table pushdown_example (id1 int, id2 int);
    insert into pushdown_example values (0, 0)
  3. At postgres:
    
    create foreign table pushdown_test
    (id1 int, id2 int)
    server alchemy_srv
    options (db_url 'postgresql://postgres:postgres@localhost/postgres_physical', schema 'public', tablename 'pushdown_example')

explain verbose select * from pushdown_test pt where pt.id1 = 0 or pt.id2 = 0


creates the following output as well as 4 `unsupported expression for extractClauseFrom` warnings.

![изображение](https://user-images.githubusercontent.com/39345560/169086188-1db733ab-3356-4e15-9316-c1f192a98c27.png)

Removing `pt.id2 = 0` or replacing OR with AND proves it otherwise works properly:
![изображение](https://user-images.githubusercontent.com/39345560/169086416-f9ccc1c4-654a-491d-9198-b9d086fb0304.png)
luss commented 2 years ago

I added a blurb to the projects main README saying pg14 support is still experimental.

ShaheedHaque commented 4 months ago

I think I see the same thing. @luss, I presume the code at https://github.com/pgsql-io/multicorn2/blob/2c3d0c00caf4aa72fccb28f1924e06bfbffe1af4/src/query.c#L331 needs a case T_BoolOpExpr or similar?

luss commented 4 months ago

I think you are very likely correct

On Sat, May 11, 2024 at 2:17 AM ShaheedHaque @.***> wrote:

I think I see the same thing. @luss https://github.com/luss, I presume the code at https://github.com/pgsql-io/multicorn2/blob/2c3d0c00caf4aa72fccb28f1924e06bfbffe1af4/src/query.c#L331 needs a case T_BoolOpExpr or similar?

— Reply to this email directly, view it on GitHub https://github.com/pgsql-io/multicorn2/issues/8#issuecomment-2105588894, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMWOHWGCZAWVJT3XS5CYZDZBWZYXAVCNFSM5VY4EV32U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TEMJQGU2TQOBYHE2A . You are receiving this because you were mentioned.Message ID: @.***>

ShaheedHaque commented 4 months ago

So, here is what I have deduced:

  1. As per the OP, and some tests I performed, a T_BoolOpExpr occurs when the where clause qualifier has the form of an OR, or AND NOT (and, I suspect, anything other than an AND) [1].
  2. The analogous code in Postgres' contrib.postgres_fdw implmentation has - not surprisingly - a recursive expansion of the qualifier where we have code for 3 specific cases (equality, test for NULL and IN).

Now, even putting aside writing code to cover this case, the question is whether Multicorn should aspire, in principle, to support a wider set of qualifier expressions, starting with a recursive walk in the T_BoolOpExpr case?

The alternative is to say that (as I understand it) Postgres does not assume pushdown, and must therefore do its own qualifier checking, and few - if any - plausible use cases for Multicorn would need such a general approach, and so, simply make unwanted qualifiers a no-op and silence the messages while gradually adding other wanted qualifiers [2].

[1] Example of BOOLEXPR query:

('SELECT "jobs"."queue" FROM "jobs" WHERE ("jobs"."queue" = %s OR "jobs"."state" = %s)', (param1, param2))

which emits:

WARNING:  unsupported expression for extractClauseFrom
DETAIL:  {
    BOOLEXPR :boolop or :args ({
        OPEXPR :opno 98 :opfuncid 67 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 100 :args ({
            RELABELTYPE :arg {
                VAR :varno 1 :varattno 2 :vartype 1043 :vartypmod 44 :varcollid 100 :varlevelsup 0 :varnosyn 1 :varattnosyn 2 :location 41}
               :resulttype 25 :resulttypmod -1 :resultcollid 100 :relabelformat 2 :location -1
            } {
               CONST :consttype 25 :consttypmod -1 :constcollid 100 :constlen -1 :constbyval false :constisnull false :location 58 :constvalue 5 [ 20 0 0 0 120 ]
        }) :location 56
    } {
        OPEXPR :opno 98 :opfuncid 67 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 100 :args ({
            RELABELTYPE :arg {
                VAR :varno 1 :varattno 6 :vartype 1043 :vartypmod 24 :varcollid 100 :varlevelsup 0 :varnosyn 1 :varattnosyn 6 :location 65} :resulttype 25 :resulttypmod -1 :resultcollid 100 :relabelformat 2 :location -1
        } {
            CONST :consttype 25 :consttypmod -1 :constcollid 100 :constlen -1 :constbyval false :constisnull false :location 82 :constvalue 5 [ 20 0 0 0 120 ]
        }) :location 80
    }) :location -1
}

[2] Example of other currently unsupported qualifier:

('SELECT "queues"."vhost" FROM "queues" WHERE (NOT (NOT "queues"."durable") AND "queues"."state" = %s)', (p1,))

which emits:

WARNING:  unsupported expression for extractClauseFrom
DETAIL:  {VAR :varno 1 :varattno 5 :vartype 16 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 5 :location 150}
mfenniak commented 4 months ago

I think it is smart to be skeptical about supporting ORs, as it really complicates building a simple FDW. My own dynamodb_fdw based onto a system that doesn't support OR queries, so it would have to fail, ignore (do a full table scan and let PG handle it), or split the ORs into multiple conditions and make multiple independent internal operations.

That last one is an interesting idea that could be adapted into multicorn... I'd be imagining a solution like this, when an OR is identified by multicorn:

ShaheedHaque commented 4 months ago

To counter my own argument, I suppose an FDW which connects to a database (or an ORM), as in the case of the OP, is the exception where pushdown for OR and other complex "quals" would actually make sense.

luss commented 4 months ago

I'm also skeptical about anything that complicates multcorns's core mission of making it simple to build FDW's in python. DynamoDB, ElasticSearch, BigQuery (and others) are perfect FDW examples where there is no (popular/supported) C/C++ interface into these datastores and multicorn2 is used to simplify the writing of the FDW obviating the need to do any C coding in the FDW itself.

On Thu, May 23, 2024 at 4:16 AM ShaheedHaque @.***> wrote:

To counter my own argument, I suppose an FDW which connects to a database (or an ORM), as in the case of the OP, is the exception where pushdown for OR and other complex "quals" would actually make sense.

— Reply to this email directly, view it on GitHub https://github.com/pgsql-io/multicorn2/issues/8#issuecomment-2126505659, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMWOHVT6UJZKZSFJDXHIYTZDWQXBAVCNFSM5VY4EV32U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TEMJSGY2TANJWGU4Q . You are receiving this because you were mentioned.Message ID: @.***>