ildus / clickhouse_fdw

ClickHouse FDW for PostgreSQL

Strange behavior in a join query with a join key condition #63

Open sophie-jeong opened 3 years ago

sophie-jeong commented 3 years ago

Hello, here is my SQL query:

explain verbose
SELECT a.shop_member_id, a.mall_id, a.shop_no, a.member_id, a.cmember_name, b.email, b.cell
FROM shop_member AS a
INNER JOIN shop_member_property b
    ON (a.shop_member_id = b.shop_member_id
        AND a.shop_member_id IN ('6d34478f-626d-d162-6758-e9fc98dcd6a3'))
WHERE 1=1;

                                  QUERY PLAN
--------------------------------------------------------------------------------
 Foreign Scan  (cost=1.00..-1.00 rows=1 width=1)
   Output: a.shop_member_id, a.mall_id, a.shop_no, a.member_id, a.member_name, b.email, b.cell
   Relations: (shop_member a) INNER JOIN (shop_member_property b)
   Remote SQL: SELECT r1.shop_member_id, r1.mall_id, r1.shop_no, r1.member_id, r1.c_member_name, r2.c_email, r2.c_cell FROM bidb.shop_member r1 ALL INNER JOIN bidb.shop_member_property r2 ON (TRUE) WHERE ((r2.shop_member_id = '6d34478f-626d-d162-6758-e9fc98dcd6a3')) AND ((r1.shop_member_id = '6d34478f-626d-d162-6758-e9fc98dcd6a3'))
(4 rows)

Received exception from server (version 20.6.4): Code: 403. DB::Exception: Received from localhost:9000. DB::Exception: Cannot get JOIN keys from JOIN ON section: TRUE.

HagasSaan commented 3 years ago

Same here, but it works as expected if the IN condition in the join clause has 2 items. For the previous comment it might look like:

INNER JOIN shop_member_property b ON ( a.shop_member_id = b.shop_member_id and a.shop_member_id in ( '6d34478f-626d-d162-6758-e9fc98dcd6a3', 'some_unknown_shop_member_id') )

HagasSaan commented 3 years ago

How to reproduce:

On ClickHouse:

create table default.deals_1_pipedrive
(
    date                        Date,
    __account_id                String,
    __row_hash                  UInt64,
    __row_id                    UInt16,
    __key_hash                  UInt64,
    __sign                      Int8,
    __insert_date               DateTime
)
    engine = CollapsingMergeTree(__sign)
        PARTITION BY toYYYYMM(date)
        ORDER BY (date, __account_id, __row_hash, __row_id)
        SETTINGS index_granularity = 8192;

CREATE VIEW default.deals_1_pipedrive_fdw
            (
             `date` Date,
             `__account_id` String,
             `__insert_date` DateTime
                )
AS
SELECT deals_1_pipedrive.date,
       deals_1_pipedrive.__account_id,
       deals_1_pipedrive.__insert_date
FROM default.deals_1_pipedrive
         FINAL;

INSERT INTO default.deals_1_pipedrive (
    date, __account_id, __row_hash, __row_id, __key_hash, __sign, __insert_date
) VALUES (
'2019-12-13', '1234', 1751228920605290501, 0,
8193464384205566314, 1, '2021-06-07 06:33:21'
);

-- auto-generated definition
create table report_accounts
(
    id           UInt64,
    account_id   String
)
    engine = File(TabSeparated);

INSERT INTO default.report_accounts (id, account_id) VALUES (218760, '1234');

Then on the PostgreSQL side, with clickhouse_fdw:

create foreign table deals_1_pipedrive_fdw
    (
        date date,
        __account_id text
        )
    server clickhouse_svr;

create foreign table report_accounts
    (
        id integer,
        account_id text
        )
    server clickhouse_svr;

Then run

select * from deals_1_pipedrive_fdw m
join report_accounts s on m.__account_id = s.account_id
where m.__account_id in ('1234');

and it will fail with Cannot get JOIN keys from JOIN ON section: TRUE

Explain verbose will give:

Remote SQL: SELECT r1.date, r1.__account_id, r2.id, r2.account_id FROM "default".deals_1_pipedrive_fdw r1 ALL INNER JOIN "default".report_accounts r2 ON (TRUE) WHERE ((r2.account_id = '1234')) AND ((r1.__account_id = '1234'))
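
Following the earlier observation that a two-item IN list works as expected, a possible workaround would be to add a dummy second value. This is an untested sketch; 'no_such_account_id' is a placeholder assumed to match no row:

select * from deals_1_pipedrive_fdw m
join report_accounts s on m.__account_id = s.account_id
where m.__account_id in ('1234', 'no_such_account_id');  -- dummy second value so the IN is not reduced to a single equality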

ildus commented 3 years ago

Hi, thank you for the script to reproduce. Unfortunately, there is nothing I can do on FDW side, since the conditions changed by postgres optimizer, and FDW gets the conditions as you can see and doesn't change them. Just to make sure I made a special build of postgres where I disabled process_equivalence function, and it made the query working. There could be some trick to avoid this optimization. Another option could be a discussion with the clickhouse developers to enable ON (True) construction.