paradedb / pg_analytics

DuckDB-powered analytics for Postgres
https://paradedb.com
GNU Affero General Public License v3.0
129 stars 9 forks source link

Query bytea type result error #53

Open kysshsy opened 1 month ago

kysshsy commented 1 month ago

What happens?

I am developing quals push-down feature and have found that the query result is abnormal

pg_lakehouse=# select binary_col from t1   where  binary_col = '\x68656c6c6f';
 binary_col
------------
(0 rows)

pg_lakehouse=# select binary_col from t1   where  binary_col = E'\\x68656c6c6f';
 binary_col
------------
(0 rows)

pg_lakehouse=# select binary_col from t1;
    binary_col
------------------
 \x68656c6c6f
 \x
 \x70617271756574
(3 rows)

pg_lakehouse=# select binary_col from t1   where  binary_col = E'';
 binary_col
------------
 \x
(1 row)

pg_lakehouse=#

To Reproduce

  1. export parquet file in the code. test_arrow_types.parquet
  2. create foreign table
    
    CREATE FOREIGN DATA WRAPPER parquet_wrapper
    HANDLER parquet_fdw_handler
    VALIDATOR parquet_fdw_validator;

CREATE SERVER parquet_server FOREIGN DATA WRAPPER parquet_wrapper;

CREATE FOREIGN TABLE t1 ( boolean_col boolean, int8_col smallint, int16_col smallint, int32_col integer, int64_col bigint, uint8_col smallint, uint16_col integer, uint32_col bigint, uint64_col numeric(20), float32_col real, float64_col double precision, date32_col date, date64_col date, binary_col bytea, large_binary_col bytea, utf8_col text, large_utf8_col text ) SERVER parquet_server OPTIONS (files '/home/kyss/labs/paradedb/test_arrow_types.parquet');

3. query
``` sql
select binary_col from t1   where  binary_col = '\x68656c6c6f';
select binary_col from t1   where  binary_col = E'\\x68656c6c6f';
select binary_col from t1   where  binary_col = E'';
select binary_col from t1;

OS:

x86

ParadeDB Version:

0.8.6.

Are you using ParadeDB Docker, Helm, or the extension(s) standalone?

ParadeDB pg_lakehouse Extension

Full Name:

kyss

Affiliation:

NA

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include the code required to reproduce the issue?

Did you include all relevant configurations (e.g., CPU architecture, PostgreSQL version, Linux distribution) to reproduce the issue?

kysshsy commented 1 month ago

The issue occurs in both the FDW and executor hook. I suspect there's an issue with connection::create_arrow. It can query correctly with E'' but fails with non-zero-length bytes.

destrex271 commented 1 month ago

Hi @philippemnoel I would like to look into this

philippemnoel commented 1 month ago

Hi @philippemnoel I would like to look into this

All yours :) Thank you for looking into it!

kysshsy commented 1 month ago

Hi, I think it's about the difference of duckdb and postgres hex representation. https://www.postgresql.org/docs/current/datatype-binary.html https://duckdb.org/docs/sql/data_types/blob

destrex271 commented 1 month ago

Hi, I think it's about the difference of duckdb and postgres hex representation. https://www.postgresql.org/docs/current/datatype-binary.html https://duckdb.org/docs/sql/data_types/blob

That is a possibility, pg might be using the escape format which is causing all the problem. We might need to decode the string and then encode it into a format that goes well with Duckdb.

But before that I'd like to finish investigating from my side too

destrex271 commented 1 month ago

Hi @kysshsy can you please attach the paraquet file here, if possible?

kysshsy commented 1 month ago

test_arrow_types.parquet.zip @destrex271 Yes, github not support parquet files. I forget to zip it.

destrex271 commented 1 month ago

Hi, I think it's about the difference of duckdb and postgres hex representation. https://www.postgresql.org/docs/current/datatype-binary.html https://duckdb.org/docs/sql/data_types/blob

Yep seems like this is the actual problem. I checked the query execution but its working fine.

Will raise a PR as soon as I come up with a possible fix.

philippemnoel commented 1 month ago

Hi, I think it's about the difference of duckdb and postgres hex representation. https://www.postgresql.org/docs/current/datatype-binary.html https://duckdb.org/docs/sql/data_types/blob

Yep seems like this is the actual problem. I checked the query execution but its working fine.

Will raise a PR as soon as I come up with a possible fix.

Thank you :)

philippemnoel commented 3 weeks ago

Hi, I think it's about the difference of duckdb and postgres hex representation. https://www.postgresql.org/docs/current/datatype-binary.html https://duckdb.org/docs/sql/data_types/blob

Yep seems like this is the actual problem. I checked the query execution but its working fine.

Will raise a PR as soon as I come up with a possible fix.

Hi @destrex271. Any update on this?

destrex271 commented 3 weeks ago

Nope not any as of now, I think it's better if someone else picks it up for now

philippemnoel commented 3 weeks ago

Nope not any as of now, I think it's better if someone else picks it up for now

Thank you!