sfu-db / connector-x

Fastest library to load data from DB to DataFrames in Rust and Python
https://sfu-db.github.io/connector-x
MIT License

PostgreSQL: Error while mapping to Arrow types, if table has array columns #658

Open MadL1me opened 1 week ago

MadL1me commented 1 week ago

What language are you using?

Rust

What version are you using?

latest

What database are you using?

PostgreSQL

What dataframe are you using?

Arrow (not arrow2)

Can you describe your bug?

When trying to run the Datafusion-federation example (https://github.com/datafusion-contrib/datafusion-federation/blob/main/examples/examples/postgres-partial.rs), which uses ConnectorX to connect to Postgres, I get an error mapping Int8Array to an Arrow type. I've set up a local Docker Compose with Postgres, where I have a table that has bigint[] as one of its columns. I've also tested other variants, smallint[] and integer[]. Both give the same error, but for a different mapping type: Int2Array and Int4Array respectively.

There is also a thread about combining arrow2 and arrow-rs (https://github.com/apache/datafusion/issues/1532).

So the issue can probably be resolved by bumping the arrow crate version and adding mappings from the Postgres array types to the proper Arrow array types. This would also make it possible to get rid of the arrow2 mapping code across the project.
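To illustrate what "adding mappings" means here, below is a minimal, std-only sketch of a source-to-destination type mapping. The enums and function names are hypothetical and only mimic the names in the error message. They are not ConnectorX's actual definitions, and the real type system is considerably more involved. The point is only that an array type with no rule falls through to a NoConversionRule-style error, and that adding a rule per array type removes it.

```rust
// Simplified model of a source-to-destination type mapping.
// All names are hypothetical; the bool flag stands for "nullable".

#[derive(Debug, Clone, Copy, PartialEq)]
enum PgType {
    Int8(bool),      // bigint
    Int2Array(bool), // smallint[]
    Int4Array(bool), // integer[]
    Int8Array(bool), // bigint[]
}

#[derive(Debug, PartialEq)]
enum ArrowType {
    Int64(bool),
    ListInt16(bool),
    ListInt32(bool),
    ListInt64(bool),
}

#[derive(Debug, PartialEq)]
enum MapError {
    NoConversionRule(String),
}

/// Mapping without array rules: every array type ends up in the error arm.
fn map_without_arrays(ty: PgType) -> Result<ArrowType, MapError> {
    match ty {
        PgType::Int8(n) => Ok(ArrowType::Int64(n)),
        other => Err(MapError::NoConversionRule(format!("{:?}", other))),
    }
}

/// Mapping with one rule per integer array type.
fn map_with_arrays(ty: PgType) -> Result<ArrowType, MapError> {
    match ty {
        PgType::Int8(n) => Ok(ArrowType::Int64(n)),
        PgType::Int2Array(n) => Ok(ArrowType::ListInt16(n)),
        PgType::Int4Array(n) => Ok(ArrowType::ListInt32(n)),
        PgType::Int8Array(n) => Ok(ArrowType::ListInt64(n)),
    }
}

fn main() {
    // Without a rule, bigint[] produces an error similar in shape to the one reported.
    println!("{:?}", map_without_arrays(PgType::Int8Array(true)));
    // With the rule added, the same column type maps to a list type.
    println!("{:?}", map_with_arrays(PgType::Int8Array(true)));
}
```

In the real project the mapping would also need a conversion of the values themselves, not just of the type tags.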

What are the steps to reproduce the behavior?

Connect to a Postgres database that has a table with array columns and query that table.

Database setup if the error only happens on specific data or data type

Table schema:

create table test(
    id bigint PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    arr bigint[]
);

Example query / code

select * from table_name

What is the error?

thread 'tokio-runtime-worker' panicked at examples/examples/postgres-partial.rs:27:293: called Result::unwrap() on an Err value: External("ConnectorX failed to run query: PostgresArrowTransportError(ConnectorX(NoConversionRule(\"Int8Array(true)\", \"connectorx::destinations::arrow::typesystem::ArrowTypeSystem\")))")

wangxiaoying commented 1 week ago

We don't support array types in arrow yet. The arrow2 implementation here can be used as a reference for enabling this in arrow.
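For context on what array support involves on the destination side: Arrow's variable-size list layout stores a list column as per-row validity, a run of offsets, and one contiguous child values buffer. The std-only sketch below flattens rows of a bigint[] column into that shape. The struct and function names are hypothetical, validity is kept as a Vec<bool> rather than a packed bitmap, and NULL elements inside an array are ignored for brevity. It is an illustration of the layout, not the arrow crate's API.

```rust
/// Flattened list column, loosely following Arrow's variable-size list layout.
#[derive(Debug, PartialEq)]
struct ListColumn {
    validity: Vec<bool>, // one flag per row: false means the whole array is NULL
    offsets: Vec<i32>,   // rows + 1 entries; row i spans values[offsets[i]..offsets[i + 1]]
    values: Vec<i64>,    // all elements of all rows, back to back
}

/// Flatten rows of an optional bigint[] column into offsets plus values.
fn flatten_rows(rows: &[Option<Vec<i64>>]) -> ListColumn {
    let mut validity = Vec::with_capacity(rows.len());
    let mut offsets = Vec::with_capacity(rows.len() + 1);
    let mut values = Vec::new();
    offsets.push(0);
    for row in rows {
        match row {
            Some(items) => {
                validity.push(true);
                values.extend_from_slice(items);
            }
            // A NULL row contributes no values, so its offset range is empty.
            None => validity.push(false),
        }
        offsets.push(values.len() as i32);
    }
    ListColumn { validity, offsets, values }
}

fn main() {
    let rows = vec![Some(vec![1, 2, 3]), None, Some(vec![]), Some(vec![4])];
    let col = flatten_rows(&rows);
    println!("{:?}", col);
}
```

A NULL row and an empty array both produce an empty offset range, so the validity flags are what distinguish them.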

MadL1me commented 1 week ago

@wangxiaoying got it. Can I send a PR for arrow in that case?

wangxiaoying commented 6 days ago

Of course. You are very welcome to submit a PR!

Please let me know if you encounter any issues.