paradedb / pg_analytics

DuckDB-powered analytics for Postgres
https://paradedb.com
PostgreSQL License
383 stars 15 forks source link

chore: bump duckdb-rs version #174

Closed DeoLeung closed 3 days ago

DeoLeung commented 2 weeks ago

Ticket(s) Closed

What

support reading and writing geoparquet

Why

with duckdb_spatial>=v1.1.0 geometry column in parquet will be read/write as geometry instead of blob

How

no explicit change of sql is needed

Tests

a read/write parquet test should be added later

CLAassistant commented 2 weeks ago

CLA assistant check
All committers have signed the CLA.

philippemnoel commented 1 week ago

This PR would need documentation so people know how to use it

DeoLeung commented 1 week ago

This PR would need documentation so people know how to use it

just fix the packages and now i can build it and test it locally

it seems there's no geometry type cell.rs, currently it still read the geometry column as text :(

philippemnoel commented 1 week ago

This PR would need documentation so people know how to use it

just fix the packages and now i can build it and test it locally

it seems there's no geometry type cell.rs, currently it still read the geometry column as text :(

Yes, that’s because duckdb-rs is on duckdb 1.0.0, not 1.1.0 which it seems like it needs. We would need you to submit a PR to our fork to duckdb-rs to upgrade it as well there, then update the commit here.

DeoLeung commented 1 week ago

This PR would need documentation so people know how to use it

just fix the packages and now i can build it and test it locally it seems there's no geometry type cell.rs, currently it still read the geometry column as text :(

Yes, that’s because duckdb-rs is on duckdb 1.0.0, not 1.1.0 which it seems like it needs. We would need you to submit a PR to our fork to duckdb-rs to upgrade it as well there, then update the commit here.

from https://github.com/paradedb/duckdb-rs/tree/main/crates/libduckdb-sys it seems it's already on v1.1.1

philippemnoel commented 1 week ago

This PR would need documentation so people know how to use it

just fix the packages and now i can build it and test it locally it seems there's no geometry type cell.rs, currently it still read the geometry column as text :(

Yes, that’s because duckdb-rs is on duckdb 1.0.0, not 1.1.0 which it seems like it needs. We would need you to submit a PR to our fork to duckdb-rs to upgrade it as well there, then update the commit here.

from https://github.com/paradedb/duckdb-rs/tree/main/crates/libduckdb-sys it seems it's already on v1.1.1

Ah yes! You are right! Oops, I had no idea. Then, I'm not sure... Are you sure this is supported? Do we need to do anything to enable it?

philippemnoel commented 1 week ago

Could you please sign the CLA and I think this will be good to go :)

DeoLeung commented 1 week ago

Could you please sign the CLA and I think this will be good to go :)

done :)

for the geoparquet reference, i think it need some extra data type mapping (especially postgis geometry)

https://github.com/duckdb/duckdb/pull/12503 https://github.com/duckdb/pg_duckdb/issues/141

philippemnoel commented 1 week ago

Could you please sign the CLA and I think this will be good to go :)

done :)

for the geoparquet reference, i think it need some extra data type mapping (especially postgis geometry)

duckdb/duckdb#12503 duckdb/pg_duckdb#141

We ship with Postgis support and the spatial extension is enabled in pg_analytics. It was added a while back. Does that give you enough?

DeoLeung commented 1 week ago

Could you please sign the CLA and I think this will be good to go :)

done :) for the geoparquet reference, i think it need some extra data type mapping (especially postgis geometry) duckdb/duckdb#12503 duckdb/pg_duckdb#141

We ship with Postgis support and the spatial extension is enabled in pg_analytics. It was added a while back. Does that give you enough?

not enough at the moment. we want to store the analytic data into s3 in parquet format, most of them are gis data, with newly duckdb, we could dump them as geoparquet, and read it back directly as geometry data type

load spatial;
create table gis_test as select st_makepoint(127.0, 30.0) as geometry;
copy  gis_test to '~/Downloads/gis_test.parquet' (format parquet);
create table gis2 as select * from '~/Downloads/gis_test.parquet';
describe table gis2;
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │  null   │   key   │ default │  extra  │
│   varchar   │   varchar   │ varchar │ varchar │ varchar │ varchar │
├─────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ geometry    │ GEOMETRY    │ YES     │         │         │         │
└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘

but in pg_analytics, it still read it as text column

CREATE FOREIGN TABLE gis2 () server parquet_server options(files '~/Downloads/gis_test.parquet');
select * from information_schema.columns where table_schema = 'public' and table_name = 'gis2' and column_name = 'geometry';
select column_name, data_type from information_schema.columns where table_schema = 'public' and table_name = 'gis2' and column_name = 'geometry';
 column_name | data_type 
-------------+-----------
 geometry    | text
(1 row)
philippemnoel commented 1 week ago

Could you please sign the CLA and I think this will be good to go :)

done :) for the geoparquet reference, i think it need some extra data type mapping (especially postgis geometry) duckdb/duckdb#12503 duckdb/pg_duckdb#141

We ship with Postgis support and the spatial extension is enabled in pg_analytics. It was added a while back. Does that give you enough?

not enough at the moment. we want to store the analytic data into s3 in parquet format, most of them are gis data, with newly duckdb, we could dump them as geoparquet, and read it back directly as geometry data type

load spatial;
create table gis_test as select st_makepoint(127.0, 30.0) as geometry;
copy  gis_test to '~/Downloads/gis_test.parquet' (format parquet);
create table gis2 as select * from '~/Downloads/gis_test.parquet';
describe table gis2;
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │  null   │   key   │ default │  extra  │
│   varchar   │   varchar   │ varchar │ varchar │ varchar │ varchar │
├─────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ geometry    │ GEOMETRY    │ YES     │         │         │         │
└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘

but in pg_analytics, it still read it as text column

CREATE FOREIGN TABLE gis2 () server parquet_server options(files '~/Downloads/gis_test.parquet');
select * from information_schema.columns where table_schema = 'public' and table_name = 'gis2' and column_name = 'geometry';
select column_name, data_type from information_schema.columns where table_schema = 'public' and table_name = 'gis2' and column_name = 'geometry';
 column_name | data_type 
-------------+-----------
 geometry    | text
(1 row)

Can you confirm the duckDB version you are using separately? And can you confirm the GIS extension is loaded in pg_analytics?

DeoLeung commented 1 week ago

confirm duckdb version 1.1.1 and 1.1.3 works with geoparquet

confirm postgis is loaded in postgres but I'm not sure how to check if duckdb spatial is enabled in pg_analytics :(

tried select duckdb_execute($$load spatial$$); with no luck

philippemnoel commented 1 week ago

confirm duckdb version 1.1.1 and 1.1.3 works with geoparquet

confirm postgis is loaded in postgres but I'm not sure how to check if duckdb spatial is enabled in pg_analytics :(

tried select duckdb_execute($$load spatial$$); with no luck

I think you would need to duckdb_execute this:

SELECT extension_name, installed, description
FROM duckdb_extensions();

https://duckdb.org/docs/extensions/overview.html