Open humaidkidwai opened 4 months ago
Hi @humaidkidwai! Thanks for filing. We're converting our base Dockerfile from the standard Postgres to Bitnami's Postgres, which includes PostGIS and is more production-ready. We'll also be including pg_cron.
Do you need PostGIS, or do you specifically need GeoParquet/geo data over columnar tables? That is something we can perhaps do eventually, but it would be much lower priority.
That's good to know @philippemnoel, PostGIS support will be superb! I would be more interested in GeoParquet which is essentially Parquet with a standardized way of storing geometries. Will keep following
This you mean? https://github.com/opengeospatial/geoparquet
We're open to it, but I see a pretty serious blocker:
We use Parquet via delta-rs, so there would need to be a Rust-based implementation that's mature enough as a crate to be used by delta-rs. Until that day, we won't be able to integrate it within ParadeDB.
Are you familiar with such an initiative?
There is some work on that front by Kyle Barron. He is working on geoarrow-rs, a Rust-based implementation of GeoArrow, and recently merged some changes to support reading and writing GeoParquet. However, I wouldn't go as far as to say that it is mature enough yet.
Alright, well excited to follow the development!
GeoParquet might not work out of the box with delta lake (there are ongoing spec discussions for iceberg compatibility) and I wouldn't be surprised if delta-rs would want to implement geo support in an extension anyways. And then geo support in datafusion doesn't exist yet (my work is a precursor to it, but I'm not focusing on datafusion integration yet).
So it's probably a while before it's directly integratable in paradedb
Yeah that makes sense -- that's the feeling I was getting as well. Thank you for chiming in and good luck with your work :) We're excited to follow along and integrate it in ParadeDB once we can!
We've added support for PostGIS, by the way. As far as GeoParquet, that is in @kylebarron's hands :)
@kylebarron -- we're considering moving to DuckDB as the engine powering our analytics offering. It seems to support GeoParquet in some form, as per: https://github.com/cholmes/duckdb-geoparquet-tutorials
Are you familiar at all? Would love your input on this.
DuckDB just had support for GeoParquet merged
we're considering moving to DuckDB as the engine powering our analytics offering
That's a big change! Instead of datafusion? That would be interesting to read why
Yeah I think the next release of DuckDB-spatial has GeoParquet support planned.
We'll write about why, I'm excited to share it with you. We'll still be using DataFusion, but in a different part of the stack. More here soon :)
following this eagerly
are there any (early stage) docs about how to use the geoparquet functionality?
What
I would be interested to see support for geospatial data in ParadeDB. As Postgres has a PostGIS extension that can handle a diverse set of geospatial use cases, ParadeDB could possibly add support for something similar but using column stores. GeoParquet is an OGC-incubated file format which essentially extends the Parquet format to support standard vector data in WKT/WKB.
Why
Support for GeoParquet will be super helpful, as an increasing number of organizations (source.coop, Microsoft) are transforming their data to a cloud-native file format for interoperability and geospatial analysis at scale. DuckDB supports it already and many are to follow suit.
How
As GeoParquet is not an entirely new file format, just a specification on top of the existing Parquet format (it adds geo metadata to every Parquet file), it can be smoothly integrated with ParadeDB's native support for Parquet.
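To make the "just additional metadata" point a bit more concrete, here is a minimal sketch using only the Python standard library. It builds the kind of `geo` key/value entry that GeoParquet attaches to a Parquet file's metadata, plus one WKB-encoded point of the sort a geometry column would hold. The column name `geometry`, the sample coordinates, and the helper names `point_to_wkb` and `build_geo_metadata` are assumptions for illustration. It does not write an actual Parquet file and says nothing about how ParadeDB would wire this in.

```python
import json
import struct


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB.

    Layout: 1 byte for byte order (1 = little-endian),
    a uint32 geometry type (1 = Point), then x and y as doubles.
    """
    return struct.pack("<BIdd", 1, 1, x, y)


def build_geo_metadata(column: str = "geometry") -> dict:
    """Build the file-level key/value entry that GeoParquet adds to Parquet.

    Hypothetical helper: the column name is an assumption for this sketch.
    """
    geo = {
        "version": "1.0.0",
        "primary_column": column,
        "columns": {
            column: {
                "encoding": "WKB",
                "geometry_types": ["Point"],
            }
        },
    }
    # The value is a JSON string stored under the "geo" key
    # of the Parquet file metadata.
    return {"geo": json.dumps(geo)}


wkb = point_to_wkb(-73.57, 45.50)
metadata = build_geo_metadata()

print(len(wkb), "bytes of WKB for one point")
print(metadata["geo"])
```

The data itself stays an ordinary binary Parquet column. A reader that ignores the `geo` key still sees valid Parquet, which is why the integration looks cheap on paper, though the engine-side geometry support is a separate matter.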