paradedb / pg_analytics

DuckDB-powered analytics for Postgres
https://paradedb.com
PostgreSQL License

Add write support to `pg_analytics` #107

Open philippemnoel opened 2 months ago

philippemnoel commented 2 months ago

What feature are you requesting?

This feature would enable users to write data to AWS S3, GCS, and Azure Blob Storage. This would primarily be helpful for tiering data off to object storage to minimize cost. At first, we would only want to support the main file formats and object stores, not open table formats like Delta Lake and Iceberg.

Why are you requesting this feature?

Enable users to tier data off to AWS S3 and others easily

What is your proposed implementation for this feature?

Needs proper investigation. DuckDB has capabilities for this which we would need to expose properly.

Full Name:

Philippe Noël

Affiliation:

ParadeDB

rebasedming commented 2 months ago

After an initial investigation, it looks like we can use DuckDB replacement scans, which allow you to register a custom callback to fire if DuckDB tries to read a table that doesn't exist in DuckDB.

So if the user tries to COPY a Postgres table to S3, we can

  1. Intercept it in the utility hook
  2. Register a replacement scan that tells DuckDB how to scan the Postgres table
  3. Have DuckDB execute the entire COPY statement
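
To make the three steps above concrete, here is a minimal, stdlib-only Python sketch of the control flow. It is a toy model, not DuckDB's or Postgres's actual API: `ToyEngine`, `add_replacement_scan`, `handle_copy`, the simplified `COPY ... TO` regex, and the list of remote URL schemes are all hypothetical names and assumptions made for illustration. The real implementation would live in the extension's utility hook and go through DuckDB's replacement scan callback.

```python
import re
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Row = Tuple
Scan = Callable[[], Iterable[Row]]

class ToyEngine:
    """Hypothetical stand-in for DuckDB: a catalog plus replacement scans."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[Row]] = {}   # tables the engine owns
        self.replacement_scans: List[Callable[[str], Optional[Scan]]] = []
        self.written: Dict[str, List[Row]] = {}  # fake object store

    def add_replacement_scan(self, callback: Callable[[str], Optional[Scan]]) -> None:
        self.replacement_scans.append(callback)

    def _resolve(self, name: str) -> Scan:
        # Native tables win; callbacks only fire for unknown table names.
        if name in self.tables:
            return lambda: self.tables[name]
        for callback in self.replacement_scans:
            scan = callback(name)
            if scan is not None:
                return scan
        raise LookupError(f"table {name!r} does not exist")

    def copy_to(self, table: str, destination: str) -> int:
        rows = list(self._resolve(table)())
        self.written[destination] = rows
        return len(rows)

# Assumption: a deliberately simplified COPY grammar and scheme list.
COPY_TO = re.compile(r"^\s*COPY\s+(\w+)\s+TO\s+'([^']+)'", re.IGNORECASE)
REMOTE_SCHEMES = ("s3://", "gs://", "az://")

def handle_copy(statement: str, engine: ToyEngine,
                postgres_tables: Dict[str, List[Row]]) -> Optional[int]:
    # Step 1: intercept only COPY ... TO a remote object store.
    match = COPY_TO.match(statement)
    if match is None or not match.group(2).startswith(REMOTE_SCHEMES):
        return None  # not ours; leave it to normal Postgres handling

    # Step 2: tell the engine how to scan a Postgres table it does not own.
    # (A real extension would likely register this once, not per statement.)
    def postgres_scan(name: str) -> Optional[Scan]:
        rows = postgres_tables.get(name)
        return None if rows is None else (lambda: iter(rows))

    engine.add_replacement_scan(postgres_scan)

    # Step 3: let the engine execute the whole COPY.
    return engine.copy_to(match.group(1), match.group(2))
```

For example, with `pg = {"orders": [(1, "a"), (2, "b")]}`, calling `handle_copy("COPY orders TO 's3://bucket/orders.parquet'", ToyEngine(), pg)` returns `2`, while a `COPY` to a local path returns `None` and is left for Postgres to handle. The point of the design is that the engine never needs a copy of the Postgres table; it only needs a callback that produces rows when the table name is unknown to it.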

Weijun-H commented 1 month ago

After the merge at https://github.com/duckdb/duckdb-rs/issues/370, we can utilize the complete DuckDB C API to solve this ticket.

philippemnoel commented 1 month ago

> After the merge at duckdb/duckdb-rs#370, we can utilize the complete DuckDB C API to solve this ticket.

Great find -- Here is the PR: https://github.com/duckdb/duckdb-rs/pull/381

philippemnoel commented 1 month ago

@Weijun-H this just got merged, 5 days ago! This would be a really wonderful PR. Write support is our most requested feature.

Weijun-H commented 1 month ago

/take