paradedb / pg_analytics

DuckDB-powered analytics for Postgres
https://paradedb.com
PostgreSQL License
383 stars 15 forks source link

feat: Support generated/renamed columns in DuckDB #103

Closed rebasedming closed 2 months ago

rebasedming commented 2 months ago

Ticket(s) Closed

What

In DuckDB, it is possible to control which columns are selected or generate new columns in a view:

SELECT col1 as renamed_col, 1 as fixed FROM read_parquet('path/to/file.parquet');

We now expose this to the user so they can do the same when creating foreign tables. This is done via the new select option. (I couldn't name it columns because the CSV reader already has a columns option).

CREATE FOREIGN TABLE trips ()
SERVER parquet_server
OPTIONS (files 's3://paradedb-benchmarks/yellow_tripdata_2024-01.parquet', select 'vendorid as vendor_id, 2024 as year, 1 as month');

SELECT * FROM trips LIMIT 1;
 vendor_id | year | month
-----------+------+-------
         2 | 2024 |     1
(1 row)

Why

This makes it possible to add generated columns to Parquet files that can be used as partition columns (will be documented in a future PR). See #56.

How

Tests

See added test.