paradedb / pg_analytics

DuckDB-powered analytics for Postgres

https://paradedb.com

PostgreSQL License

feat: Enable Disk Caching for Multiple Sources (Supersedes PR#30) #148

Closed: shamb0 closed this 1 month ago

shamb0 commented 1 month ago

This PR is part of a pair; please review, validate, and consider merging both.

https://github.com/paradedb/paradedb/pull/1751
https://github.com/paradedb/pg_analytics/pull/148

Scope of this PR:

This PR supersedes the following previous submissions:

It is based on consolidated requirements and feedback from the review comments on the above closed PRs.

What

This PR separates the core implementation from PR#30, focusing on enabling disk caching for the various sources supported by pg_analytics.
The benchmarking, specifically for Hive-style partitioned Parquet sources, relies on this core disk cache functionality and is implemented in paradedb/cargo-paradedb/src/pga_benches/pga_benchlogs_hsp_pq.rs.

Why

How

shamb0 commented 1 month ago

Hi @philippemnoel,

Thank you for the review comments; I really appreciate them. Moving forward, I'll aim for smaller PRs with minimal changes to make them easier to review.