paradedb / pg_analytics

DuckDB-powered analytics for Postgres
https://paradedb.com
PostgreSQL License
196 stars 13 forks source link

`time_bucket` function calculates buckets for all rows even when LIMIT is applied #86

Open vaibhawvipul opened 2 months ago

vaibhawvipul commented 2 months ago

What happens?

32 closed #52.

However, when applying a limit to a query with time_bucket, the bucket is still calculated for all rows in the table

e.g. select time_bucket('5m', trips.tpep_pickup_datetime) from trips limit 10; touches all rows not just the first 10

using a subquery seems to work fine as a workaround select time_bucket('5m', t.tpep_pickup_datetime) from (select * from trips limit 10) t; but we want limit to work properly and get pushed down to duckdb

To Reproduce

More info here - https://github.com/paradedb/pg_analytics/pull/32#issuecomment-2287746642

OS:

all versions

ParadeDB Version:

latest

Are you using ParadeDB Docker, Helm, or the extension(s) standalone?

ParadeDB Docker Image

Full Name:

Vipul Vaibhaw

Affiliation:

ParadeDB

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include the code required to reproduce the issue?

Did you include all relevant configurations (e.g., CPU architecture, PostgreSQL version, Linux distribution) to reproduce the issue?