paradedb / pg_analytics

DuckDB-powered analytics for Postgres
https://paradedb.com

DATE_TRUNC('day', date_day) not supported #49

Closed: BigBerny closed this issue 2 months ago

BigBerny commented 4 months ago

What happens?

When running this query I get an error:

SELECT DATE_TRUNC('day', date_day), AVG("queryLatency")
FROM predictions
GROUP BY DATE_TRUNC('day', date_day);

Error: Query 1 ERROR at Line 1: : ERROR: Column date_trunc has Arrow data type Date32 but is mapped to the BuiltIn(TIMESTAMPOID) type in Postgres, which are incompatible. If you believe this conversion should be supported, please submit a request at https://github.com/paradedb/paradedb/issues.

Since DATE_TRUNC() is fundamental for many analyses, whenever you want to group by day, week, month, or year, it would be awesome if this could be fixed somehow. Tools like Metabase often let users choose the granularity of an analysis.
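
One possible interim workaround (untested; it assumes the incompatibility lies only in mapping the Date32 result back to a Postgres type) is to cast the truncated value to DATE so that no TIMESTAMP mapping is needed:

-- Untested sketch: casting to DATE may avoid the Date32 -> TIMESTAMPOID mapping
SELECT DATE_TRUNC('day', date_day)::date AS day, AVG("queryLatency")
FROM predictions
GROUP BY DATE_TRUNC('day', date_day)::date;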

To Reproduce

Do a GROUP BY on DATE_TRUNC(). It happens with both date and timestamp columns, and also for 'week' and other granularities.

OS:

Ubuntu(?) with PostgreSQL 16.3

ParadeDB Version:

0.8.3

Full Name:

Janis

Affiliation:

Typewise

What is the latest build you tested with? If possible, we recommend testing by compiling the latest dev branch.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include the code required to reproduce the issue?

Did you include all relevant configurations (e.g., CPU architecture, PostgreSQL version, Linux distribution) to reproduce the issue?

philippemnoel commented 4 months ago

Thank you for reporting; we'll get this fixed ASAP.

Weijun-H commented 3 months ago

Hi @BigBerny, could you provide more details for reproducing this? I am digging into this case but haven't found an error.

pg_lakehouse=# CREATE TEMP TABLE predictions (
  date_day DATE,
  queryLatency INT
);
CREATE TABLE

pg_lakehouse=# INSERT INTO predictions (date_day, queryLatency) VALUES
  ('2023-03-01', 100),
  ('2023-03-01', 150),
  ('2023-03-01', 200),
  ('2023-03-02', 50),
  ('2023-03-02', 100),
  ('2023-03-03', 250),
  ('2023-03-03', 300);
INSERT 0 7

pg_lakehouse=# SELECT DATE_TRUNC('day', date_day), AVG(queryLatency) 
FROM predictions 
GROUP BY DATE_TRUNC('day', date_day);
       date_trunc       |         avg          
------------------------+----------------------
 2023-03-03 00:00:00+08 | 275.0000000000000000
 2023-03-01 00:00:00+08 | 150.0000000000000000
 2023-03-02 00:00:00+08 |  75.0000000000000000
(3 rows)

evanxg852000 commented 3 months ago

@Weijun-H Unless I missed something, the temp table lives directly in PostgreSQL rather than in pg_analytics's DuckDB. I used your example to create a parquet file, loaded it, and got the same error as reported.
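
For reference, one way to produce such a parquet file (not necessarily the exact steps used here; the path and column names simply mirror the foreign table definition below) is with the DuckDB CLI:

-- Run in the DuckDB CLI: writes the sample rows above to a parquet file
COPY (
  SELECT * FROM (VALUES
    (DATE '2023-03-01', 100),
    (DATE '2023-03-01', 150),
    (DATE '2023-03-01', 200),
    (DATE '2023-03-02', 50),
    (DATE '2023-03-02', 100),
    (DATE '2023-03-03', 250),
    (DATE '2023-03-03', 300)
  ) AS t(date_day, query_latency)
) TO '/datasets/latencies.parquet' (FORMAT PARQUET);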


-- LOAD PARQUET
CREATE FOREIGN DATA WRAPPER parquet_wrapper
HANDLER parquet_fdw_handler
VALIDATOR parquet_fdw_validator;

CREATE SERVER parquet_server
FOREIGN DATA WRAPPER parquet_wrapper;

CREATE FOREIGN TABLE latencies ()
SERVER parquet_server
OPTIONS (files '/datasets/latencies.parquet');

SELECT DATE_TRUNC('day', date_day), AVG(query_latency) 
FROM latencies 
GROUP BY DATE_TRUNC('day', date_day);
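
For day granularity specifically, a possible workaround (untested) is to group by the raw DATE column and avoid DATE_TRUNC altogether, since date_day already has day precision. This does not help for 'week', 'month', or coarser granularities, which still require DATE_TRUNC or an equivalent expression.

-- Untested: works only for 'day' granularity, since date_day is already a DATE
SELECT date_day, AVG(query_latency)
FROM latencies
GROUP BY date_day;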

BigBerny commented 3 months ago

I used parquet files read by pg_lakehouse/DuckDB, as @evanxg852000 said.

philippemnoel commented 3 months ago

@evanxg852000 I've reverted this PR as the tests were not passing. We simply had not noticed that the CI was not running.