Closed BigBerny closed 2 months ago
Thank you for reporting, we'll get this fixed ASAP.
Hi @BigBerny , could you provide more details for the reproduction? I am diving into this case but no error found.
pg_lakehouse=# CREATE TEMP TABLE predictions (
date_day DATE,
queryLatency INT
);
CREATE TABLE
pg_lakehouse=# INSERT INTO predictions (date_day, queryLatency) VALUES
('2023-03-01', 100),
('2023-03-01', 150),
('2023-03-01', 200),
('2023-03-02', 50),
('2023-03-02', 100),
('2023-03-03', 250),
('2023-03-03', 300);
INSERT 0 7
pg_lakehouse=# SELECT DATE_TRUNC('day', date_day), AVG(queryLatency)
FROM predictions
GROUP BY DATE_TRUNC('day', date_day);
date_trunc | avg
------------------------+----------------------
2023-03-03 00:00:00+08 | 275.0000000000000000
2023-03-01 00:00:00+08 | 150.0000000000000000
2023-03-02 00:00:00+08 | 75.0000000000000000
(3 rows)
@Weijun-H Unless I missed something, the temp table is directly sitting in PostgreSQL instead of being in pg_analytic's DuckDB. I used your example to create a parquet file, loaded it, and got the same error as reported.
-- LOAD PARQUET
CREATE FOREIGN DATA WRAPPER parquet_wrapper
HANDLER parquet_fdw_handler
VALIDATOR parquet_fdw_validator;
CREATE SERVER parquet_server
FOREIGN DATA WRAPPER parquet_wrapper;
CREATE FOREIGN TABLE latencies ()
SERVER parquet_server
OPTIONS (files '/datasets/latencies.parquet');
SELECT DATE_TRUNC('day', date_day), AVG(query_latency)
FROM latencies
GROUP BY DATE_TRUNC('day', date_day);
I used parquet files read by pg_lakehouse/duckdb as said by @evanxg852000
@evanxg852000 I've reverted this PR as the tests were not passing. We simply had not noticed that the CI was not running
What happens?
When running this query I get an error:
Error:
Query 1 ERROR at Line 1: : ERROR: Column date_trunc has Arrow data type Date32 but is mapped to the BuiltIn(TIMESTAMPOID) type in Postgres, which are incompatible. If you believe this conversion should be supported, please submit a request at https://github.com/paradedb/paradedb/issues.
Since DATE_TRUNC() is quite fundamental for many analysis, for whenever you want to group by day, week, month or year, it would be awesome if this could be fixed somehow. Tools like Metabase often offer users a way to choose granularity of an analysis
To Reproduce
Do a GROUP BY on DATE_TRUNC(). It happens with date and timestamp columns and also for 'week' etc.
OS:
Ubuntu(?) with PostgreSQL 16.3
ParadeDB Version:
0.8.3
Full Name:
Janis
Affiliation:
Typewise
What is the latest build you tested with? If possible, we recommend testing by compiling the latest
dev
branch.I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include the code required to reproduce the issue?
Did you include all relevant configurations (e.g., CPU architecture, PostgreSQL version, Linux distribution) to reproduce the issue?