spiceai / spiceai

A self-hostable CDN for databases. Spice provides a unified SQL query interface and portable runtime to locally materialize, accelerate, and query datasets across databases, data warehouses, and data lakes.
https://docs.spiceai.org
Apache License 2.0
1.92k stars 79 forks source link

One or maybe most federated queries with time predicate fail #1625

Closed ahirner closed 4 months ago

ahirner commented 5 months ago

Describe the bug

One or maybe most federated queries with time predicate fail.

To Reproduce

Refer to a postgres: table with Timestamp(Milliseconds, None)

sql> describe orders_final;
+----------------+------------------------------+-------------+
| column_name    | data_type                    | is_nullable |
+----------------+------------------------------+-------------+
...
| time           | Timestamp(Millisecond, None) | YES         |

Query it with predicate:

sql> select * from orders_final where received_at > '2024-01-01';
Error An internal error occurred. Execute '.error' to show details.
sql> .error
Status { code: Internal, message: "External error: This feature is not implemented: Unsupported DataType: conversion: Timestamp(Nanosecond, None)", source: None }
sql> select * from orders_final where time > to_date('2024-01-01');
Error An internal error occurred. Execute '.error' to show details.
sql> select * from orders_final where time > '2024-01-01';
Error An internal error occurred. Execute '.error' to show details.
sql> select * from orders_final where time > to_timestamp('2024-01-01');
Error An internal error occurred. Execute '.error' to show details.
sql> .error
Status { code: Internal, message: "External error: This feature is not implemented: Unsupported DataType: conversion: Timestamp(Millisecond, None)", source: None }
sql> select * from orders_final where time > to_timestamp_millis('2024-01-01');
Error An internal error occurred. Execute '.error' to show details.
Status { code: Internal, message: "External error: Execution error: Unable to query arrow: db error: ERROR: function to_timestamp_millis(unknown) does not exist\nHINT: No function matches the given name and argument types. You might need to add explicit type casts.", source: None }

Expected behavior

Ideally, an error should be more specific to trace, as there are more than one conversions to the same type in the codebase. Rows are returned where the predicate is true.

Additional context

I think it may happen with Timestamp(_, _) , Duration, Date64, since not many time related types are unparseable in datafusion.

Related: https://github.com/spiceai/spiceai/issues/1580 et al.

digadeesh commented 5 months ago

Thanks for the report @ahirner! We're looking into this.

sgrebnov commented 4 months ago

@ahirner - This should be resolved by the following PR: https://github.com/spiceai/spiceai/pull/1795. Please feel free to reopen if you still see any problems with time predicates.

Test data

CREATE TABLE orders_final (
    order_id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL, 
    received_at TIMESTAMP WITH TIME ZONE, 
    received_at2 TIMESTAMP,
    status VARCHAR(255)
);

INSERT INTO orders_final (customer_id, received_at, received_at2, status)
VALUES
    (1, '2024-01-02 08:00:00+00', '2024-01-02 08:00:00', 'completed'),
    (2, '2024-01-03 09:30:00+00', '2024-01-03 09:30:00', 'pending'),
    (3, '2023-12-31 15:45:00+00', '2023-12-31 15:45:00', 'completed');

Queries

sql> SELECT * FROM orders_final WHERE received_at > '2024-01-01';
+----------+-------------+---------------------+---------------------+-----------+
| order_id | customer_id | received_at         | received_at2        | status    |
+----------+-------------+---------------------+---------------------+-----------+
| 1        | 1           | 2024-01-02T08:00:00 | 2024-01-02T08:00:00 | completed |
| 2        | 2           | 2024-01-03T09:30:00 | 2024-01-03T09:30:00 | pending   |
+----------+-------------+---------------------+---------------------+-----------+

Time: 0.015999625 seconds. 2 rows.
sql> SELECT * FROM orders_final WHERE received_at2 < '2024-01-01';
+----------+-------------+---------------------+---------------------+-----------+
| order_id | customer_id | received_at         | received_at2        | status    |
+----------+-------------+---------------------+---------------------+-----------+
| 3        | 3           | 2023-12-31T15:45:00 | 2023-12-31T15:45:00 | completed |
+----------+-------------+---------------------+---------------------+-----------+