trinodb / trino-python-client

Python client for Trino
Apache License 2.0

Kafka _timestamp queries show different counts #337

Closed metalshanked closed 1 year ago

metalshanked commented 1 year ago

Expected behavior

Example: the query below returns a different count when run from the Trino Python client (v0.321) as opposed to Java apps such as DataGrip.

The query works fine when I revert trino to 0.320 (SQLAlchemy v1.4.46).

SELECT count(_message) as result
FROM mykafka."default"."kafka-topic" 
WHERE   "_timestamp" >= TIMESTAMP '2023-02-22 19:49:54 UTC'
AND   "_timestamp" <= TIMESTAMP '2023-02-22 19:50:58.001000 UTC'
LIMIT 1

df_kafka = pd.read_sql(sql_kafka, engine)


Actual behavior

Counts should remain the same.

Steps To Reproduce

Run the above query via trino + SQLAlchemy and via a Java IDE, then compare the counts.

Log output

No response

Operating System

Windows

Trino Python client version

0.321

Trino Server version

407

Python version

3.11

Are you willing to submit PR?

mdesmet commented 1 year ago

This query doesn't contain any Python typed values (aka prepared statements). Can you validate in your Trino cluster whether the executed query is the same? In this case the trino-python-client doesn't do anything more than send the exact query to the server.
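One way to do that comparison is to copy the query text from the Trino UI for both runs and check whether they differ in anything other than whitespace. The following is only a minimal sketch: `normalize_sql` and `same_query` are hypothetical helper names, not part of trino-python-client, and the normalization is deliberately naive (it would also collapse whitespace inside string literals).

```python
import re


def normalize_sql(sql: str) -> str:
    """Collapse whitespace runs and drop a trailing semicolon.

    Naive on purpose: whitespace inside quoted literals is collapsed too.
    """
    collapsed = re.sub(r"\s+", " ", sql).strip()
    return collapsed.rstrip(";").strip()


def same_query(sent: str, executed: str) -> bool:
    """Return True when two query texts differ only cosmetically."""
    return normalize_sql(sent) == normalize_sql(executed)


# Text as written in the Python script (hypothetical formatting).
sent_from_python = """
SELECT count(_message) as result
FROM mykafka."default"."kafka-topic"
WHERE   "_timestamp" >= TIMESTAMP '2023-02-22 19:49:54 UTC'
AND   "_timestamp" <= TIMESTAMP '2023-02-22 19:50:58.001000 UTC'
LIMIT 1
"""

# Text as it might be copied from the query details page.
copied_from_ui = (
    'SELECT count(_message) as result FROM mykafka."default"."kafka-topic" '
    "WHERE \"_timestamp\" >= TIMESTAMP '2023-02-22 19:49:54 UTC' "
    "AND \"_timestamp\" <= TIMESTAMP '2023-02-22 19:50:58.001000 UTC' LIMIT 1"
)

queries_match = same_query(sent_from_python, copied_from_ui)
```

If the texts match and the counts still differ, the difference is more likely to come from session properties than from the query itself.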

hashhar commented 1 year ago

@metalshanked closing this as I cannot reproduce. Feel free to reopen if you still have the issue.

The only relevant thing I can think of is whether the client you're using has a different session timezone. You can see it in the Trino UI on the query details page.
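To illustrate why the session time zone could matter, here is a small standard-library sketch. It assumes that one side of the comparison is a zoneless timestamp that gets interpreted in the session time zone; whether that applies to this particular query is not confirmed in this thread. The function name `to_instant` and the UTC-05:00 zone are hypothetical choices made for the example.

```python
from datetime import datetime, timedelta, timezone

# Two hypothetical session time zones. Fixed offsets are used so the
# example does not depend on an installed time zone database.
SESSION_UTC = timezone.utc
SESSION_MINUS_5 = timezone(timedelta(hours=-5), "UTC-05:00")


def to_instant(wall_clock: str, session_zone: timezone) -> datetime:
    """Interpret a zoneless 'YYYY-MM-DD HH:MM:SS' value in the given
    session zone and return the corresponding instant in UTC."""
    naive = datetime.strptime(wall_clock, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=session_zone).astimezone(timezone.utc)


# The same wall-clock value read under two different session zones.
instant_utc = to_instant("2023-02-22 19:49:54", SESSION_UTC)
instant_minus_5 = to_instant("2023-02-22 19:49:54", SESSION_MINUS_5)

# The two readings are five hours apart, so a range filter built from
# them would select a different window of messages.
shift = instant_minus_5 - instant_utc
print(shift)  # 5:00:00
```

If two clients run with different session zones, a shift like this would move the filtered time window and could change the count. The trino-python-client README describes a `timezone` argument to `trino.dbapi.connect` for setting the session zone explicitly; treat that as something to verify against the client version in use.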

metalshanked commented 1 year ago

Apologies @hashhar for the delay in response. I see UTC in the timezone. Will check further. Thanks!