trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.55k stars 3.03k forks source link

Add support for Datetime64 type in ClickHouse connector #10537

Open ebyhr opened 2 years ago

ebyhr commented 2 years ago

See TODO comment of "https://github.com/trinodb/trino/issues/10537" in ClickHouseClient

ebyhr commented 2 years ago

https://github.com/trinodb/trino/issues/10538#issuecomment-1020668450

tangjiangling commented 2 years ago

I'd like to work on this, please assign it to me.

tangjiangling commented 2 years ago

After reading this comment, my understanding is that now we need to support not only ClickHouseDate/DateTime, but also ClickHouseDate32/DateTime64, and ClickHouseDate32 mapping to Trinodate and ClickHouse DateTime64 mapping to Trinotimestamp(n) with time zone, besides these, all the points mentioned in this comment need to be implemented, do I understand it right?

ebyhr commented 2 years ago

The difference between DateTime and DateTime64 is sub-second precision as far as I know, so timestamp(n) without time zone looks better.

tangjiangling commented 2 years ago

My mistake, after reading the ClickHouse documentation carefully, you are right.

tangjiangling commented 2 years ago

One more question:

Do we need to support ClickHouse DateTime*(timezone)? I see that we are now letting the user control how to handle data types not supported by Trino via the parameter unsupported-type-handling.

ebyhr commented 2 years ago

@tangjiangling Sorry, I was missing your comment. Yes, it would be nice to support DateTime*(timezone) as well.

tangjiangling commented 2 years ago

@ebyhr Never mind, I will continue this in the near future.

eshirvana commented 2 years ago

@tangjiangling are you still working on this ? any ETA?

tangjiangling commented 2 years ago

are you still working on this ?

Yes.

any ETA?

Not sure, I already have a pending PR for supporting DateTime64, my plan is to wait until this PR is merged before submitting.

CC @hashhar

hashhar commented 2 years ago

Thanks for the reminder @tangjiangling . I'll take another look.

eshirvana commented 2 years ago

@ebyhr @hashhar @tangjiangling Let me know if I can help in any way to accelerate this

tangjiangling commented 2 years ago

I already have a pending PR for supporting DateTime64, my plan is to wait until this PR is merged before submitting.

@eshirvana Sorry for the delay, the PR mentioned here should be merged today or tomorrow and I will file another PR in time for DateTime64 support then (maybe this weekend, you can keep en eye on this)

eshirvana commented 2 years ago

@tangjiangling awesome great ,thank you would you please link/tag the new pr you will merge next week here as well?

eshirvana commented 2 years ago

@tangjiangling hi , any good news yet ? ;) if there is anything I can help to accelerate this please let me know...

tangjiangling commented 2 years ago

@eshirvana see #15040

eshirvana commented 2 years ago

@tangjiangling

Cool , thanks , does this mean trino will be able to pushdown datetime64 predicates with this pr?

tangjiangling commented 2 years ago

does this mean trino will be able to pushdown datetime64 predicates with this pr?

yep

riteshvaryani commented 1 year ago

hi, are there plans to merge this pr in trino?

mahdimalverdi commented 3 months ago

Hi, I am working with a table in ClickHouse that contains a timestamp(0) column with a time zone. I encountered an issue when trying to filter the data using the following query:

SELECT *  FROM table WHERE date >= Date( '2024-08-19 00:00:00.000')  AND date < Date( '2024-08-31 00:00:00.000') limit 10 ;

The query seems to attempt loading all data, which impacts performance significantly. However, when I modify the query to remove the milliseconds, as shown below, the issue is resolved:

SELECT *  FROM table WHERE date >= Date( '2024-08-19 00:00:00')  AND date < Date( '2024-08-31 00:00:00') limit 10 ;

Problem: The original query with milliseconds (.000) causes ClickHouse to load all data. The problem is avoided by removing the milliseconds from the timestamp.