pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-compatible database for elastic scale and real-time analytics.
https://pingcap.com
Apache License 2.0

Statistics of TIMESTAMP column is stored as datetime string without timezone information #52429

Open winoros opened 5 months ago

winoros commented 5 months ago

Bug Report


1. Minimal reproduce step (Required)

image

As the screenshot above shows, the type of this column is TIMESTAMP, but we store its histogram as datetime strings without timezone information. The timezone is not UTC either; it is determined by the session that executes the ANALYZE command.

This is incorrect and introduces estimation errors, because a session in a different timezone reads the same stored strings as different points in time.
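To make the effect concrete, here is a minimal Python sketch of the mismatch. It is not TiDB code: the helper names, the two session timezones, and the sample bound are all made up for illustration, and it only models "format in the ANALYZE session's timezone, parse in the querying session's timezone".

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"  # datetime string with no timezone information


def render_in_session_tz(instant: datetime, session_tz: timezone) -> str:
    """Format an absolute instant as a zone-less string, as seen by a session."""
    return instant.astimezone(session_tz).strftime(FMT)


def parse_in_session_tz(text: str, session_tz: timezone) -> datetime:
    """Interpret a zone-less string using the reading session's timezone."""
    return datetime.strptime(text, FMT).replace(tzinfo=session_tz)


# Hypothetical sessions: ANALYZE runs at UTC+8, the query runs at UTC.
analyze_tz = timezone(timedelta(hours=8))
query_tz = timezone.utc

# The real histogram bound, as an absolute point in time.
true_bound = datetime(2024, 4, 1, 0, 0, 0, tzinfo=timezone.utc)

# What gets persisted: a string that silently depends on analyze_tz.
stored = render_in_session_tz(true_bound, analyze_tz)

# What the querying session believes the bound is.
seen_by_query = parse_in_session_tz(stored, query_tz)

# The bound has moved by the offset between the two sessions.
shift = seen_by_query - true_bound
print(stored, shift)
```

In this sketch every bucket boundary moves by the offset between the two sessions, so a range predicate on the column is compared against the wrong buckets.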

winoros commented 5 months ago

This is the root cause of https://github.com/pingcap/tidb/issues/41985

winoros commented 5 months ago

We already store the histogram of TIMESTAMP columns as string values, so for compatibility reasons it's not easy to switch to storing the original value.

A possible fix is to always store the datetime string in UTC and convert it when doing row count estimation.
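A rough Python sketch of that idea follows, under stated assumptions. It is not TiDB's implementation: the function names, the single-bucket linear interpolation, and the sample numbers are hypothetical. It only shows the two conversions the fix needs, which are writing bounds in UTC and interpreting the query literal in the session's timezone before comparing.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%d %H:%M:%S"


def encode_bound_utc(instant: datetime) -> str:
    """Persist a bound as a zone-less string that is always in UTC."""
    return instant.astimezone(timezone.utc).strftime(FMT)


def decode_bound_utc(text: str) -> datetime:
    """Read a persisted bound back, knowing it was written in UTC."""
    return datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)


def session_literal_to_instant(text: str, session_tz: timezone) -> datetime:
    """Interpret a query literal in the timezone of the querying session."""
    return datetime.strptime(text, FMT).replace(tzinfo=session_tz)


def estimate_rows_below(literal: str, session_tz: timezone,
                        bucket_lower: str, bucket_upper: str,
                        bucket_count: int) -> float:
    """Estimate rows with ts < literal inside one bucket (linear interpolation)."""
    lo = decode_bound_utc(bucket_lower)
    hi = decode_bound_utc(bucket_upper)
    point = session_literal_to_instant(literal, session_tz)
    if point <= lo:
        return 0.0
    if point >= hi:
        return float(bucket_count)
    return bucket_count * (point - lo) / (hi - lo)


# One hypothetical bucket covering a full UTC day with 2400 rows.
lower = encode_bound_utc(datetime(2024, 4, 1, tzinfo=timezone.utc))
upper = encode_bound_utc(datetime(2024, 4, 2, tzinfo=timezone.utc))

utc_plus_8 = timezone(timedelta(hours=8))
est_east = estimate_rows_below("2024-04-01 12:00:00", utc_plus_8, lower, upper, 2400)
est_utc = estimate_rows_below("2024-04-01 12:00:00", timezone.utc, lower, upper, 2400)
print(est_east, est_utc)
```

The same literal gives different estimates for the two sessions because it names different instants, while the stored bounds no longer depend on which session ran ANALYZE.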

winoros commented 5 months ago

The TopN and the index histograms are correct. They follow the normal encoding and decoding procedure.

winoros commented 5 months ago

It's a long-standing issue. I don't think I will solve it before 8.1.0 is released. I'll fix it in a minor version.