prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
16.04k stars 5.37k forks source link

Wrong results for timestamp cast of out of range timestamps in the year 292278994 #23489

Open rschlussel opened 2 months ago

rschlussel commented 2 months ago

Timestamps in presto are stored as a long representing milliseconds since 1970-01-01. The max timestamp in UTC is 292278994-08-17 07:12:55.807 (you can get this by doing from_unixtime(9223372036854775807). Timestamps in the year, 292278994 but outside the max timestamp range are returning wrong results. It looks like some kind of overflow.

Example:

presto> SELECT cast('292278994-08-17 11:46:00.000' AS TIMESTAMP);
             _col0             
-------------------------------
 -292275055-05-17 11:46:00.000 

If you try a later year, you will get a casting error, which is the expected behavior:

presto> SELECT cast('292278995-08-17 11:46:00.000' AS TIMESTAMP);
Query 20240821_173135_60815_4x3kx failed: Value cannot be cast to timestamp: 292278995-08-17 11:46:00.000
auden-woolfson commented 2 months ago

This is an issue within the Joda external dependency that we are using for timestamp parsing. There are a few different ways we could proceed with this...

  1. Find a new library that doesn't have this problem and use that
  2. Implement our own parsing
  3. Add new constraints to timestamps in presto (make the valid range a year smaller) so we can fail on this error instead of returning a bad value
elharo commented 2 months ago

As Joda creator Steven Colebourne has stated. "Joda-Time has been a very successful date and time library, widely used and making a real difference to many applications over the past 12 years or so. But if you are moving your application to Java SE 8, its time to consider moving on to java.time, formerly known as JSR-310."

So I suggest replacing whatever code does this with java.time. presto exposes Joda in its public API in a few places so it's hard to remove it completely but we can probably fix the implementation.

tdcmeehan commented 2 months ago

Just FYI that a full attempt at removing our Joda dependency ended up being a lot of work and caused serious issues in production environments. I'm not saying we shouldn't do it, but I am saying that I expect this to be a lot of work that will be difficult to verify correctness for.

I presume the C++ eval doesn't have this bug.

rschlussel commented 2 months ago

C++ doesn't have this issue, though it currently supports a more limited date range than Java, only accepting dates between years -32767 and 32767. This is a known limitation until Velox updates to c++20

auden-woolfson commented 2 months ago

I've been starting to write some tests and experiment with the java util lib, and I'm wondering if we should support years that have less than 4 digits. Right now with the new library 0001 is a valid year in the input string, but 1 is not.