Open rschlussel opened 2 months ago
This is an issue within the Joda external dependency that we are using for timestamp parsing. There are a few different ways we could proceed with this...
As Joda creator Stephen Colebourne has stated: "Joda-Time has been a very successful date and time library, widely used and making a real difference to many applications over the past 12 years or so. But if you are moving your application to Java SE 8, it's time to consider moving on to java.time, formerly known as JSR-310."
So I suggest replacing whatever code does this with java.time. Presto exposes Joda in its public API in a few places, so it's hard to remove it completely, but we can probably fix the implementation.
Just FYI, a full attempt at removing our Joda dependency ended up being a lot of work and caused serious issues in production environments. I'm not saying we shouldn't do it, but I do expect this to be a lot of work whose correctness will be difficult to verify.
I presume the C++ eval doesn't have this bug.
C++ doesn't have this issue, though it currently supports a more limited date range than Java, only accepting dates between years -32767 and 32767. This is a known limitation until Velox updates to C++20.
I've started writing some tests and experimenting with the java util lib, and I'm wondering if we should support years that have fewer than 4 digits. Right now, with the new library, 0001 is a valid year in the input string, but 1 is not.
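For reference, here is a minimal sketch of that behavior, assuming "the new library" means java.time. The default ISO date parser pads years to at least four digits, so `0001-01-01` parses and `1-01-01` does not. The `LENIENT_DATE` formatter and the `ShortYearParsing` class are hypothetical names used only for illustration; they are not Presto code and only show one way shorter years could be accepted.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.format.SignStyle;
import java.time.temporal.ChronoField;

class ShortYearParsing {
    // Hypothetical formatter: accepts years of 1 to 9 digits instead of
    // the 4-digit minimum used by DateTimeFormatter.ISO_LOCAL_DATE.
    static final DateTimeFormatter LENIENT_DATE = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.YEAR, 1, 9, SignStyle.NORMAL)
            .appendLiteral('-')
            .appendValue(ChronoField.MONTH_OF_YEAR, 2)
            .appendLiteral('-')
            .appendValue(ChronoField.DAY_OF_MONTH, 2)
            .toFormatter();

    // Returns true if the default ISO parser accepts the text.
    static boolean parsesWithIsoDefault(String text) {
        try {
            LocalDate.parse(text);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesWithIsoDefault("0001-01-01")); // padded year
        System.out.println(parsesWithIsoDefault("1-01-01"));    // unpadded year
        System.out.println(LocalDate.parse("1-01-01", LENIENT_DATE));
    }
}
```

Whether unpadded years should be accepted is still a behavior decision; the sketch only shows that both options are expressible with the formatter builder.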
Timestamps in Presto are stored as a long representing milliseconds since 1970-01-01. The max timestamp in UTC is 292278994-08-17 07:12:55.807 (you can get this by doing from_unixtime(9223372036854775807)). Timestamps in the year 292278994 but outside the max timestamp range are returning wrong results. It looks like some kind of overflow.
Example:
If you try a later year, you will get a casting error, which is the expected behavior:
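Separately from the Presto queries referenced above, the range limit itself can be illustrated in plain java.time. This is only a sketch under the assumption that a timestamp is a signed 64-bit count of milliseconds since the epoch; `TimestampRange` and its methods are hypothetical names, not Presto code. It shows where the maximum value lands in UTC, and that a conversion using exact arithmetic rejects a later date in the same year instead of silently wrapping.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

class TimestampRange {
    // The largest instant representable as a long count of epoch milliseconds.
    static OffsetDateTime maxTimestampUtc() {
        return Instant.ofEpochMilli(Long.MAX_VALUE).atOffset(ZoneOffset.UTC);
    }

    // Converts a UTC date-time to epoch milliseconds.
    // Instant.toEpochMilli() is documented to throw ArithmeticException
    // on numeric overflow rather than returning a wrapped value.
    static long toEpochMillisChecked(LocalDateTime utc) {
        return utc.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(maxTimestampUtc());

        // Same year as the maximum, but past the representable range.
        LocalDateTime tooLate = LocalDateTime.of(292278994, 12, 31, 0, 0);
        try {
            System.out.println(toEpochMillisChecked(tooLate));
        } catch (ArithmeticException e) {
            System.out.println("out of range: " + e.getMessage());
        }
    }
}
```

A parser that multiplies or adds with plain `long` arithmetic would wrap around for such inputs, which is consistent with the "some kind of overflow" symptom described in the report.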