tada / pljava

PL/Java is a free add-on module that brings Java™ Stored Procedures, Triggers, Functions, Aggregates, Operators, Types, etc., to the PostgreSQL™ backend.
http://tada.github.io/pljava/
Other
250 stars 80 forks source link

Multiple longstanding bugs in java.sql.{Date,Time,Timestamp} conversions #200

Open jcflack opened 6 years ago

jcflack commented 6 years ago

Analysis for issue #199 reveals that there are several other issues involving the pre-(JSR 310, JDBC 4.2) date, time, timestamp conversions to and from the java.sql.{Date,Time,Timestamp} classes. Unlike issue #199, these are not regressions in 1.5.1, but older historical behavior; they will be left unchanged for 1.5.2 and fixed in some later release.

Unsurprisingly, they all have to do with the timezone manipulations that the driver has to do, even when the PG type involves no timezone, simply because the java.sql classes inherit from java.util.Date, which assumes knowledge of a timezone.

Plug: The Java 8, JDBC 4.2 / JSR 310 java.time types are free of these issues

These bugs in the legacy java.sql mappings can be fixed, and will be in a future PL/Java release, but this is a good time for another plug that PL/Java 1.5.1 supports the JDBC 4.2 mappings to Java 8's JSR 310 java.time types, which already are free of these issues, and migration to them is encouraged wherever practical.

The following issues are with the legacy mappings to java.sql.{Date,Time,Timestamp} classes.

Errors when PG and Java timezones do not match

For the PG types without timezone, the driver has to apply an offset corresponding to the local timezone. Historically, it has been using PG's local timezone. This gives correct results only when the PG timezone setting and the Java runtime's default timezone happen to be the same. That isn't guaranteed: the PG timezone can be set per session, possibly to match the location of a connected client, and the Java timezone typically defaults from the server OS settings (unless it is explicitly set in Java code, or with -Duser.timezone=... in pljava.vmoptions).

A future PL/Java version might automatically set user.timezone to match the PG session timezone (if it is not explicitly set in pljava.vmoptions), but the correctness of these conversions should not be left to depend on that. They are already querying PG for its timezone offset; they should simply be querying Java for its offset instead. This is explicit in the API spec for java.sql.Date:

the driver will set the time components to the time in the default time zone (the time zone of the Java virtual machine running the application) that corresponds to zero GMT.

(emphasis mine). Although there are no similarly explicit statements in the API docs for Time and Timestamp, it turns out that the principle holds. Using Java's timezone consistently for exactly the adjustments that are currently using PG's timezone will produce correct mappings whether or not the PG and Java zones happen to match.

Out-of-range values passed to java.sql.Time constructor

The java.sql.Time class is meant to represent a time of day only, and per spec is initialized the same as a java.util.Date representing that time of day on 1 January 1970 in UTC. The PG driver will currently pass a value representing the current date. The Java class does not (at present!) detect or reject that, and its toString method produces the expected value (the date part is simply ignored), but any Java code that looked at the underlying milliseconds value would find it out of the expected range.

Such Time objects round-trip back to PG successfully (but for the next issue), because PL/Java also does not (at present!) detect or reject an out-of-range value, but simply mods out any date part.

However:

Sign problems in modular arithmetic, java.sql.Time to PG time with time zone

This is reminiscent of the 'does % round or floor?' issue in #155 (C before C99 doesn't specify!):

\c
SET pljava.vmoptions TO '-Duser.timezone=GMT-1';
SET TIME ZONE 'FOO+1'; -- PG's syntax for the same offset; note sign interpretation flipped
SELECT *
 FROM
  (SELECT '23:00:00-1'::timetz) AS p,
  roundtrip(p) AS (class text, tostring text, roundtripped timetz);
   timetz    |     class     | tostring | roundtripped 
-------------+---------------+----------+--------------
 23:00:00-01 | java.sql.Time | 23:00:00 | -1:00:00-01

(the example ensures PG and Java are set to the same timezone, a simple one-hour offset west of Greenwhich. The problem can be observed in other timezones, but this makes a simple reproducible test.)

Bogus results for timestamp without time zone near start and end of summer time

\c
SET pljava.vmoptions TO '-Duser.timezone=Europe/Prague';
SET TIME ZONE 'Europe/Prague';
SELECT *
  FROM
    (VALUES
      ('2018-03-25 00:00:00'::timestamp),
      ('2018-03-25 01:00:00'),
      ('2018-03-25 02:00:00'),
      ('2018-03-25 03:00:00'),
      ('2018-10-28 00:00:00'),
      ('2018-10-28 01:00:00'),
      ('2018-10-28 02:00:00')
    ) AS p,
    roundtrip(p) AS (class text, tostring text, roundtripped timestamp);
       column1       |       class        |   tostring    |    roundtripped     
---------------------+--------------------+-----------------------+---------------------
 2018-03-25 00:00:00 | java.sql.Timestamp | 2018-03-25 00:00:00.0 | 2018-03-25 00:00:00
 2018-03-25 01:00:00 | java.sql.Timestamp | 2018-03-25 00:00:00.0 | 2018-03-25 00:00:00
 2018-03-25 02:00:00 | java.sql.Timestamp | 2018-03-25 01:00:00.0 | 2018-03-25 01:00:00
 2018-03-25 03:00:00 | java.sql.Timestamp | 2018-03-25 03:00:00.0 | 2018-03-25 03:00:00
 2018-10-28 00:00:00 | java.sql.Timestamp | 2018-10-28 00:00:00.0 | 2018-10-28 00:00:00
 2018-10-28 01:00:00 | java.sql.Timestamp | 2018-10-28 02:00:00.0 | 2018-10-28 02:00:00
 2018-10-28 02:00:00 | java.sql.Timestamp | 2018-10-28 02:00:00.0 | 2018-10-28 02:00:00

In this example, midnight before the start of summer time maps to Java correctly and roundtrips correctly. 01:00 fails to map or roundtrip correctly, despite an hour still remaining before anything interesting should happen. 02:00 is technically an invalid local timestamp (part of the hour that doesn't exist from 02:00 to 03:00). It could either be mapped ahead to 03:00 with or without a warning, or flagged as an error (Java 8's JSR 310 types offer those behaviors). To map it an hour earlier, as seen here, is in no interpretation correct! By 03:00, the bogosity is passed, and matching Java and roundtrip values are again produced.

At the end of summer time, midnight again is handled correctly, and correctness returns for 02:00 onward, but for the hour starting at 01:00, the Java and roundtrip values are incorrect. (There is an ambiguity at the end of summer time such that local times in the hour starting at 02:00 could represent two different actual instants, but that ends up being mostly transparent; unlike the start of summer time, this is an overlap, not a gap where some times are invalid. But the behavior here is messing up the hour preceding the overlap, which is plainly wrong.)

The choice of Europe/Prague as the timezone is not necessary to see the problem, but simply used to make this demonstration reproducible. The details of the errors vary if the zone is west rather than east of Greenwich, and the errors' duration varies with the magnitude of the offset: Prague is only one hour off GMT, and for other zones, far more than one or two hours can be in error on the morning of a transition.

jcflack commented 5 years ago

The second issue, out-of-range values passed to java.sql.Time constructor, is less unequivocally a bug than it might seem at first. The behavior was deliberately added in 25fd6df, to eliminate a different behavior that could be surprising and described as a bug.

Consider the following, "corrected" behavior (that is, without the deliberate "bug" added in 25fd6df), in an EST5EDT zone, in summer:

SELECT *
 FROM
  (SELECT current_time) AS p,
  roundtrip(p) AS (tostring text, roundtripped timetz);
       timetz       | tostring |  roundtripped   
--------------------+----------+-----------------
 22:44:55.177674-04 | 21:44:55 | 22:44:55.178-04

The value has been converted for Java and roundtripped back "correctly", but the java.sql.Time.toString method produces a value an hour different from that seen in PostgreSQL. That happens because current_time assigned the current, summer zone offset of -4 (and, on roundtrip, the current zone offset was again assumed, having been lost in the conversion to UTC that java.sql.Time requires). But with the time value confined to the Java epoch day of 1 January 1970, java.sql.Time applies the zone offset in effect on that day, -5, when extracting components of the time.

By contrast, with the 'fix' added in 25fd6df, the milliseconds value used to construct the Time object is not in the specified range around the epoch, but instead represents the time of interest added to midnight of the current day. The Java constructor doesn't complain or reject that, and produces a Time object that does its calculations according to the zone offset that would be in effect today.

To be sure, the 'bug' solved by that 'fix' crops up in very narrow circumstances: when the SQL value is a TIME WITH TIME ZONE, and its zone offset was assigned from the current session time zone, and that time zone observes a summer time, and the current status of summer time is different from that for 1 Jan 1970, and somebody is comparing extracted components from the Java object to those from PG and noticing the difference, and considers that a bug.

In the general case, the consistency that feels lacking there can't be expected at all, anyway. Consider the case where the original value does not just have a zone offset defaulted from the current session (as by current_time), but an explicit offset given when it was created (wouldn't that be the usual reason for using the TIME WITH TIME ZONE type?):

SELECT *
 FROM
  (SELECT timetz '22:44:55-03') AS p,
  roundtrip(p) AS (tostring text, roundtripped timetz);
   timetz    | tostring | roundtripped 
-------------+----------+--------------
 22:44:55-03 | 20:44:55 | 21:44:55-04

The weird behaviors are intrinsic to the ill-conceived need for shoehorning the data into this misbegotten Java type at all. Have I mentioned one should be using java.time.OffsetTime? And, meanwhile, that old 'fix' is guaranteed to surprise any code that calls getTime() on the object and gets a value decades outside of the specified range.

(Even on that point, to be honest, absolute strictness is hard to defend: Java itself, for example via the deprecated hours/minutes/seconds constructor, can install a milliseconds value outside of [0,MSECS_PER_DAY), depending on the current default zone. It seems that any client calling getTime on a java.sql.Time had better assume nothing, and just mod the result down into the expected range.)

There is no problem with moving ahead and 'fixing' (that is, removing the 25fd6df fix) the TIME WITHOUT TIME ZONE case. That will solve the issue of the getTime() value being out of range, without causing new perplexing behaviors.

But in the TIME WITH TIME ZONE case, it is harder to know what to want to do. Backing out the old fix would have the advantage of matching PgJDBC's behavior, and not violating expectations on what getTime() can return, but at the cost of changing long-standing PL/Java behavior, in a way that can bring perplexing results.

At the same time, the existing behavior can have its own perplexing cases, such as a time that can't be represented if the current day happens to be a transition to summer time and the correct time value falls in the gap. That's just wrong; there should be no way for a complication like that to arise in the mere representation of a time with no date.

That could be avoided by adding or subtracting one or more whole days during conversion, if the in-the-gap case is detected, to find a day that has no gap and the desired zone offset. But that would add even more complexity to a problem whose real answer is: hey, PL/Java already supports java.time.OffsetTime ... use it!

jcflack commented 5 years ago

There's another complication affecting TIME WITH TIME ZONE versus java.sql.Time. The Java object stores a UTC instant with no record of a local time zone. If that has to be assigned to a PostgreSQL TIME WITH TIME ZONE, some value needs to be put in the zone-offset field of the result. There are at least three choices:

  1. Keep it in UTC and assign zero for the zone offset.
  2. Adjust it as if in Java's default time zone, and use the offset of that (as of 1 January 1970)
  3. Adjust it as if in PG's session time zone and use the offset of that

(0) has the benefit of a strong argument from principle: the argument being supplied is in UTC, and carries with it no information to identify any other time zone, so (0) is the only conversion that doesn't involve blatantly making something up. But,

(1) has the benefit of being what pgJDBC does, and what the Java API docs specify (at least for the PreparedStatement setTime method that takes a time and a Calendar, for the case where the Calendar is null)

(2) is what PL/Java has been doing.

Changing that to (1) should be an invisible change in cases where the PG session time zone and the Java default time zone are the same.

And once again, application code would be best advised to use the java.time types instead.