sfa-siard / JdbcMySql

SIARD Suite - JdbcMySql
Other
2 stars 3 forks source link

Time zone seems to affect date values #1

Closed marhop closed 5 years ago

marhop commented 5 years ago

Consider a MySQL database with the following content:

create table `date_tab` (`date_col` date);
insert into `date_tab` values ('2018-11-20');

Downloading this database with siardfromdb (no fancy options) and then uploading it into a new MySQL database with siardtodb (again, no fancy options), the date value changes:

  1. 2018-11-20 (original database)
  2. 2018-11-19Z (SIARD file XML content)
  3. 2018-11-19 (restored database)

The time zone of the MySQL server hosting both the original and the restored database is CET (+01:00). If the time zone is set to UTC when invoking siardfromdb (JDBC property serverTimezone=UTC), no changes are applied and the correct date is restored by siardtodb:

  1. 2018-11-20 (original database)
  2. 2018-11-20Z (SIARD file XML content)
  3. 2018-11-20 (restored database)

But should the time zone setting really affect the conversion of pure date (not time!) values? And if it does, shouldn't round-tripping work correctly if the time zone is the same for both the original and the restored database?

Thanks, Martin

PS: Tested with SIARD Suite 2.1.99.

HartwigThomas commented 5 years ago

The time zone problem is explicitly explained in the SIARDGUI documentation (should probably be included in the SIARDCMD documentation too):

"The conversion of the database fields of type TIME and TIMESTAMP depends on the local time zone. If the time 15:30 is stored on a machine in Zurich, then it will be stored as the UTC time 14:30 (in Winter!) in the XML meta data. If you would prefer to interpret the times in the database unchanged as UTC times, you must start SiardGui with the option -Duser.timezone=GMT It is possible to call SiardGui with the name of a SIARD file to be opened as single argument. This permits setting siardgui.cmd as the default application for opening files with a .siard extension."

Of course, not only TIMEs but also DATEs depend on time zone. The "usual" setup is a database server using UTC and JDBC drivers that convert to and from local time/date (image a airplane flight database with users from all over the world). There is no way, SIARD can detect, whether the database is truly running with internal UTC dates/times. MySQL obviously introduced the serverTimezone property to achieve the same effect as my -Duser.timezone=GMT. (from the MySQL documentation: "Override detection/mapping of time zone. Used when time zone from server doesn't map to Java time zone")

So, how should SIARD handle this question in your opinion?

marhop commented 5 years ago

Hi,

I'm aware of this section in the docs, but thanks for the additional info and the example! I agree with you regarding the conversion of TIMESTAMP values to UTC for internal storage and I'm also fine with the conversion of TIME and DATETIME, albeit this part differs from MySQL's behaviour:

Values for TIMESTAMP columns are converted from the current time zone to UTC for storage, and from UTC to the current time zone for retrieval. The current time zone setting does not affect [...] values in DATE, TIME, or DATETIME columns.

However, I don't agree with (or maybe don't understand) this part of your answer:

also DATEs depend on time zone

How would this be defined? How can a pure date without any time info be converted between time zones without implicitly assuming a certain time? To illustrate this, which of these would be the correct way to convert the date 2018-11-20 from time zone +05:00 to UTC? They are both plausible, as long as I (arbitrarily!) assume a time of either 12:00 or 00:00.

So please correct me if I'm wrong, but in conclusion, I think SIARD should not modify DATE values at all because, for want of time information, it cannot know how a time zone shift would affect a date.

And if it does date conversions it should be consistent about them and not do erroneous conversions like 2018-11-20 (original database in time zone +01:00) → 2018-11-19Z (SIARD file) → 2018-11-19 (restored database in time zone +01:00), as described in my original post.

Thanks for your time, Martin

HartwigThomas commented 5 years ago

At almost any point in time there are two dates on earth. There are separated by the midnight point and the date line. So - of course - the date also depends on the time zone. You will notice this in the hypothetical airline database, where landings and take-offs have different dates at the same time depending on the time zone.

I agree, download-upload SHOULD be consistent. However, that appears to be impossible to achieve, if the internal time zone of a database cannot be detected and the JDBC driver always converts from supposed UTC to local time zone. Apparently MySQL detected the problem themselves and therefore introduced a connection string parameter. You can always use this parameter with SIARD.

HartwigThomas commented 5 years ago

I have thought about the question some more. I remembered that I have a test which proves that MySQL renders the same date which was put into it. So it is not a question of MySQL.

I also agree with you, that a time zone does not make sense for a pure date type, unless the date is associated with some time. So I found the problem in the conversion of the date to an XML string, where the UTC time zone is applied to the date. The conversion back from the string to a java.sql.Date did not reproduce the original date value.

I have therefore decided to drop the conversion to UTC for pure date types which are not associated to a time. The next release will contain that change.

marhop commented 5 years ago

Great to hear that, thank you for the effort!

I see the new release is already available. Just tried it with my data; the problem disappeared as expected.