kagkarlsson / db-scheduler

Persistent cluster-friendly scheduler for Java
Apache License 2.0
1.23k stars 188 forks source link

Use DATETIME instead of TIMESTAMP for MySQL schema #514

Open xak2000 opened 1 month ago

xak2000 commented 1 month ago

Expected Behavior

All timestamp columns should have type DATETIME for MySQL reference definition.

Current Behavior

Currently in the MySQL reference definition all timestamp columns have TIMESTAMP type.

Related comment where @kagkarlsson explains why TIMESTAMP type was selected instead of DATETIME.

I'd like to explain why this is the wrong type for the MySQL definition in my opinion and why DATETIME should still be preferred.

I agree with the statement in the mentioned comment that TIMESTAMP is more directly mappable to java.time.Instant while DATETIME is a sort of java.time.LocalDateTime. Both do not contain a timezone, though.

But TIMESTAMP in MySQL is not direct replacement for TIMESTAMP in Posgresql. Unlike Posgresql, in MySQL TIMESTAMP has many restrictions and quirks.

For instance:

Based on my practice, the most used MySQL data type for timestamps is DATETIME. The TIMESTAMP type is too strange and quirky and usually avoided by people who uses MySQL. (unlike Postgresql's TIMESTAMP, which is totally fine).

About the fact that DATETIME doesn't have a timezone. It is true, so the best practice is to convert any timestamps into UTC timezone before storing them into DATETIME column. I think the alwaysPersistTimestampInUtc flag should cover this, right?

Btw, I think MySQL JDBC driver actually manipulates timezones based on the server time zone and the JVM timezone (or passed Calendar instance) for both DATETIME and TIMESTAMP column types. The difference is just the internal representation of the data: TIMESTAMP is storead as a number (since epoch) on the mysql server (4 bytes), while DATETIME is stored in a more complex data format (8 bytes). But TIMESTAMPs timezone is still manipulated when it is read/written. So, it is not better than DATETIME for mapping java.time.Instant. In both cases JDBC driver first formats the Instant into string in the default (or specified) JVM timezone and then into Mysql session's timezone. After that DATETIME is stored as is, while TIMESTAMP is additionally converted from the session's timezone to UTC.

A quote from the docs:

MySQL converts TIMESTAMP values from the current time zone to UTC for storage, and back from UTC to the current time zone for retrieval. (This does not occur for other types such as DATETIME.) By default, the current time zone for each connection is the server's time. The time zone can be set on a per-connection basis. As long as the time zone setting remains constant, you get back the same value you store. If you store a TIMESTAMP value, and then change the time zone and retrieve the value, the retrieved value is different from the value you stored. This occurs because the same time zone was not used for conversion in both directions. The current time zone is available as the value of the time_zone system variable. For more information, see Section 5.1.13, “MySQL Server Time Zone Support”.

P.S. I'm a new user of this library so I didn't intensively tested it with DATETIME columns yet. But I already see these problems with TIMESTAMP column type. I didn't noticed any problems when changed type of all TIMESTAMP columns to DATETIME in scheduled_tasks table, but I should mention that both my MySQL server and JVM are configured to have UTC timezone, so I could probably miss some problems related to timezone conversions done by MySQL JDBC driver. So, a more throuthful testing is required to make sure DATETIME columns will always work as expected.

But, based on what I see in DefaultJdbcCustomization the way Instant is get/set when persistTimestampInUTC is enabled is exactly the same as I usually do in all projects that work with MySQL DATETIME columns. So, I don't see why DATETIME instead of TIMESTAMP may not work as expected. And when persistTimestampInUTC is disabled, then the problem with wrong stored timestamps (when the session timezone is changed) arises for both DATETIME and TIMESTAMP column types. That is why the flag was introduced in the first place, if I understand correctly. :)

Context