Open lpmarques opened 3 years ago
I think that JDBC (which controls the bootstrapping process) is probably adding a timezone to the datetimes returned, which maxwell yeah, then attempts to convert to UTC times.
If you have a surgical fix here to suggest I'll listen, but I've gone so many damn rounds on mysql and its timezone support (and dumbness thereof) that mostly I just throw up my hands and say "Listen. If your server and client aren't in UTC and your datetimes aren't in UTC you're opening up a big can of trouble anyway. I can't even begin to guess what your desired behavior is." And normally I don't do that kind of daddy-engineer-knows-best shit, but in this case... well, I am doing it. sorry.
You might try passing different values of "serverTimezone=SOMETHING" in jdbc_options
. It might help.
So, I get your point, but in my case (and I suppose most data engineers' too) I have no decision power on timezone configuration of transactional source databases.
On the desired behavior of datetime manipulation, maybe I got it wrong (which would be no surprise given mysql's confusing concepts), but my unsderstanding is that mysql datetimes have no timezone data associated to them whatsoever. If that's correct, any datetime value should be read as stored, almost like a simple string would: without any regard to server or client timezones and without any conversion either.
Therefore, if the JDBC is adding timezone to mysql datetimes based on server's global time_zone variable, it sounds like unexpected behavior. But I'll tinker with jdbc_options
and report any effect. Thanks for the suggestion!
Hey @osheroff!
I tried many different things on jdbc_options with no success. When set to UTC, the serverTimezone
parameter you mentioned actually did ensure datetime values were fetched as stored, but had the side-effect of fetching timestamp values with database's timezone, not UTC.
By tinkering with the source code (SynchronousBootstrapper.java, more specifically), I noticed JDBC is indeed responsible for adding timezone to both datetime and timestamp values as you correctly guessed. I'm sure you may come with a more elegant solution, but I created a pull request to avoid the proceeding timezone conversion of datetime values (only). I've run some tests and confirmed the fix works for datetimes of any precision, making bootstrapping and CDC processes coherent.
Please, consider reviewing it for inclusion in Maxwell's next releases.
Thank you!
PS: another thing I noticed is this timezone conversion of both datetimes and timestamps during bootstrapping is not necessarily to UTC, it's actually from the server database to the client host timezone, which I guess can cause differences from the CDC process behavior too. Since at least I can ensure client host timezone to be UTC, I did not tackle this problem in my pull request, but still I thought it was worth mentioning.
Hey there,
Last week I noticed some oddly valued (but still valid) datetime fields in a source table were being reproduced as invalid datetimes by Maxwell's bootstrap process.
The original value stored in the source mysql database for these fields is 9999-12-31 23:59:59, while resulting bootstrap-insert events show them as 10000-01-01 02:59:59, probably over-assuming that: 1 - these values were recorded in the time zone specified by GLOBAL.time_zone mysql variable (-03:00), which is not necessarily the case; 2 - they should be represented as some sort of timestamp in the target, therefore needing to be converted to UTC first, which is not necessarily the desired behavior and is also, afaik, not informed in the documentation.
Later I could confirm this is consistently happening to all datetime values (not only the oddly valued ones) during bootstrap runs, but the ongoing CDC process shows the expected behavior of keeping all datetime values as recorded in the source.
Maxwell version: 1.33.1