Open GoogleCodeExporter opened 9 years ago
Note: you can reproduce the same result using tungsten-sandbox with the
fileapplier data source.
Original comment by g.maxia
on 6 Dec 2014 at 4:02
It looks to me as if timestamps are being emitted as GMT, which is the correct
behavior when no other time zone is specified. Tungsten no longer uses the
platform time zone, as that leads to inconsistent results when
outputting data.
To alter the CSV time zone try one of the following:
1.) Set a different time zone in services.properties. This alters the
replicator default time zone and changes all timestamps.
2.) Set the replicator.applier.dbms.timezone to another time zone. This will
change the time zone used for timestamp formatting. This is the recommended
approach, since it does not alter the default time zone of the replicator.
We will likely encounter anomalies even with #2 so the results will be very
interesting. :)
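
To illustrate what a formatting time zone changes, here is a minimal Python sketch. It is not Tungsten code: the helper name `format_for_csv`, the output format string, and the fixed +01:00 offset are assumptions made for the example. It only shows that the same instant yields different CSV text depending on the zone used for formatting.

```python
from datetime import datetime, timedelta, timezone

# Assumed CSV timestamp layout; the real applier format may differ.
CSV_FORMAT = "%Y-%m-%d %H:%M:%S"

def format_for_csv(instant, tz=timezone.utc):
    """Render a timezone-aware instant as text in the given zone (GMT by default)."""
    return instant.astimezone(tz).strftime(CSV_FORMAT)

# One absolute instant, expressed in UTC.
instant = datetime(2014, 12, 6, 16, 2, 0, tzinfo=timezone.utc)

# Hypothetical formatting zone: a fixed offset of +01:00.
plus_one = timezone(timedelta(hours=1))

print(format_for_csv(instant))            # 2014-12-06 16:02:00
print(format_for_csv(instant, plus_one))  # 2014-12-06 17:02:00
```

The instant itself never changes; only its text form does, which is why the CSV output has to agree with whatever zone the target system assumes.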
Original comment by robert.h...@continuent.com
on 6 Dec 2014 at 4:26
I have tested more, and seen what the problem is.
First of all, there was a configuration issue. I was setting the time zone in
the MySQL slaves, but that does not affect Hadoop.
Anyway, the discrepancy is that the CSV file has no notion of time
zones, so we get the timestamp from MySQL as GMT, without the conversion
that the MySQL protocol performs automatically when reading values.
I tried modifying the replicator time_zone, but that affects all fields, not
only timestamps.
Original comment by g.maxia
on 7 Dec 2014 at 3:24
The same behaviour is seen when replicating into Vertica and RedShift. What
we're seeing here is that the CSV is generated using GMT/UTC, even though we
know it wasn't extracted or written into the master database in that timezone.
We have two choices:
1) Document that CSV appliers all assume that we are writing UTC timestamps,
even if they may be extracted in another timezone.
2) Since this is common to all CSV appliers, I suggest we change the CSV
applier so that it writes the CSV timestamp as the modified timezone value, not
UTC. This way, the CSV will contain the correct date.
(2) would be my preferred solution, but I'm willing to be convinced otherwise.
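
As a rough sketch of what option 2 could look like, the Python below converts only the timestamp columns of each row to a chosen time zone when writing CSV and leaves every other field untouched. This is not the Tungsten CSV applier: the function name `write_rows`, the sample row, and the fixed +01:00 offset are all hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

def write_rows(rows, tz):
    """Write rows as CSV, converting only timezone-aware datetimes to tz."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow([
            # Only timestamp values are converted; other fields pass through.
            v.astimezone(tz).strftime("%Y-%m-%d %H:%M:%S")
            if isinstance(v, datetime) else v
            for v in row
        ])
    return out.getvalue()

# Hypothetical sample row: id, name, and a timestamp held as a UTC instant.
rows = [(1, "widget", datetime(2014, 12, 6, 16, 2, 0, tzinfo=timezone.utc))]

# Hypothetical target zone: a fixed offset of +01:00.
local = timezone(timedelta(hours=1))

print(write_rows(rows, timezone.utc), end="")  # 1,widget,2014-12-06 16:02:00
print(write_rows(rows, local), end="")         # 1,widget,2014-12-06 17:02:00
```

Passing `timezone.utc` reproduces the current behaviour described in option 1, so the same code path covers both choices.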
Original comment by mc.br...@continuent.com
on 9 Dec 2014 at 3:44
Original comment by linas.vi...@continuent.com
on 11 Dec 2014 at 2:43
Original comment by linas.vi...@continuent.com
on 19 Jan 2015 at 2:18
Original issue reported on code.google.com by
g.maxia
on 6 Dec 2014 at 4:01