is00hcw / tungsten-replicator

Automatically exported from code.google.com/p/tungsten-replicator
0 stars 1 forks source link

Tungsten replicator row performance slowed by unnecessary time zone switches #1090

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Configure a MySQL master DBMS with row replication enabled.  (e.g., run 'set 
global binlog_format=row')
2. Set up Tungsten master/slave replication 
3. On the target slave, enable the MySQL general log, e.g., run 'set global 
general_log=1'.  
4. Tail the slave MySQL server log, using a command like the following: 

tail -f /var/lib/mysql/logos2.log

5. Create a table and run a set of commands like the following on the MySQL 
master:

create database if not exists test;
create table if not exists test.datacheck (
  id int primary key auto_increment, 
  d varchar(30),
  ts timestamp
) charset=utf8;

set session binlog_format=row;
delete from test.datacheck;

insert into test.datacheck(id, d, ts) values(1, 'one', now());
update test.datacheck set d='ONE' where id=1;
insert into test.datacheck(id, d, ts) values(2, 'two', now());
update test.datacheck set d='ONE' where id=2;
insert into test.datacheck(id, d, ts) values(3, 'three', now());
update test.datacheck set d='ONE' where id=3;
delete from test.datacheck where id=1;

6. What the commands the go into the log. 

What is the expected output?

Since the time zone is set to UTC for all row changes, you should not see any 
time zone-related changes.  Instead, you'll insert, update, and delete commands 
like the following: 

          136 Query INSERT INTO `test`.`datacheck` ( `id` , `d` , `ts` )  VALUES (  1  ,  UNHEX( '6f6e65' ) ,  '2015-01-20 00:17:30'  )
          136 Query UPDATE `test`.`datacheck` SET `id` =  1  , `d` =  UNHEX( '4f4e45' )  , `ts` =  '2015-01-20 00:17:30'  WHERE `id` =  1
          136 Query INSERT INTO `test`.`datacheck` ( `id` , `d` , `ts` )  VALUES (  2  ,  UNHEX( '74776f' ),  '2015-01-20 00:17:30'  )

What do you see instead?

There are numerous calls to 'set session time_zone', as in the following 
example:

          127 Query set @@session.time_zone='SYSTEM'
          127 Query set @@session.time_zone='+00:00'
          127 Query INSERT INTO `test`.`datacheck` ( `id` , `d` , `ts` )  VALUES (  1  ,  UNHEX( '6f6e65' )  ,  '2015-01-20 00:15:56'  )
          127 Query UPDATE `test`.`datacheck` SET `id` =  1  , `d` =  UNHEX( '4f4e45' )  , `ts` =  '2015-01-20 00:15:56'  WHERE `id` =  1
          127 Query set @@session.time_zone='SYSTEM'
          127 Query set @@session.time_zone='+00:00'
          127 Query INSERT INTO `test`.`datacheck` ( `id` , `d` , `ts` )  VALUES (  2  ,  UNHEX( '74776f' )  ,  '2015-01-20 00:15:56'  )

What is the possible cause?

The time zone is being set to the client time zone in the session options of 
the DBMSEvent that carries the row changes.  So if you are running as SYSTEM, 
it will have that value.  The time zone is also being set to +00:00 (i.e., UTC) 
in the RowChangeData instance that carries the row changes.  The applier sets 
session variables whenever something changes, resulting in the commands shown 
above. 

This impacts performance because every set session command requires a network 
call. 

The specific location in the code is in 
com.continuent.tungsten.replicator.extractor.mysql.MySQLExtractor.extractEvent(B
inlogReader), line 667, where you have the following: 

                    int operation = sqlOperation.getOperation();
                    if (operation == SqlOperation.BEGIN)
                    {
                        inTransaction = true;
                        doCommit = false;
                        // This a a BEGIN statement : buffer session variables
                        // for following row events if any and skip it

                        /* Adding statement options */
                        savedOptions.add(new ReplOption("autocommit", event
                                .getAutocommitFlag()));
                        <snip>
                        if (event.getTimeZoneName() != null)
                            savedOptions.add(new ReplOption("time_zone", event
                                    .getTimeZoneName()));

What is the proposed solution?

Commenting out the command to add the time_zone session variable to the event 
header seems to do the trick, e.g.: 

                        //if (event.getTimeZoneName() != null)
                        //    savedOptions.add(new ReplOption("time_zone", event
                        //            .getTimeZoneName()));

This may however have side-effects, which I am checking now. 

Additional information

This affects the version 3.0.1 replicator.  

Use labels and text to provide additional information.

Original issue reported on code.google.com by robert.h...@continuent.com on 20 Jan 2015 at 12:43

GoogleCodeExporter commented 9 years ago

Original comment by linas.vi...@continuent.com on 20 Jan 2015 at 9:38

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2772.

Remove extra logging of session time zone that causes performance drop on row 
replication. 

Original comment by robert.h...@continuent.com on 20 Jan 2015 at 6:10

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r2775.

Merge fix to remove extra logging of session time zone that causes performance 
drop on row replication. 

Original comment by robert.h...@continuent.com on 21 Jan 2015 at 10:19