zkfan / tungsten-replicator

Automatically exported from code.google.com/p/tungsten-replicator

Enable seamless migration from and to MySQL replication #151

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
1. To which tool/application/daemon will this feature apply?

Tungsten Replicator. 

2. Describe the feature in general

As a result of recent implementation work, Tungsten can automatically take over 
from the MySQL SQL thread when using SQL thread takeover.  For the full slave 
case, however, migration to and from Tungsten still requires manual steps.  
Tungsten will be extended as follows to make migration of an operating MySQL 
slave to Tungsten Replicator and back again fully seamless.  

a. When going online, Tungsten Replicator will optionally run a SHOW SLAVE 
STATUS command to see if the MySQL server is configured as a replication slave. 
 If so, Tungsten will stop the slave, note the master binlog filename and 
offset, then begin replicating from this position. 

b. On clean offline (i.e., when all channels are serialized), Tungsten 
Replicator will optionally note the current binlog filename and issue a CHANGE 
MASTER command setting the current master coordinates.  Users can issue a START 
SLAVE command and continue with normal MySQL replication.  

c. The preceding behaviors will be controlled by a configuration setting, as 
there are cases where automatic migration is not desirable.  The default will 
be not to migrate.  
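
As a rough illustration of step (a), the sketch below picks the master 
coordinates out of one SHOW SLAVE STATUS row.  It is not Tungsten code: the 
function name and the idea of passing the row as a dict (for example from a 
dictionary-style cursor, read after the slave has been stopped) are 
assumptions.  The column names are the standard MySQL ones.  
Relay_Master_Log_File and Exec_Master_Log_Pos are used because they give the 
master position of the last event the SQL thread executed, rather than the 
position the I/O thread has read up to. 

```python
from typing import Mapping, Optional, Tuple


def master_coordinates(
    status: Optional[Mapping[str, object]],
) -> Optional[Tuple[str, int]]:
    """Return (master binlog file, offset), or None if not a slave.

    `status` is one SHOW SLAVE STATUS row as a mapping of column name to
    value (hypothetical input shape, not part of the original design).
    """
    # An empty result set means the server is not configured as a slave.
    if not status:
        return None
    # Master position of the last event applied by the SQL thread.
    log_file = status.get("Relay_Master_Log_File")
    log_pos = status.get("Exec_Master_Log_Pos")
    if not log_file or log_pos is None:
        return None
    return str(log_file), int(log_pos)
```

Returning None for a server that is not a slave lets the caller fall back to 
the existing behavior instead of failing. 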

3. Describe the feature interface

This behavior will be controlled by a property setting on the MySQL applier in 
the replication service property file.  Also, we will need to expose it as an 
option in tungsten-installer.  

4. Give an idea (if applicable) of a possible implementation

To be determined.

5. Describe pros and cons of this feature.

5a. Why the world will be a better place with this feature.

Enables fast migration from/to MySQL replication, which helps users and also 
makes comparative testing easier. 

5b. What hardship will the human race have to endure if this feature is
implemented.

The effort needed to implement the feature, which is not large.

6. Notes

Original issue reported on code.google.com by berkeley...@gmail.com on 3 Jul 2011 at 11:27

GoogleCodeExporter commented 9 years ago
I am starting implementation now.  Here are a few notes. 

1.) The simplest way to issue a CHANGE MASTER command is to make this a 
Database operation that is supported for MySQL.  For example, we can have an 
"enableNativeReplication()" call that accepts an event ID, and call it when 
releasing the TrepCommitSeqno instance used to update trep_commit_seqno.  

2.) To make this work simply we need to start storing the full reference to the 
MySQL binlog and offset.  For reasons that are no longer clear to me we store 
only the number in the binlog file name, not the full name.  

3.) The code to stop the slave and start at the current master position is 
relatively straightforward.  The MySQLExtractor.positionBinlogSlave() method 
does close to what we want already, though it picks up the relay log position, 
not the position on the master.  

Original comment by berkeley...@gmail.com on 4 Jul 2011 at 6:37

GoogleCodeExporter commented 9 years ago
Here is a brief description of the design for this feature.  

1.) User interface.  

There will be a new property to designate that a replicator is acting as a 
replacement for a native replication slave.  When set to true, this property 
enables seamless migration from/to a native mechanism like MySQL replication.  

replicator.nativeSlaveTakeover={true|false}

This should be enabled in tungsten-installer as --native-slave-takeover.  It is 
allowed for --direct topologies only at this time. 

2.) The implementation will work as follows.  

2.1) Extractor behavior.  When native-slave-takeover is enabled, the 
MySQLExtractor will expect to position at start-up using master coordinates 
from the current slave if no other coordinates are supplied.  The following 
pseudo-code shows the case logic. 

If there is no restart point
  // No restart point yet; we have to find one.  
  If native-slave-takeover == true
    Turn off slave.
    Start from slave position.
  else
    // This is current behavior. 
    Start from current position in master (SHOW MASTER STATUS)
  end
else
  // There is a restart point from trep_commit_seqno or -from-event option.
  If MySQL Slave is running:  
    Error!!!
  end
end
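
The case logic above can also be written as a small decision function.  This 
is only a sketch in Python, not the MySQLExtractor code: the function, the 
enum, and the parameter names are hypothetical, and resuming from the restart 
point in the last branch is an assumption, since the pseudo-code only spells 
out the error case there. 

```python
from enum import Enum


class StartAction(Enum):
    TAKE_OVER_FROM_SLAVE = "stop slave, start from slave's master position"
    START_FROM_MASTER_STATUS = "start from SHOW MASTER STATUS position"
    RESUME_FROM_RESTART_POINT = "resume from stored or supplied event ID"


def choose_start_action(
    has_restart_point: bool,
    native_slave_takeover: bool,
    slave_running: bool,
) -> StartAction:
    """Mirror the extractor positioning pseudo-code (names hypothetical)."""
    if not has_restart_point:
        # No restart point yet; we have to find one.
        if native_slave_takeover:
            return StartAction.TAKE_OVER_FROM_SLAVE
        # This is current behavior.
        return StartAction.START_FROM_MASTER_STATUS
    # There is a restart point from trep_commit_seqno or -from-event.
    if slave_running:
        raise RuntimeError(
            "restart point exists but the MySQL slave is still running"
        )
    # Assumed: with no running slave, continue from the restart point.
    return StartAction.RESUME_FROM_RESTART_POINT
```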

2.2) Trep_commit_seqno management. 

If native-slave-takeover is enabled, we will issue command(s) following clean 
pipeline shutdown to set the master coordinates to those of the last event 
committed by Tungsten, so that native slave operation can resume seamlessly.  
Here is pseudo-code for MySQL: 

If pipeline is shut down cleanly
  If native-slave-takeover is enabled
    Extract binlog file and position from last committed event ID. 
    CHANGE MASTER TO master_log_file = <binlog_file>, master_log_pos = <binlog_pos>
  end
end

If this command fails, Tungsten will print a warning, but the pipeline 
shutdown itself will not fail. 
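
A minimal Python sketch of this step follows.  It is not the Tungsten 
implementation.  It assumes, purely for illustration, an event ID of the form 
<binlog_file>:<offset> with an optional ";..." suffix, and a hypothetical 
`execute` callable that runs one SQL statement.  The only behavior taken from 
the description above is that a failure is logged as a warning rather than 
propagated. 

```python
import logging
import re
from typing import Callable, Tuple

logger = logging.getLogger(__name__)


def parse_event_id(event_id: str) -> Tuple[str, int]:
    """Split '<binlog_file>:<offset>[;suffix]' (assumed format)."""
    coords = event_id.split(";", 1)[0]
    log_file, sep, offset = coords.rpartition(":")
    # Accept only plain file names so the value is safe to quote.
    if not sep or not re.fullmatch(r"[\w.\-]+", log_file) or not offset.isdigit():
        raise ValueError(f"unrecognized event ID: {event_id!r}")
    return log_file, int(offset)


def change_master_sql(event_id: str) -> str:
    """Build the CHANGE MASTER statement for the last committed event."""
    log_file, log_pos = parse_event_id(event_id)
    return (
        f"CHANGE MASTER TO master_log_file = '{log_file}', "
        f"master_log_pos = {log_pos}"
    )


def enable_native_replication(
    execute: Callable[[str], None], event_id: str
) -> bool:
    """Run CHANGE MASTER; warn instead of failing the shutdown."""
    try:
        execute(change_master_sql(event_id))
        return True
    except Exception as exc:  # deliberately broad: shutdown must not fail
        logger.warning("CHANGE MASTER failed: %s", exc)
        return False
```

Note that parsing the file name out of the event ID only works if the event ID 
carries the full binlog file name, which is the storage change raised in the 
implementation notes above. 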

2.3) Testing

You can show the preceding implementation is correct with the following test 
case: 

a.) Set up native MySQL replication from master_host X on slave_host Y. 
b.) Start a transaction load on X and ensure native replication to Y is 
working. 
c.) Install Tungsten on slave host Y.  Specify --direct and 
--native-slave-takeover. 
d.) Issue 'trepctl -host Y online' to bring Tungsten online. 
e.) Verify that MySQL slave is stopped and Tungsten replication is operating on 
Y. 
f.) Issue 'trepctl -host Y offline' to stop Tungsten cleanly on Y. 
g.) Remove THL files and tungsten_<svc> schema. 
h.) Issue 'START SLAVE' on host Y. 
i.) Verify that native replication is working again and transferring data. 

From this point on you should be able to repeat the command sequence d-i as 
many times as you like, with or without load.  If this succeeds, the feature 
works. 

(Accidentally added this to Issue 152.) 

Original comment by berkeley...@gmail.com on 5 Jul 2011 at 9:07

GoogleCodeExporter commented 9 years ago
Implementation of this feature is incomplete.
When "--native-slave-takeover" is given to the installer, the installer's 
validation check for an existing replication slave should be skipped, as the 
purpose of this feature is to take over from an existing slave. 

Original comment by g.maxia on 6 Jul 2011 at 9:04

GoogleCodeExporter commented 9 years ago
I've updated the check so that it is disabled if the slave has native takeover 
enabled.

Original comment by jeffm...@gmail.com on 6 Jul 2011 at 12:54