MySQL: 主从原理 - Githubissues

主从用途

Advantages of replication in MySQL include:

Scale-out solutions - spreading the load among multiple replicas to improve performance. In this environment, all writes and updates must take place on the replication source server. Reads, however, may take place on one or more replicas. This model can improve the performance of writes (since the source is dedicated to updates), while dramatically increasing read speed across an increasing number of replicas.
Data security - because data is replicated to the replica, and the replica can pause the replication process, it is possible to run backup services on the replica without corrupting the corresponding source data.
Analytics - live data can be created on the source, while the analysis of the information can take place on the replica without affecting the performance of the source.
Long-distance data distribution - you can use replication to create a local copy of data for a remote site to use, without permanent access to the source.

基于binlog的复制

This section describes replication between MySQL servers based on the binary log file position method, where the MySQL instance operating as the source (where the database changes originate) writes updates and changes as “events” to the binary log. The information in the binary log is stored in different logging formats according to the database changes being recorded. Replicas are configured to read the binary log from the source and to execute the events in the binary log on the replica's local database.

Each replica receives a copy of the entire contents of the binary log. It is the responsibility of the replica to decide which statements in the binary log should be executed. Unless you specify otherwise, all events in the source's binary log are executed on the replica. If required, you can configure the replica to process only events that apply to particular databases or tables.

Each replica keeps a record of the binary log coordinates: the file name and position within the file that it has read and processed from the source. This means that multiple replicas can be connected to the source and executing different parts of the same binary log. Because the replicas control this process, individual replicas can be connected and disconnected from the server without affecting the source's operation. Also, because each replica records the current position within the binary log, it is possible for replicas to be disconnected, reconnect and then resume processing.

The source and each replica must be configured with a unique ID (using the [server_id](https://dev.mysql.com/doc/refman/5.7/en/replication-options.html#sysvar_server_id) system variable). In addition, each replica must be configured with information about the source's host name, log file name, and position within that file. These details can be controlled from within a MySQL session using the CHANGE MASTER TO statement on the replica. The details are stored within the replica's connection metadata repository, which can be either a file or a table (see Section 16.2.4, “Relay Log and Replication Metadata Repositories”).

前置要求

On the source, you must enable binary logging and configure a unique server ID. This might require a server restart. See Section 16.1.2.1, “Setting the Replication Source Configuration”.
On each replica that you want to connect to the source, you must configure a unique server ID. This might require a server restart. See Section 16.1.2.5.1, “Setting the Replica Configuration”.
Optionally, create a separate user for your replicas to use during authentication with the source when reading the binary log for replication. See Section 16.1.2.2, “Creating a User for Replication”.
Before creating a data snapshot or starting the replication process, on the source you should record the current position in the binary log. You need this information when configuring the replica so that the replica knows where in the binary log to start executing events. See Section 16.1.2.3, “Obtaining the Replication Source's Binary Log Coordinates”.
If you already have data on the source and want to use it to synchronize the replica, you need to create a data snapshot to copy the data to the replica. The storage engine you are using has an impact on how you create the snapshot. When you are using MyISAM, you must stop processing statements on the source to obtain a read-lock, then obtain its current binary log coordinates and dump its data, before permitting the source to continue executing statements. If you do not stop the execution of statements, the data dump and the source's status information will not match, resulting in inconsistent or corrupted databases on the replicas. For more information on replicating a MyISAM source, see Section 16.1.2.3, “Obtaining the Replication Source's Binary Log Coordinates”. If you are using InnoDB, you do not need a read-lock and a transaction that is long enough to transfer the data snapshot is sufficient. For more information, see Section 14.20, “InnoDB and MySQL Replication”.
Configure the replica with settings for connecting to the source, such as the host name, login credentials, and binary log file name and position. See Section 16.1.2.5.2, “Setting the Source Configuration on the Replica”.

全新环境创建主从

When there is no snapshot of a previous database to import, configure the replica to start replication from the new source.

To set up replication between a source and a new replica:

Start up the replica and connect to it.
Execute a CHANGE MASTER TO statement to set the source configuration. See Section 16.1.2.5.2, “Setting the Source Configuration on the Replica”.

Perform these setup steps on each replica.

This method can also be used if you are setting up new servers but have an existing dump of the databases from a different server that you want to load into your replication configuration. By loading the data into a new source, the data is automatically replicated to the replicas.

If you are setting up a new replication environment using the data from a different existing database server to create a new source, run the dump file generated from that server on the new source. The database updates are automatically propagated to the replicas:

shell> mysql -h master < fulldb.dump

已有数据的环境创建主从

Setting Up Replication with Existing Data

When setting up replication with existing data, transfer the snapshot from the source to the replica before starting replication. The process for importing data to the replica depends on how you created the snapshot of data on the source.

Choose one of the following:

If you used mysqldump:

Start the replica, using the --skip-slave-start option so that replication does not start.
Import the dump file:
```
shell> mysql < fulldb.dump
```

If you created a snapshot using the raw data files:

Extract the data files into the replica's data directory. For example:
```
shell> tar xvf dbdump.tar
```
You may need to set permissions and ownership on the files so that the replica server can access and modify them.
Start the replica, using the --skip-slave-start option so that replication does not start.
Configure the replica with the replication coordinates from the source. This tells the replica the binary log file and position within the file where replication needs to start. Also, configure the replica with the login credentials and host name of the source. For more information on the CHANGE MASTER TO statement required, see Section 16.1.2.5.2, “Setting the Source Configuration on the Replica”.
Start the replication threads:
```
mysql> START SLAVE;
```

After you have performed this procedure, the replica connects to the source and replicates any updates that have occurred on the source since the snapshot was taken.

If the server_id system variable for the source is not correctly set, replicas cannot connect to it. Similarly, if you have not set server_id correctly for the replica, you get the following error in the replica's error log:

Warning: You should set server-id to a non-0 value if master_host
is set; we will force server id to 2, but this MySQL server will
not act as a slave.

You also find error messages in the replica's error log if it is not able to replicate for any other reason.

The replica stores information about the source you have configured in its connection metadata repository. The connection metadata repository can be in the form of files or a table, as determined by the value set for the master_info_repository system variable. When a replica runs with master_info_repository=FILE, two files are stored in the data directory, named master.info and relay-log.info. If master_info_repository=TABLE instead, this information is saved in the master_slave_info table in the mysql database. In either case, do not remove or edit the files or table. Always use the CHANGE MASTER TO statement to change replication parameters. The replica can use the values specified in the statement to update the status files automatically. See Section 16.2.4, “Relay Log and Replication Metadata Repositories”, for more information.

GTID

GTID (Global Transaction IDentifier) 是全局事务标识。它具有全局唯一性，一个事务对应一个GTID。唯一性不仅限于主服务器，GTID在所有的从服务器上也是唯一的。一个GTID在一个服务器上只执行一次，从而避免重复执行导致数据混乱或主从不一致。

在传统的复制里面，当发生故障需要主从切换时，服务器需要找到binlog和pos点，然后将其设定为新的主节点开启复制。相对来说比较麻烦，也容易出错。在MySQL 5.6里面，MySQL会通过内部机制自动匹配GTID断点，不再寻找binlog和pos点。我们只需要知道主节点的ip，端口，以及账号密码就可以自动复制。

GTID的组成部分：

GDIT由两部分组成：GTID = source_id:transaction_id。其中source_id是产生GTID的服务器，即是server_uuid，在第一次启动时生成（sql/mysqld.cc: generate_server_uuid()），并保存到DATADIR/auto.cnf文件里。transaction_id是序列号（sequence number），在每台MySQL服务器上都是从1开始自增长的顺序号，是事务的唯一标识。例如：3E11FA47-71CA-11E1-9E33-C80AA9429562:23 GTID 的集合是一组GTIDs，可以用source_id+transaction_id范围表示，例如：3E11FA47-71CA-11E1-9E33-C80AA9429562:1-5 复杂一点的：如果这组 GTIDs 来自不同的 source_id，各组 source_id 之间用逗号分隔；如果事务序号有多个范围区间，各组范围之间用冒号分隔，例如：3E11FA47-71CA-11E1-9E33-C80AA9429562:23,3E11FA47-71CA-11E1-9E33-C80AA9429562:1-5

GTID如何产生：

GTID的生成受GTID_NEXT控制。

在主服务器上，GTID_NEXT默认值是AUTOMATIC，即在每次事务提交时自动生成GTID。它从当前已执行的GTID集合（即gtid_executed）中，找一个大于0的未使用的最小值作为下个事务GTID。同时在实际的更新事务记录之前，将GTID写入到binlog（set GTID_NEXT记录）。在Slave上，从binlog先读取到主库的GTID(即get GTID_NEXT记录)，而后执行的事务采用该GTID。

GTID的工作原理：

GTID在所有主从服务器上都是不重复的。所以所有在从服务器上执行的事务都可以在bnlog找到。一旦一个事务提交了，与拥有相同GTID的后续事务都会被忽略。这样可以保证从服务器不会重复执行同一件事务。

当使用GTID时，从服务器不需要保留任何非本地数据。使用数据都可以从replicate data stream。从DBA和开发者的角度看，从服务器无保留file-offset pairs以决定如何处理主从服务器间的数据流。

GTID的生成和使用由以下几步组成：

主服务器更新数据时，会在事务前产生GTID，一同记录到binlog日志中。 binlog传送到从服务器后，被写入到本地的relay log中。从服务器读取GTID，并将其设定为自己的GTID（GTID_NEXT系统）。 sql线程从relay log中获取GTID，然后对比从服务器端的binlog是否有记录。如果有记录，说明该GTID的事务已经执行，从服务器会忽略。如果没有记录，从服务器就会从relay log中执行该GTID的事务，并记录到binlog。

Setting Up Replication Using GTIDs

This section describes a process for configuring and starting GTID-based replication in MySQL 5.7. This is a “cold start” procedure that assumes either that you are starting the replication source server for the first time, or that it is possible to stop it; for information about provisioning replicas using GTIDs from a running source, see Section 16.1.3.5, “Using GTIDs for Failover and Scaleout”. For information about changing GTID mode on servers online, see Section 16.1.4, “Changing Replication Modes on Online Servers”.

The key steps in this startup process for the simplest possible GTID replication topology, consisting of one source and one replica, are as follows:

If replication is already running, synchronize both servers by making them read-only.
Stop both servers.
Restart both servers with GTIDs enabled and the correct options configured.

The mysqld options necessary to start the servers as described are discussed in the example that follows later in this section.
Instruct the replica to use the source as the replication data source and to use auto-positioning. The SQL statements needed to accomplish this step are described in the example that follows later in this section.
Take a new backup. Binary logs containing transactions without GTIDs cannot be used on servers where GTIDs are enabled, so backups taken before this point cannot be used with your new configuration.
Start the replica, then disable read-only mode on both servers, so that they can accept updates.

In the following example, two servers are already running as source and replica, using MySQL's binary log position-based replication protocol. If you are starting with new servers, see Section 16.1.2.2, “Creating a User for Replication” for information about adding a specific user for replication connections and Section 16.1.2.1, “Setting the Replication Source Configuration” for information about setting the server_id variable. The following examples show how to store mysqld startup options in server's option file, see Section 4.2.2.2, “Using Option Files” for more information. Alternatively you can use startup options when running mysqld.

Most of the steps that follow require the use of the MySQL root account or another MySQL user account that has the SUPER privilege. mysqladmin shutdown requires either the SUPER privilege or the SHUTDOWN privilege.

Step 1: Synchronize the servers. This step is only required when working with servers which are already replicating without using GTIDs. For new servers proceed to Step 3. Make the servers read-only by setting the read_only system variable to ON on each server by issuing the following:

mysql> SET @@GLOBAL.read_only = ON;

Wait for all ongoing transactions to commit or roll back. Then, allow the replica to catch up with the source. It is extremely important that you make sure the replica has processed all updates before continuing.

If you use binary logs for anything other than replication, for example to do point in time backup and restore, wait until you do not need the old binary logs containing transactions without GTIDs. Ideally, wait for the server to purge all binary logs, and wait for any existing backup to expire.

Important

It is important to understand that logs containing transactions without GTIDs cannot be used on servers where GTIDs are enabled. Before proceeding, you must be sure that transactions without GTIDs do not exist anywhere in the topology.

Step 2: Stop both servers. Stop each server using mysqladmin as shown here, where username is the user name for a MySQL user having sufficient privileges to shut down the server:

shell> mysqladmin -uusername -p shutdown

Then supply this user's password at the prompt.

Step 3: Start both servers with GTIDs enabled. To enable GTID-based replication, each server must be started with GTID mode enabled by setting the gtid_mode variable to ON, and with the enforce_gtid_consistency variable enabled to ensure that only statements which are safe for GTID-based replication are logged. For example:

gtid_mode=ON
enforce-gtid-consistency=ON

In addition, you should start replicas with the --skip-slave-start option before configuring the replica settings. For more information on GTID related options and variables, see Section 16.1.6.5, “Global Transaction ID System Variables”.

It is not mandatory to have binary logging enabled in order to use GTIDs when using the mysql.gtid_executed Table. Replication source server must always have binary logging enabled in order to be able to replicate. However, replica servers can use GTIDs but without binary logging. If you need to disable binary logging on a replica, you can do this by specifying the --skip-log-bin and --log-slave-updates=OFF options for the replica.

Step 4: Configure the replica to use GTID-based auto-positioning. Tell the replica to use the source with GTID based transactions as the replication data source, and to use GTID-based auto-positioning rather than file-based positioning. Issue a CHANGE MASTER TO statement on the replica, including the MASTER_AUTO_POSITION option in the statement to tell the replica that the source's transactions are identified by GTIDs.

You may also need to supply appropriate values for the source's host name and port number as well as the user name and password for a replication user account which can be used by the replica to connect to the source; if these have already been set prior to Step 1 and no further changes need to be made, the corresponding options can safely be omitted from the statement shown here.

mysql> CHANGE MASTER TO
     >     MASTER_HOST = host,
     >     MASTER_PORT = port,
     >     MASTER_USER = user,
     >     MASTER_PASSWORD = password,
     >     MASTER_AUTO_POSITION = 1;

Neither the MASTER_LOG_FILE option nor the MASTER_LOG_POS option may be used with MASTER_AUTO_POSITION set equal to 1. Attempting to do so causes the CHANGE MASTER TO statement to fail with an error.

Step 5: Take a new backup. Existing backups that were made before you enabled GTIDs can no longer be used on these servers now that you have enabled GTIDs. Take a new backup at this point, so that you are not left without a usable backup.

For instance, you can execute FLUSH LOGS on the server where you are taking backups. Then either explicitly take a backup or wait for the next iteration of any periodic backup routine you may have set up.

Step 6: Start the replica and disable read-only mode. Start the replica like this:

mysql> START SLAVE;

The following step is only necessary if you configured a server to be read-only in Step 1. To allow the server to begin accepting updates again, issue the following statement:

mysql> SET @@GLOBAL.read_only = OFF;

GTID-based replication should now be running, and you can begin (or resume) activity on the source as before. Section 16.1.3.5, “Using GTIDs for Failover and Scaleout”, discusses creation of new replicas when using GTIDs.

lihongjie0209 / myblog

MySQL: 主从原理 #214