Open annProg opened 4 years ago
手动备份警告信息:
2020-07-02 13:07:32 | Info | mysqldump said: Warning: A partial dump from a server that has GTIDs will by default include the GTIDs of all transactions, even those that changed suppressed parts of the database. If you don't want to restore GTIDs, pass --set-gtid-purged=OFF. To make a complete dump, pass --all-databases --triggers --routines --events. | IssueLog
会有节点一直在维护状态页面
Update:此问题应是 rsync 同步导致,node1 上执行维护任务,进入维护状态,node2, node3 恰好开始同步,node1 维护完成之后删除 data/.maintenance
,但是 node2, node3 没有删除(不能使用 rsync --delete
选项,因为每个节点有不同的cache等文件,此选项会删除这些不同的文件)。应将此文件排除在 rsync 同步列表之外。
Run:
rm -f data/.maintenance
Update: 会有2个节点(102,103)经常进入 maintenance 状态。
恢复备份失败,报错
@@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.
解决备份还原报错问题
setup/backup.class.inc.php
增加 --set-gtid-purged=OFF
参数
// Note: opt implicitely sets lock-tables... which cancels the benefit of single-transaction!
// skip-lock-tables compensates and allows for writes during a backup
$sCommand = "$sMySQLDump --defaults-extra-file=\"$sMySQLDumpCnfFile\" --opt --skip-lock-tables --default-character-set=".$sMysqldumpCharset." --add-drop-database --single-transaction --host=$sHost $sPortOption --user=$sUser $sTlsOptions --set-gtid-purged=OFF --result-file=$sTmpFileName $sDBName $sTables 2>&1";
$sCommandDisplay = "$sMySQLDump --defaults-extra-file=\"$sMySQLDumpCnfFile\" --opt --skip-lock-tables --default-character-set=".$sMysqldumpCharset." --add-drop-database --single-transaction --host=$sHost $sPortOption --user=xxxxx $sTlsOptions --set-gtid-purged=OFF --result-file=$sTmpFileName $sDBName $sTables";
# find ./ -name "*.php" |xargs grep "GET_LOCK" --color
./core/mutex.class.inc.php: // using GET_LOCK anytime on the same session will RELEASE the current and unique session lock (known issue)
./core/mutex.class.inc.php: $res = $this->QueryToScalar("SELECT GET_LOCK('".$this->sName."', 3600)");
./core/mutex.class.inc.php: $res = $this->QueryToScalar("SELECT GET_LOCK('".$this->sName."', 0)");
./core/mutex.class.inc.php: $sMsg = 'GET_LOCK('.$this->sName.', 0) returned: '.var_export($res, true).'. Expected values are: 0, 1 or null';
./lib/symfony/http-foundation/Session/Storage/Handler/PdoSessionHandler.php: $stmt = $this->pdo->prepare('SELECT GET_LOCK(:key, 50)');
core/mutex.class.inc.php 中关于 GET_LOCK
和 RELEASE_LOCK
的函数
/**
* Class iTopMutex
* A class to serialize the execution of some code sections
* Emulates the API of PECL Mutex class
* Relies on MySQL locks because the API sem_get is not always present in the
* installed PHP.
*
* @copyright Copyright (C) 2013-2018 Combodo SARL
* @license http://opensource.org/licenses/AGPL-3.0
*/
class iTopMutex
{
...
/**
* Acquire the mutex. Uses a MySQL lock. <b>Warn</b> : can have an abnormal behavior on MySQL clusters (see R-016204)
*
* @see https://dev.mysql.com/doc/refman/5.7/en/miscellaneous-functions.html#function_get-lock
*/
public function Lock()
{
if ($this->bLocked)
{
// Lock already acquired
return;
}
if (self::$aAcquiredLocks[$this->sName] == 0)
{
do
{
$res = $this->QueryToScalar("SELECT GET_LOCK('".$this->sName."', 3600)");
if (is_null($res))
{
throw new Exception("Failed to acquire the lock '".$this->sName."'");
}
// $res === '1' means I hold the lock
// $res === '0' means it timed out
}
while ($res !== '1');
}
$this->bLocked = true;
self::$aAcquiredLocks[$this->sName]++;
}
/**
* Attempt to acquire the mutex
* @returns bool True if the mutex is acquired, false if already locked elsewhere
*/
public function TryLock()
{
if ($this->bLocked)
{
return true; // Already acquired
}
if (self::$aAcquiredLocks[$this->sName] > 0)
{
self::$aAcquiredLocks[$this->sName]++;
$this->bLocked = true;
return true;
}
$res = $this->QueryToScalar("SELECT GET_LOCK('".$this->sName."', 0)");
if (is_null($res))
{
throw new Exception("Failed to acquire the lock '".$this->sName."'");
}
// $res === '1' means I hold the lock
// $res === '0' means it timed out
if ($res === '1')
{
$this->bLocked = true;
self::$aAcquiredLocks[$this->sName]++;
}
if (($res !== '1') && ($res !== '0'))
{
$sMsg = 'GET_LOCK('.$this->sName.', 0) returned: '.var_export($res, true).'. Expected values are: 0, 1 or null';
IssueLog::Error($sMsg);
throw new Exception($sMsg);
}
return ($res !== '0');
}
/**
* Release the mutex
*/
public function Unlock()
{
if (!$this->bLocked)
{
// ??? the lock is not acquired, exit
return;
}
if (self::$aAcquiredLocks[$this->sName] == 0)
{
return; // Safety net
}
if (self::$aAcquiredLocks[$this->sName] == 1)
{
$res = $this->QueryToScalar("SELECT RELEASE_LOCK('".$this->sName."')");
}
$this->bLocked = false;
self::$aAcquiredLocks[$this->sName]--;
}
分别在 1,2 号节点部署 cron 任务,1号2号都可能有报错日志:
ERROR: 'Trying to commit transaction whereas none have been started !: mysql_errno = 3101, mysql_error = Plugin instructed the server to rollback the current transaction.'
Please note, that in general this error is an expected behavior of Group Replication plugin, when it finds conflicting transactions on different nodes, if Multi Primary mode is used. Application should be able to handle this type of errors based on application logic. The following sections can only applied in those cases, when this error is caused by multiple updates on the same node.
参考: https://support.oracle.com/knowledge/Oracle%20Database%20Products/2211042_1.html
可能最好只在一个节点上运行 cron
用到 Lock
的代码
./core/counter.class.inc.php: $oiTopMutex->Lock();
./core/log.class.inc.php: $oLock->Lock();
./core/ormlinkset.class.inc.php: $oMtx->Lock();
./core/ownershiplock.class.inc.php: $oMutex->Lock();
./core/ownershiplock.class.inc.php: $oMutex->Lock();
./core/ownershiplock.class.inc.php: $oMutex->Lock();
./core/ownershiplock.class.inc.php: $oMutex->Lock();
./core/ownershiplock.class.inc.php: $oMutex->Lock();
./datamodels/2.x/itop-backup/ajax.backup.php: $oRestoreMutex->Lock();
./datamodels/2.x/itop-backup/main.itop-backup.php: $oMutex->Lock();
./datamodels/2.x/itop-core-update/src/Service/CoreUpdater.php: $oMutex->Lock();
./datamodels/2.x/itop-hub-connector/ajax.php: $oMutex->Lock();
./synchro/synchro_import.php: $oMutex->Lock();
./synchro/synchrodatasource.class.inc.php: $oMutex->Lock();
# find ./ -name "*.php" |xargs grep --color "IncClass("
./core/counter.class.inc.php: public static function IncClass($sLeafClass)
./core/metamodel.class.php: return ItopCounter::IncClass($sClass);
./env-production/itop-tickets/model.itop-tickets.php: $iNextId = ItopCounter::IncClass(get_class($this));
MetaModel::GetNextKey() iTopCounter::IncClass() iTopCounter::Inc()
弃用的函数,2.7 版本未使用此函数。但是搜不到 ItopCounter::incRootClass
函数,怀疑可能就是 ItopCounter::IncClass
/**
* @deprecated 2.7.0 N°1627, use ItopCounter::incRootClass($sClass) instead
*
* @param string $sClass
*
* @return int
* @throws \CoreException
*/
public static function GetNextKey($sClass)
{
return ItopCounter::IncClass($sClass);
}
DBInsertNoReload() MakeTicketRef() ItopCounter::IncClass() iTopCounter::Inc()
ref 指工单编号,因此多主集群可能导致生成的 工单编号 错误。
public function MakeTicketRef()
{
$iNextId = ItopCounter::IncClass(get_class($this));
$sRef = $this->MakeTicketRef($iNextId);
$this->SetIfNull('ref', $sRef);
$iKey = parent::DBInsertNoReload();
return $iKey;
}
protected function MakeTicketRef($iNextId)
{
return sprintf(static::GetTicketRefFormat(), $iNextId);
}
public static function GetTicketRefFormat()
{
return 'T-%06d';
}
分别使用 csv 导入 和 api 来写入 100 个工单,检查 ref 生成是否重复或者缺失
实验组:mgr集群单实例操作 对照组:非mgr集群
结论:未发现异常,均生成连续不重复的 ref
实验组:mgr集群多实例同时操作 对照组:非mgr集群多浏览器窗口同时操作
实验组出现重复 ref,且最终导入了166个工单,对照组生成100个连续且不重复的 ref
1 R-000261
1 R-000262
1 R-000263
1 R-000264
1 编号
2 R-000127
2 R-000154
2 R-000157
2 R-000189
2 R-000243
脚本
#!/bin/bash
[ $# -lt 2 ] && echo "$0 password url" && exit 1
user=admin
password=$1
url=$2
json_data='{"operation":"core/create","comment":"test mgr","class":"UserRequest","output_fields":"id,friendlyname","fields":{"org_id":"SELECT Organization WHERE name = \"Demo\"","caller_id":{"name":"Christie", "first_name":"Agatha"},"title":"Test MGR From API", "description":"Test"}}'
curl -s "$url/webservices/rest.php?version=1.3" -d "auth_user=$user&auth_pwd=$password&json_data=$json_data"
同时调用3个节点的API
#!/bin/bash
for id in `seq 1 100`;do echo $id;done |parallel -j 3 ./ticket-api.sh admin http://192.168.10.101 &
for id in `seq 1 100`;do echo $id;done |parallel -j 3 ./ticket-api.sh admin http://192.168.10.102 &
for id in `seq 1 100`;do echo $id;done |parallel -j 3 ./ticket-api.sh admin http://192.168.10.103 &
结果:有重复编号
1 R-000743
1 R-000744
1 R-000745
1 R-000746
1 R-000747
1 编号
2 R-000574
2 R-000585
2 R-000591
2 R-000594
2 R-000595
2 R-000599
2 R-000603
2 R-000612
2 R-000614
2 R-000615
2 R-000638
2 R-000646
2 R-000655
2 R-000665
2 R-000667
2 R-000668
2 R-000669
2 R-000673
2 R-000674
2 R-000676
2 R-000677
2 R-000681
2 R-000683
2 R-000684
2 R-000686
3 R-000596
3 R-000685
4 R-000593
4 R-000598
5 R-000597
/**
* Rotate current log file
*
* @param DateTime $oLogFileLastModified date when the log file was last modified
*
* @uses \iTopMutex instead of flock as doing a rename on a file with a flock cause an error on PHP 5.6.40 Windows (ok on 7.3.15 though)
* @uses GetRotatedFileName to get rotated file name
*/
protected function RotateLogFile($oLogFileLastModified)
{
if (!$this->IsLogFileExists()) // extra check, but useful for cron also !
{
return;
}
$oLock = null;
try
{
$oLock = new iTopMutex('log_rotation_'.$this->sLogFileFullPath);
$oLock->Lock();
if (!$this->IsLogFileExists()) // extra extra check if we were blocked and another process moved the file in the meantime
{
$oLock->Unlock();
return;
}
$this->ResetLastModifiedDateForFile();
$sNewLogFileName = $this->GetRotatedFileName($oLogFileLastModified);
rename($this->sLogFileFullPath, $sNewLogFileName);
}
catch (Exception $e)
{
// nothing to do, cannot log... file will be renamed on the next call O:)
return;
}
finally
{
if (!is_null($oLock)) { $oLock->Unlock();}
}
}
CheckAndRotateLogFile() RotateLogFile() iTopMutex::Lock()
暂无结论
cmdbAbstractObject::DBInsertNoReload and cmdbAbstractObject::DBUpdate DBObject::DBInsertNoReload() and DBObject::DBUpdate() DBObject::DBWriteLinks() OrmLinkSet::DBWrite() iTopMutex::Lock()
public function DBWrite(DBObject $oHostObject)
{
...
// Critical section : serialize any write access to these links
//
$oMtx = new iTopMutex('Write-'.$this->sClass);
$oMtx->Lock();
DBObject:DBWriteLinks()
/**
* used both by insert/update
*
* @internal
*
* @throws \CoreException
*/
private function DBWriteLinks()
{
foreach(MetaModel::ListAttributeDefs(get_class($this)) as $sAttCode => $oAttDef)
{
if (!$oAttDef->IsLinkSet()) continue;
if (!array_key_exists($sAttCode, $this->m_aTouchedAtt)) continue;
if (array_key_exists($sAttCode, $this->m_aModifiedAtt) && ($this->m_aModifiedAtt[$sAttCode] == false)) continue;
/** @var \ormLinkSet $oLinkSet */
$oLinkSet = $this->m_aCurrValues[$sAttCode];
$oLinkSet->DBWrite($this);
}
}
影响范围很大。
测试思路:尝试写 xx_list
属性是否有异常
#!/bin/bash
[ $# -lt 2 ] && echo "$0 password url" && exit 1
user=admin
password=$1
url=$2
json_data='{"operation":"core/update","comment":"test mgr","class":"Person","key":"SELECT Person WHERE name=\"Xing\" AND first_name=\"Ming\"","output_fields":"id,team_list,friendlyname","fields":{"team_list":[{"team_id":{"name":"Helpdesk"}}]}}'
curl -s "$url/webservices/rest.php?version=1.3" -d "auth_user=$user&auth_pwd=$password&json_data=$json_data" |jq .
#!/bin/bash
for id in `seq 1 10`;do echo $id;done |parallel -j 3 ./ormlinkset.sh admin http://192.168.10.101 &
for id in `seq 1 10`;do echo $id;done |parallel -j 3 ./ormlinkset.sh admin http://192.168.10.102 &
for id in `seq 1 10`;do echo $id;done |parallel -j 3 ./ormlinkset.sh admin http://192.168.10.103 &
出现重复创建的 lnkPersonToTeam
for id in `seq 1 30`;do echo $id;done |parallel -j 20 ./ormlinkset.sh admin http://192.168.10.101
也出现创建 2个 lnk 的情况
也出现创建 2 个 lnk 的情况。
也出现了 2 个 lnk 的情况
可能是测试 case 不对?
/**
* Mechanism to obtain an exclusive lock while editing an object
*
* @package iTopORM
*/
/**
* Persistent storage (in the database) for remembering that an object is locked
*/
class iTopOwnershipToken extends DBObject
{
...
}
/**
* Utility class to acquire/extend/release/kill an exclusive lock on a given persistent object,
* for example to prevent concurrent edition of the same object.
* Each lock has an expiration delay of 120 seconds (tunable via the configuration parameter 'concurrent_lock_expiration_delay')
* A watchdog (called twice during this delay) is in charge of keeping the lock "alive" while an object is being edited.
*/
class iTopOwnershipLock
{
application/cmdbabstract.class.php 中
public function DisplayModifyForm(WebPage $oPage, $aExtraParams = array())
{
$sOwnershipToken = null;
$iKey = $this->GetKey();
$sClass = get_class($this);
$sMode = ($iKey > 0) ? static::ENUM_OBJECT_MODE_EDIT : static::ENUM_OBJECT_MODE_CREATE;
if ($sMode === static::ENUM_OBJECT_MODE_EDIT)
{
// The concurrent access lock makes sense only for already existing objects
$LockEnabled = MetaModel::GetConfig()->Get('concurrent_lock_enabled');
if ($LockEnabled)
{
$sOwnershipToken = utils::ReadPostedParam('ownership_token', null, 'raw_data');
if ($sOwnershipToken !== null)
{
// We're probably inside something like "apply_modify" where the validation failed and we must prompt the user again to edit the object
// let's extend our lock
}
else
{
$aLockInfo = iTopOwnershipLock::AcquireLock($sClass, $iKey);
if ($aLockInfo['success'])
{
$sOwnershipToken = $aLockInfo['token'];
}
else
{
// If the object is locked by the current user, it's worth trying again, since
// the lock may be released by 'onunload' which is called AFTER loading the current page.
//$bTryAgain = $oOwner->GetKey() == UserRights::GetUserId();
self::ReloadAndDisplay($oPage, $this, array('operation' => 'modify'));
return;
}
}
}
}
注意到代码中检查配置项 concurrent_lock_enabled
$LockEnabled = MetaModel::GetConfig()->Get('concurrent_lock_enabled');
因此,这似乎是一个可选项,而且默认是关闭的,没有应该也没啥关系。
存在的问题: 只有在同一节点上才能正常锁住对象,不同节点编辑同一对象,没有提示信息。也就是说多 master 的集群无法使用 Locking to prevent concurrent modifications 的功能
ref: https://www.itophub.io/wiki/page?id=2_2_0%3Aadmin%3Alocking
class BackupExec extends AbstractWeeklyScheduledProcess
{
...
/**
* @param int $iUnixTimeLimit
* @return string
* @throws Exception
*/
public function Process($iUnixTimeLimit)
{
$oMutex = new iTopMutex('backup.'.utils::GetCurrentEnvironment());
$oMutex->Lock();
in ajax.backup.php
,数据恢复时也需要 Lock
/*
* We can't call \LoginWebPage::DoLogin because DBRestore will do a compile after restoring the DB
* Authentication is checked with a token file (see $sOperation='restore_get_token')
*/
case 'restore_exec':
require_once(APPROOT."setup/runtimeenv.class.inc.php");
require_once(APPROOT.'/application/utils.inc.php');
require_once(APPROOT.'/setup/backup.class.inc.php');
require_once(dirname(__FILE__).'/dbrestore.class.inc.php');
IssueLog::Enable(APPROOT.'log/error.log');
$oPage = new ajax_page("");
$oPage->no_cache();
$oPage->SetContentType('text/html');
if (utils::GetConfig()->Get('demo_mode'))
{
$oPage->add("<div data-error-stimulus=\"Error\">Sorry, iTop is in <b>demonstration mode</b>: the feature is disabled.</div>");
}
else
{
$sEnvironment = utils::ReadParam('environment', 'production', false, 'raw_data');
$oRestoreMutex = new iTopMutex('restore.'.$sEnvironment);
IssueLog::Info("Backup Restore - Acquiring the LOCK 'restore.$sEnvironment'");
$oRestoreMutex->Lock();
IssueLog::Info('Backup Restore - LOCK acquired, executing...');
try
{
CoreUpdater::DoBackup
/**
*
* @param string $sTargetFile
*
* @throws Exception
*/
private static function DoBackup($sTargetFile)
{
// Make sure the target directory exists
$sBackupDir = dirname($sTargetFile);
SetupUtils::builddir($sBackupDir);
$oBackup = new DBBackup();
$oBackup->SetMySQLBinDir(MetaModel::GetConfig()->GetModuleSetting('itop-backup', 'mysql_bindir', ''));
$oMutex = new iTopMutex('backup.'.utils::GetCurrentEnvironment());
$oMutex->Lock();
try
{
$oBackup->CreateCompressedBackup($sTargetFile);
SetupLog::Info('itop-core-update: Backup done: '.$sTargetFile);
} catch (Exception $e)
{
$oMutex->Unlock();
throw $e;
}
$oMutex->Unlock();
}
备份相关
/**
*
* @param string $sTargetFile
* @throws Exception
* @return DBBackupWithErrorReporting
*/
function DoBackup($sTargetFile)
{
// Make sure the target directory exists
$sBackupDir = dirname($sTargetFile);
SetupUtils::builddir($sBackupDir);
$oBackup = new DBBackupWithErrorReporting();
$oBackup->SetMySQLBinDir(MetaModel::GetConfig()->GetModuleSetting('itop-backup', 'mysql_bindir', ''));
$sSourceConfigFile = APPCONF.utils::GetCurrentEnvironment().'/'.ITOP_CONFIG_FILE;
$oMutex = new iTopMutex('backup.'.utils::GetCurrentEnvironment());
$oMutex->Lock();
try
{
$oBackup->CreateCompressedBackup($sTargetFile, $sSourceConfigFile);
}
catch (Exception $e)
{
$oMutex->Unlock();
throw $e;
}
$oMutex->Unlock();
return $oBackup;
}
synchro_import.php
$iPreviousTimeLimit = ini_get('max_execution_time');
$iLoopTimeLimit = MetaModel::GetConfig()->Get('max_execution_time_per_loop');
$oMutex = new iTopMutex('synchro_import_'.$oDataSource->GetKey());
$oMutex->Lock();
set_time_limit($iLoopTimeLimit);
foreach ($aData as $iRow => $aRow)
{
$sReconciliationCondition = '`primary_key` = '.CMDBSource::Quote($aRow[$iPrimaryKeyCol]);
synchrodatasource.class.inc.php
class SynchroExecution
{
...
/**
* Perform a synchronization between the data stored in the replicas (synchro_data_xxx_xx table)
* and the iTop objects.
*
* @return SynchroLog
* @throws \CoreUnexpectedValue
*/
public function Process()
{
$this->PrepareLogs();
self::$m_oCurrentTask = $this->m_oDataSource;
$oMutex = new iTopMutex('synchro_process_'.$this->m_oDataSource->GetKey());
try
{
$oMutex->Lock();
$this->DoSynchronize();
$oMutex->Unlock();
最好使用 MGR Single-Primary 模式,用 msyql-shell
把集群切换到 SinglePrimary 模式
var c = dba.getCluster()
c.switchToSinglePrimaryMode()
初始化 mysqlrouter
# mysqlrouter --bootstrap root:root@192.168.10.101:3306 --user=mysqlrouter
# Bootstrapping system MySQL Router instance...
- Creating account(s) (only those that are needed, if any)
- Verifying account (using it to run SQL queries that would be run by Router)
- Storing account in keyring
- Adjusting permissions of generated files
- Creating configuration /etc/mysqlrouter/mysqlrouter.conf
Existing configuration backed up to '/etc/mysqlrouter/mysqlrouter.conf.bak'
# MySQL Router configured for the InnoDB Cluster 'itopMgr'
After this MySQL Router has been started with the generated configuration
$ /etc/init.d/mysqlrouter restart
or
$ systemctl start mysqlrouter
or
$ mysqlrouter -c /etc/mysqlrouter/mysqlrouter.conf
the cluster 'itopMgr' can be reached by connecting to:
## MySQL Classic protocol
- Read/Write Connections: localhost:6446
- Read/Only Connections: localhost:6447
## MySQL X protocol
- Read/Write Connections: localhost:64460
- Read/Only Connections: localhost:64470
按照提示连接 mysqlrouter
附自动生成的配置文件 /etc/mysqlrouter/mysqlrouter.conf
# File automatically generated during MySQL Router bootstrap
[DEFAULT]
name=system
user=mysqlrouter
keyring_path=/var/lib/mysqlrouter/keyring
master_key_path=/etc/mysqlrouter/mysqlrouter.key
connect_timeout=15
read_timeout=30
dynamic_state=/var/lib/mysqlrouter/state.json
[logger]
level = INFO
[metadata_cache:itopMgr]
cluster_type=gr
router_id=1
user=mysql_router1_vf4ftydo7wb1
metadata_cluster=itopMgr
ttl=0.5
auth_cache_ttl=-1
auth_cache_refresh_interval=2
use_gr_notifications=0
[routing:itopMgr_rw]
bind_address=0.0.0.0
bind_port=6446
destinations=metadata-cache://itopMgr/?role=PRIMARY
routing_strategy=first-available
protocol=classic
[routing:itopMgr_ro]
bind_address=0.0.0.0
bind_port=6447
destinations=metadata-cache://itopMgr/?role=SECONDARY
routing_strategy=round-robin-with-fallback
protocol=classic
[routing:itopMgr_x_rw]
bind_address=0.0.0.0
bind_port=64460
destinations=metadata-cache://itopMgr/?role=PRIMARY
routing_strategy=first-available
protocol=x
[routing:itopMgr_x_ro]
bind_address=0.0.0.0
bind_port=64470
destinations=metadata-cache://itopMgr/?role=SECONDARY
routing_strategy=round-robin-with-fallback
protocol=x
将 bootstrap 节点上的 mysqlrouter 的配置文件及 key 复制到其他节点并在其他节点启动 mysqlrouter,然后将 iTop 数据库配置改为
'db_host' => '127.0.0.1:6446',
使用 SinglePrimary 模式回顾以上测试 case
正常生成 工单编号,无重复。
# cat /tmp/a |sort |uniq -c |sort -n |tail -n 5
1 R-001910
1 R-001911
1 R-001912
1 R-001913
1 编号
依然存在。单主模式也用了 gtid
未发现异常
关闭 Primary 节点,可以看到 102 节点变成了 Primary
RUN on 192.168.10.102
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 94ef913e-c027-11ea-a1ee-5254004d77d3 | 192.168.10.102 | 3306 | ONLINE | PRIMARY | 8.0.20 |
| group_replication_applier | d56ba753-c027-11ea-a866-5254004d77d3 | 192.168.10.103 | 3306 | ONLINE | SECONDARY | 8.0.20 |
+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+
访问 101 上的 iTop 实例,依然可以访问。
启动 101 上的 mysql,重新加入集群,角色为 SECONDARY
RUN on 192.168.10.102
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 51f47cd8-c027-11ea-958b-5254004d77d3 | 192.168.10.101 | 3306 | ONLINE | SECONDARY | 8.0.20 |
| group_replication_applier | 94ef913e-c027-11ea-a1ee-5254004d77d3 | 192.168.10.102 | 3306 | ONLINE | PRIMARY | 8.0.20 |
| group_replication_applier | d56ba753-c027-11ea-a866-5254004d77d3 | 192.168.10.103 | 3306 | ONLINE | SECONDARY | 8.0.20 |
+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+
宕机Primary恢复后,指定其仍为 Primary
var c = dba.getCluster()
c.setPrimaryInstance('root@192.168.10.101')
结果
RUN on 192.168.10.102
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 51f47cd8-c027-11ea-958b-5254004d77d3 | 192.168.10.101 | 3306 | ONLINE | PRIMARY | 8.0.20 |
| group_replication_applier | 94ef913e-c027-11ea-a1ee-5254004d77d3 | 192.168.10.102 | 3306 | ONLINE | SECONDARY | 8.0.20 |
| group_replication_applier | d56ba753-c027-11ea-a866-5254004d77d3 | 192.168.10.103 | 3306 | ONLINE | SECONDARY | 8.0.20 |
+---------------------------+--------------------------------------+----------------+-------------+--------------+-------------+----------------+
https://dev.mysql.com/doc/refman/5.7/en/group-replication-limitations.html
文档中对 GET_LOCK
的限制并没有明确说只在 multi primary 模式存在。
实测,参考:https://blog.csdn.net/tangtong1/article/details/51792617
连接 PRIMARY 节点,创建 test
库, test_lock
表,注意mgr要求每个表都有索引,需要设置 primary key
create database test;
create table test_lock(
id int,
name varchar(50),
address varchar(50),
primary key(id)
);
启动两个客户端连接 MySQL。 A1 上执行以下语句:
select get_lock('key_lock', 100);
update test_lock set name = 'tt2', address = 'aaaaaaaaaaaaaaaaaaaa' where id = 1; #只更新name列
select release_lock('key_lock');
B1 上执行以下语句
select get_lock('key_lock', 100);
现象:
当 A1 执行完 get_lock
之后,B1 执行 get_lock
会卡住,直到 A1 执行 release_lock
之后 B1 才能成功获取锁。以下结果可以看到 A1 立即获取了锁,B1 等待了 23.9 秒才获取到锁。
A1:
MySQL 192.168.10.101:33060+ ssl test SQL > select release_lock('key_lock');
+--------------------------+
| release_lock('key_lock') |
+--------------------------+
| 1 |
+--------------------------+
1 row in set (0.0004 sec)
B1:
MySQL 192.168.10.101:33060+ ssl test SQL > select get_lock('key_lock', 100);
+---------------------------+
| get_lock('key_lock', 100) |
+---------------------------+
| 1 |
+---------------------------+
1 row in set (23.9473 sec)
结论,单主模式可以使用 GET_LOCK
接下来测试多主模式。
A1,B1 连接同一个 PRIMIARY 时,结果和单主模式相同,连接不同 PRIMARY 时,GET_LOCK 无效。
唯一性检查是一个比较重要的功能,应测试是否受 MGR 影响。
测试思路:
默认模型中 Person 的唯一性检查规则规定员工号唯一,尝试创建具有相同员工号的 Person,试试会不会有问题。和单节点做对照。
使用以下脚本测试(unique.sh)
#!/bin/bash
[ $# -lt 2 ] && echo "$0 password url" && exit 1
user=admin
password=$1
url=$2
json_data='{"operation":"core/create","comment":"test mgr","class":"Person","output_fields":"id,employee_number,friendlyname","fields":{"org_id":"SELECT Organization WHERE name = \"Demo\"","name":"Xing","first_name":"Ming","employee_number":"2020"}}'
curl -s "$url/webservices/rest.php?version=1.3" -d "auth_user=$user&auth_pwd=$password&json_data=$json_data" |jq .
执行以下脚本,并发向3个节点创建Person
#!/bin/bash
for id in `seq 1 10`;do echo $id;done |parallel -j 3 ./unique.sh admin http://192.168.10.101 &
for id in `seq 1 10`;do echo $id;done |parallel -j 3 ./unique.sh admin http://192.168.10.102 &
for id in `seq 1 10`;do echo $id;done |parallel -j 3 ./unique.sh admin http://192.168.10.103 &
在 单主 和 多主 模式的MGR集群中均创建了5个有相同工号的联系人。
for id in `seq 1 30`;do echo $id;done |parallel -j 30 ./unique.sh admin http://192.168.10.101
单主模式下 理论上也会出问题。非 mgr 下测试,创建了6个
System 下的 DB工具
代码注释里已经写明了可能的问题。。
// No iTopMutex so there might be concurrent access !
// But the necessary lock would have a high performance cost :(
加锁性能代价太大。。
dbobject.class.php 中
/**
* @internal
*
* @throws \CoreException
* @throws \OQLException
*
* @since 2.6.0 N°659 uniqueness constraint
* @api
*/
protected function DoCheckUniqueness()
{
$sCurrentClass = get_class($this);
$aUniquenessRules = MetaModel::GetUniquenessRules($sCurrentClass);
foreach ($aUniquenessRules as $sUniquenessRuleId => $aUniquenessRuleProperties)
{
if ($aUniquenessRuleProperties['disabled'] === true)
{
continue;
}
// No iTopMutex so there might be concurrent access !
// But the necessary lock would have a high performance cost :(
$bHasDuplicates = $this->HasObjectsInDbForUniquenessRule($sUniquenessRuleId, $aUniquenessRuleProperties);
if ($bHasDuplicates)
{
$bIsBlockingRule = $aUniquenessRuleProperties['is_blocking'];
if (is_null($bIsBlockingRule))
{
$bIsBlockingRule = true;
}
$sErrorMessage = $this->GetUniquenessRuleMessage($sUniquenessRuleId);
if ($bIsBlockingRule)
{
$this->m_aCheckIssues[] = $sErrorMessage;
continue;
}
$this->m_aCheckWarnings[] = $sErrorMessage;
continue;
}
}
}
# 然后在 DoCheckToWrite 函数中调用唯一性检查
/**
* Check integrity rules (before inserting or updating the object)
*
* **This method is not meant to be called directly, use DBObject::CheckToWrite()!**
* Errors should be inserted in $m_aCheckIssues and $m_aCheckWarnings arrays
*
* @overwritable-hook You can extend this method in order to provide your own logic.
* @see CheckToWrite()
* @see $m_aCheckIssues
* @see $m_aCheckWarnings
*
* @throws \ArchivedObjectException
* @throws \CoreException
* @throws \OQLException
*
*/
public function DoCheckToWrite()
{
$this->DoComputeValues();
$this->DoCheckUniqueness();
$aChanges = $this->ListChanges();
foreach($aChanges as $sAttCode => $value)
{
$res = $this->CheckValue($sAttCode);
if ($res !== true)
{
// $res contains the error description
$this->m_aCheckIssues[] = "Unexpected value for attribute '$sAttCode': $res";
}
}
if (count($this->m_aCheckIssues) > 0)
{
// No need to check consistency between attributes if any of them has
// an unexpected value
return;
}
$res = $this->CheckConsistency();
if ($res !== true)
{
// $res contains the error description
$this->m_aCheckIssues[] = "Consistency rules not followed: $res";
}
// Synchronization: are we attempting to modify an attribute for which an external source is master?
//
if ($this->m_bIsInDB && $this->InSyncScope() && (count($aChanges) > 0))
{
foreach($aChanges as $sAttCode => $value)
{
$iFlags = $this->GetSynchroReplicaFlags($sAttCode, $aReasons);
if ($iFlags & OPT_ATT_SLAVE)
{
// Note: $aReasonInfo['name'] could be reported (the task owning the attribute)
$oAttDef = MetaModel::GetAttributeDef(get_class($this), $sAttCode);
$sAttLabel = $oAttDef->GetLabel();
if (!empty($aReasons))
{
// Todo: associate the attribute code with the error
$this->m_aCheckIssues[] = Dict::Format('UI:AttemptingToSetASlaveAttribute_Name', $sAttLabel);
}
}
}
}
}
实践中,两人同时创建相同对象的情况应该很少。基本可以忽略此问题。如果出现,可以用 DB工具 来发现并修复问题。
另外,由本节开头提到的代码注释可知,唯一性检查未使用 iTopMutex,即和 GET_LOCK 无关,和MGR也无关。
yum install -y https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm
yum install -y mysql-community-server mysql-shell
cat > /tmp/init.sql <<EOF
ALTER USER 'root'@'localhost' IDENTIFIED BY '$MYSQL_ROOT' PASSWORD EXPIRE NEVER;
ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY '$MYSQL_ROOT';
create user root@'%' identified WITH mysql_native_password BY '$MYSQL_ROOT';
grant all privileges on *.* to root@'%' with grant option;
flush privileges;
EOF
rm -fr /var/lib/mysql
mysqld --initialize-insecure --user=mysql
systemctl start mysqld
# (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2),重启生成 sock 文件
grep "report_host" /etc/my.cnf || echo "report_host=$MYIP" >> /etc/my.cnf
systemctl restart mysqld
systemctl status mysqld
mysql -uroot -e "show databases;"
mysql -uroot < /tmp/init.sql
cat << EOF > /tmp/config_local_instance.js
dba.configureLocalInstance('root@$MYIP:3306', {'password': 'root', 'interactive': false})
EOF
# 配置本地实例
mysqlsh --js --file=/tmp/config_local_instance.js
systemctl restart mysqld
cat << EOF > /tmp/check_local_instance.js
shell.connect('root@192.168.10.201:3306', 'root')
dba.checkInstanceConfiguration()
EOF
# 检查本地实例
mysqlsh --js --file=/tmp/check_local_instance.js
cat << EOF > /tmp/init_cluster.js
shell.connect('root@192.168.10.201:3306', 'root')
dba.createCluster('$CLUSTER_NAME', {'localAddress': '192.168.10.201','multiPrimary': true, 'force': true})
var cluster=dba.getCluster('$CLUSTER_NAME')
cluster.addInstance('root@192.168.10.202:3306', {'localAddress': '192.168.10.202', 'password': 'root', 'recoveryMethod':'clone'})
cluster.addInstance('root@192.168.10.203:3306', {'localAddress': '192.168.10.203', 'password': 'root','recoveryMethod':'clone'})
EOF
# 等全部节点启动后执行,只执行一次,js脚本中只在 201 节点上执行
if [ "$ID"x == "10203"x ];then
mysqlsh --js --file=/tmp/init_cluster.js
JS > var cluster = dba.getCluster()
JS > var cluster = dba.getCluster("<Cluster_name>")
JS > var cluster = dba.createCluster('name')
JS > cluster.removeInstance('root@<IP_Address>:<Port_No>',{force: true})
JS > cluster.addInstance('root@<IP add>,:<port>')
JS > cluster.addInstance('root@<IP add>,:<port>')
JS > dba.getCluster()
JS > dba.getCluster().status()
JS > dba.getCluster().checkInstanceState('root@<IP_Address>:<Port_No>')
JS > dba.getCluster().rejoinInstance('root@<IP_Address>:<Port_No>')
JS > dba.getCluster().describe()
digraph mgrsingle{
node[shape=box];
a1 [label="iTop-1"];
b1 [label="iTop-2"];
c1 [label="iTop-3"];
subgraph cluster_mgr_s{
node[shape=cylinder];
ms1 [label="PRIMARY"];
ms2 [label="SECONDARY-1"];
ms3 [label="SECONDARY-2"];
}
r1 [label="mysqlrouter1"];
r2 [label="mysqlrouter2"];
r3 [label="mysqlrouter3"];
a1 -> r1;
b1 -> r2;
c1 -> r3;
r1 -> {ms1,ms2,ms3;}
r2 -> {ms1,ms2,ms3;}
r3 -> {ms1,ms2,ms3;}
label="MGR Single-Primary";
}
digraph itopmgr {
node[shape=box];
a [label="iTop-1"];
b [label="iTop-2"];
c [label="iTop-3"];
subgraph cluster_mgr{
node[shape=cylinder];
m1 [label="PRIMARY-1"];
m2 [label="PRIMARY-2"];
m3 [label="PRIMARY-3"];
}
a -> m1;
b -> m2;
c -> m3;
label="MGR Multi-Primary";
}
From itop wiki:
什么是 GET_LOCK
GET_LOCK(str,timeout)
尝试使用
timeout
秒数超时来获取具有字符串str给定名称的锁. 负timeout值表示无限超时. 锁是排他的. 当一个会话举行时,其他会话无法获得相同名称的锁.注意
timeout
参数是获取锁这个操作的超时时间,不是锁本身的超时时间。