openark / orchestrator

MySQL replication topology management and HA
Apache License 2.0

GTID auto position isn't set when using move-below or move-up (but is when using relocate) #1461

Open m00dawg opened 2 years ago

m00dawg commented 2 years ago

Certain commands seem to disable GTID auto-positioning, and I'm having trouble determining why. This is with orchestrator v3.2.6 and MySQL 5.7.

If I run either of:

sudo orchestrator -c move-below -i host2:3306 -d host3:3306
sudo orchestrator -c move-up -i host2:3306

The Auto_Position value from SHOW SLAVE STATUS flips to 0, even if it was previously 1. This happens whether I manually run CHANGE MASTER TO MASTER_AUTO_POSITION=1 or use the enable-gtid command beforehand.

However, if I use relocate, auto-position is preserved, e.g.:

sudo orchestrator -c relocate -i host2:3306 -d host3:3306
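To compare the flag before and after each command, the Auto_Position field can be pulled out of the SHOW SLAVE STATUS output. This is a minimal sketch, not part of orchestrator: the helper name auto_position is hypothetical, and it assumes the vertical (\G) output format of the mysql client.

```shell
# Hypothetical helper: read `SHOW SLAVE STATUS\G` output on stdin and print
# the Auto_Position value (1 = GTID auto-positioning on, 0 = off).
auto_position() {
  # Split each line on ": " and keep only the Auto_Position field.
  awk -F': *' '$1 ~ /^ *Auto_Position$/ { print $2 }'
}

# Example usage (assumes the mysql client can connect to the replica):
#   mysql -h host2 -P 3306 -e 'SHOW SLAVE STATUS\G' | auto_position
```

Running it once before and once after move-below, move-up, or relocate should make the flip described above easy to spot.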

I've tried this with AutoPseudoGTID enabled and disabled (we prefer to use full GTIDs whenever possible and actually do not want Pseudo GTIDs as it adds a ton of data to our PMM dashboards).

If the move-* commands are only intended for use without GTID, it would be nice for them to fail when GTID is enabled, or at least for the docs to note that they break GTID auto-positioning. From looking at the code, though, this seems unintentional?

Here's the debug info when running move-below and relocate. I've changed hostnames, IPs, etc. but otherwise the sequence is the same:

move-below:

~$ sudo orchestrator -c move-below -i host10:3306 -d host2:3306
2022-10-11 15:15:35 DEBUG Hostname unresolved yet: host10
2022-10-11 15:15:35 DEBUG Cache hostname resolve host10 as host10
2022-10-11 15:15:35 DEBUG Hostname unresolved yet: host2
2022-10-11 15:15:35 DEBUG Cache hostname resolve host2 as host2
2022-10-11 15:15:35 DEBUG Connected to orchestrator backend: sqlite on /var/lib/orchestrator/orchestrator.db
2022-10-11 15:15:35 DEBUG Initializing orchestrator
2022-10-11 15:15:35 INFO Connecting to backend :3306: maxConnections: 128, maxIdleConns: 32
2022-10-11 15:15:35 DEBUG Hostname unresolved yet: host1
2022-10-11 15:15:35 DEBUG Cache hostname resolve host1 as host1
2022-10-11 15:15:35 DEBUG Hostname unresolved yet: host1
2022-10-11 15:15:35 DEBUG Cache hostname resolve host1 as host1
2022-10-11 15:15:35 INFO Will move host10:3306 below host2:3306
2022-10-11 15:15:35 INFO auditType:begin-maintenance instance:host10:3306 cluster:host1:3306 message:maintenanceToken: 1, owner: root, reason: move below host2:3306
2022-10-11 15:15:35 INFO auditType:begin-maintenance instance:host2:3306 cluster:host1:3306 message:maintenanceToken: 2, owner: root, reason: host10:3306 moves below this
2022-10-11 15:15:35 INFO Stopped replication on host10:3306, Self:mysql-10-bin.000002:54824454, Exec:mysql-01-bin.000031:363768758
2022-10-11 15:15:35 INFO Stopped replication on host2:3306, Self:mysql-02-bin.000002:54824454, Exec:mysql-01-bin.000031:363768758
2022-10-11 15:15:35 DEBUG ChangeMasterTo: will attempt changing master on host10:3306 to host2:3306, mysql-02-bin.000002:54824454
2022-10-11 15:15:35 INFO ChangeMasterTo: Changed master on host10:3306 to: host2:3306, mysql-02-bin.000002:54824454. GTID: false
2022-10-11 15:15:35 INFO Started replication on host10:3306
2022-10-11 15:15:35 INFO Started replication on host2:3306
2022-10-11 15:15:35 INFO auditType:move-below instance:host10:3306 cluster:host1:3306 message:moved host10:3306 below host2:3306
2022-10-11 15:15:35 INFO auditType:end-maintenance instance:host2:3306 cluster:host1:3306 message:maintenanceToken: 2
2022-10-11 15:15:35 INFO auditType:end-maintenance instance:host10:3306 cluster:host1:3306 message:maintenanceToken: 1
host10:3306<host2:3306

relocate:

~$ sudo orchestrator -c relocate -i host10:3306 -d host2:3306
2022-10-11 15:16:00 DEBUG Hostname unresolved yet: host10
2022-10-11 15:16:00 DEBUG Cache hostname resolve host10 as host10
2022-10-11 15:16:00 DEBUG Hostname unresolved yet: host2
2022-10-11 15:16:00 DEBUG Cache hostname resolve host2 as host2
2022-10-11 15:16:00 DEBUG Connected to orchestrator backend: sqlite on /var/lib/orchestrator/orchestrator.db
2022-10-11 15:16:00 DEBUG Initializing orchestrator
2022-10-11 15:16:00 INFO Connecting to backend :3306: maxConnections: 128, maxIdleConns: 32
2022-10-11 15:16:00 INFO Will move host10:3306 below host2:3306 via GTID
2022-10-11 15:16:00 INFO auditType:begin-maintenance instance:host10:3306 cluster:host2:3306 message:maintenanceToken: 4, owner: root, reason: move below host2:3306
2022-10-11 15:16:00 DEBUG Hostname unresolved yet: host1
2022-10-11 15:16:00 DEBUG Cache hostname resolve host1 as host1
2022-10-11 15:16:00 DEBUG Hostname unresolved yet: host1
2022-10-11 15:16:00 DEBUG Cache hostname resolve host1 as host1
2022-10-11 15:16:00 INFO Stopped replication on host10:3306, Self:mysql-10-bin.000002:54993956, Exec:mysql-02-bin.000002:54993956
2022-10-11 15:16:00 DEBUG ChangeMasterTo: will attempt changing master on host10:3306 to host2:3306, mysql-01-bin.000031:363922016
2022-10-11 15:16:00 INFO ChangeMasterTo: Changed master on host10:3306 to: host2:3306, mysql-01-bin.000031:363922016. GTID: true
2022-10-11 15:16:00 INFO Started replication on host10:3306
2022-10-11 15:16:00 INFO auditType:move-below-gtid instance:host10:3306 cluster:host2:3306 message:moved host10:3306 below host2:3306
2022-10-11 15:16:00 INFO auditType:end-maintenance instance:host10:3306 cluster:host2:3306 message:maintenanceToken: 4
2022-10-11 15:16:00 INFO auditType:relocate-below instance:host10:3306 cluster:host2:3306 message:relocated host10:3306 below host2:3306
host10:3306<host2:3306
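The difference between the two runs shows up in the "ChangeMasterTo: Changed master" line, which ends in "GTID: false" for move-below and "GTID: true" for relocate. A small sketch for pulling that flag out of captured log output follows; the helper name change_master_gtid_flag is hypothetical, and it assumes the log line format shown above.

```shell
# Hypothetical helper: read orchestrator log output on stdin and print the
# GTID flag from each completed "ChangeMasterTo: Changed master" line.
# The DEBUG "will attempt changing master" lines are ignored.
change_master_gtid_flag() {
  sed -n 's/.*ChangeMasterTo: Changed master on .* GTID: \([a-z]*\)$/\1/p'
}

# Example usage (2>&1 is an assumption, in case the log goes to stderr):
#   sudo orchestrator -c move-below -i host10:3306 -d host2:3306 2>&1 | change_master_gtid_flag
```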

Config:

{
  "AutoPseudoGTID": true,
  "UseSuperReadOnly" : false,
  "Debug": false,
  "EnableSyslog": false,
  "ListenAddress": ":3000",
  "BackendDB": "sqlite",
  "SQLite3DataFile": "/var/lib/orchestrator/orchestrator.db",
  "MySQLTopologyUser": "svc_orchestrator",
  "MySQLTopologyPassword": "346ASDF3456jdfowier2tas",
  "MySQLTopologyCredentialsConfigFile": "",
  "MySQLTopologySSLPrivateKeyFile": "",
  "MySQLTopologySSLCertFile": "",
  "MySQLTopologySSLSkipVerify": true,
  "MySQLTopologyUseMutualTLS": false,
  "MySQLConnectTimeoutSeconds": 1,
  "DefaultInstancePort": 3306,
  "RaftEnabled": false,
  "RaftBind": "3.232.63.146",
  "RaftDataDir": "/var/lib/raft",
  "DefaultRaftPort": 10008,
  "RaftNodes": [],
  "DiscoverByShowSlaveHosts": true,
  "DiscoveryIgnoreHostnameFilters": [],
  "InstancePollSeconds": 5,
  "UnseenInstanceForgetHours": 240,
  "SnapshotTopologiesIntervalHours": 0,
  "InstanceBulkOperationsWaitTimeoutSeconds": 10,
  "HostnameResolveMethod": "default",
  "MySQLHostnameResolveMethod": "@@report_host",
  "SkipBinlogServerUnresolveCheck": true,
  "ExpiryHostnameResolvesMinutes": 60,
  "RejectHostnameResolvePattern": "",
  "ReasonableReplicationLagSeconds": 10,
  "ProblemIgnoreHostnameFilters": [],
  "VerifyReplicationFilters": false,
  "ReasonableMaintenanceReplicationLagSeconds": 20,
  "CandidateInstanceExpireMinutes": 60,
  "AuditLogFile": "",
  "AuditToSyslog": false,
  "RemoveTextFromHostnameDisplay": ".hosts.secretcdn.net:3306",
  "ReadOnly": false,
  "AuthenticationMethod": "multi",
  "HTTPAuthUser": "svc_http_orchestrator",
  "HTTPAuthPassword": "SomeRandomPass",
  "AuthUserHeader": "",
  "PowerAuthUsers": [
    "*"
  ],
  "SlaveLagQuery": "",
  "DetectClusterAliasQuery": "SELECT cluster_name FROM meta.cluster WHERE anchor=1",
  "DetectClusterDomainQuery": "",
  "DetectInstanceAliasQuery": "SELECT @@hostname",
  "DetectPromotionRuleQuery": "",
  "DataCenterPattern": "[.]([^.]+)[.][^.]+[.]secretcdn[.]net",
  "PhysicalEnvironmentPattern": "[.]([^.]+[.][^.]+)[.]secretcdn[.]net",
  "PromotionIgnoreHostnameFilters": [],
  "DetectSemiSyncEnforcedQuery": "",
  "ServeAgentsHttp": false,
  "AgentsServerPort": ":3001",
  "AgentsUseSSL": false,
  "AgentsUseMutualTLS": false,
  "AgentSSLSkipVerify": false,
  "AgentSSLPrivateKeyFile": "",
  "AgentSSLCertFile": "",
  "AgentSSLCAFile": "",
  "AgentSSLValidOUs": [],
  "UseSSL": true,
  "UseMutualTLS": false,
  "SSLSkipVerify": true,
  "MySQLOrchestratorSSLSkipVerify": true,
  "SSLPrivateKeyFile": "/etc/vaultly/orchestrator/server-key.pem",
  "SSLCertFile": "/etc/vaultly/orchestrator/server-cert.pem",
  "SSLCAFile": "/etc/vaultly/orchestrator/ca-cert.pem",
  "SSLValidOUs": [],
  "URLPrefix": "",
  "StatusEndpoint": "/api/status",
  "StatusSimpleHealth": true,
  "StatusOUVerify": false,
  "AgentPollMinutes": 60,
  "UnseenAgentForgetHours": 6,
  "StaleSeedFailMinutes": 60,
  "SeedAcceptableBytesDiff": 8192,
  "PseudoGTIDPatternIsFixedSubstring": false,
  "PseudoGTIDMonotonicHint": "asc:",
  "PseudoGTIDPattern": "",
  "DetectPseudoGTIDQuery": "",
  "BinlogEventsChunkSize": 10000,
  "SkipBinlogEventsContaining": [],
  "ReduceReplicationAnalysisCount": true,
  "FailureDetectionPeriodBlockMinutes": 60,
  "RecoveryPollSeconds": 10,
  "RecoveryPeriodBlockSeconds": 3600,
  "RecoveryIgnoreHostnameFilters": [],
  "RecoverMasterClusterFilters": [
    "_master_pattern_"
  ],
  "RecoverIntermediateMasterClusterFilters": [
    "_intermediate_master_pattern_"
  ],
  "OnFailureDetectionProcesses": [
    "echo 'Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}' >> /tmp/recovery.log"
  ],
  "PreFailoverProcesses": [
    "echo 'Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
  ],
  "PostFailoverProcesses": [
    "echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostUnsuccessfulFailoverProcesses": [],
  "PostMasterFailoverProcesses": [
    "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostIntermediateMasterFailoverProcesses": [
    "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "CoMasterRecoveryMustPromoteOtherCoMaster": true,
  "DetachLostSlavesAfterMasterFailover": true,
  "ApplyMySQLPromotionAfterMasterFailover": true,
  "MasterFailoverDetachSlaveMasterHost": false,
  "MasterFailoverLostInstancesDowntimeMinutes": 0,
  "PostponeSlaveRecoveryOnLagMinutes": 0,
  "OSCIgnoreHostnameFilters": [],
  "GraphiteAddr": "",
  "GraphitePath": "",
  "GraphiteConvertHostnameDotsToUnderscores": true
}