Closed welyss closed 1 year ago
galaxyengine version: latest
Symptom: the engine containers on the three DN nodes (cand1, cand2, and log0) fail to start. The alert.log on cand1 contains the following errors:
[2022-08-05 14:24:11.830049] [ERROR] EasyNet::onConnected server 1
[2022-08-05 14:24:11.830573] [ERROR] EasyNet::onConnected server 2
[2022-08-05 14:24:13.903579] [ERROR] Server 3 : new term(old:30,new:32) !!
[2022-08-05 14:24:13.903579] [ERROR] Server 3 : Paxos state change from FOLL to FOLL !!
[2022-08-05 14:24:19.904576] [ERROR] Server 3 : leaderStickiness check: msg::force(1) state_:0 electionTimer_::Stage:0 leaderId_:2 .
[2022-08-05 14:24:19.904576] [ERROR] Server 3 : New Term in onRequestVote !! server 1 's term(33) is bigger than me(32).
[2022-08-05 14:24:19.904576] [ERROR] Server 3 : new term(old:32,new:33) !!
[2022-08-05 14:24:19.904576] [ERROR] Server 3 : Paxos state change from FOLL to FOLL !!
[2022-08-05 14:24:19.904576] [ERROR] Server 3 : isVote: 1, local(lli:2563665, llt:32); msg(candidateid: 1, term: 33 lli:2563665, llt:32) .
2022-08-05T14:24:23.724310+08:00 5 [Warning] [MY-000000] [Server] Apply thread start, recover status = 0, start apply index = 0, rli consensus index = 733096.
2022-08-05T14:24:23.738472+08:00 0 [System] [MY-010931] [Server] /opt/galaxy_engine/bin/mysqld: ready for connections. Version: '8.0.18' socket: '/data/mysql/run/mysql.sock' port: 17822 Source distribution.
2022-08-05T14:24:23.857115+08:00 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 45822 socket: '/tmp/mysqlx.sock' bind-address: '::' port: 33060
2022-08-05T14:24:25.010332+08:00 5 [Warning] [MY-000000] [Server] Apply thread group relay log file name = '/data/mysql/log/mysql_bin.000003', pos = 323488886, rli apply index = 733096.
2022-08-05T14:24:25.089284+08:00 5 [ERROR] [MY-010584] [Repl] Slave SQL for channel '': Error 'XAER_NOTA: Unknown XID' on query. Default database: '__cdc___single'. Query: 'XA COMMIT X'647264732d313462343830623166613430313030304062623236613963383163636433386265',X'5f5f4344435f5f5f53494e474c455f47524f5550',1', Error_code: MY-001397
2022-08-05T14:24:25.089370+08:00 5 [Warning] [MY-010584] [Repl] Slave: XAER_NOTA: Unknown XID Error_code: MY-001397
2022-08-05T14:24:25.092896+08:00 5 [ERROR] [MY-010586] [Repl] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'FIRST' position 323488886
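To make the election sequence in the log above easier to follow, the term changes can be pulled out with a short script. This is only a sketch: the regular expression is derived from the three sample lines shown here, and the names `TERM_RE` and `term_changes` are made up for illustration, not part of galaxyengine.

```python
import re

# Sample lines copied from the alert.log excerpt above.
LOG = """\
[2022-08-05 14:24:13.903579] [ERROR] Server 3 : new term(old:30,new:32) !!
[2022-08-05 14:24:19.904576] [ERROR] Server 3 : New Term in onRequestVote !! server 1 's term(33) is bigger than me(32).
[2022-08-05 14:24:19.904576] [ERROR] Server 3 : new term(old:32,new:33) !!
"""

# Matches only the "new term(old:X,new:Y)" lines (case-sensitive), so the
# "New Term in onRequestVote" line is deliberately skipped.
TERM_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] \[ERROR\] Server (?P<server>\d+) : "
    r"new term\(old:(?P<old>\d+),new:(?P<new>\d+)\)"
)


def term_changes(text):
    """Return (timestamp, server, old_term, new_term) for each term change."""
    return [
        (m["ts"], int(m["server"]), int(m["old"]), int(m["new"]))
        for m in TERM_RE.finditer(text)
    ]


if __name__ == "__main__":
    for ts, server, old, new in term_changes(LOG):
        print(f"{ts}  server {server}: term {old} -> {new}")
```

On the excerpt above this reports two term changes on server 3 (30 to 32, then 32 to 33) within about six seconds.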
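If I delete the data directories of all three nodes and delete the corresponding pods to trigger a rebuild, the DN nodes start normally. However, because the data has been wiped, the information associated with GMS becomes inconsistent. My guess is that the data was corrupted when the nodes were stopped abnormally. How should the cluster be recovered in this situation?
If I log in to the engine on cand1 and run show slave status\G, it reports the error: Error 'XAER_NOTA: Unknown XID' on query. Default database: '__cdc___single'. Query: 'XA COMMIT X'647264732d313462343830623166613430313030304062623236613963383163636433386265',X'5f5f4344435f5f5f53494e474c455f47524f5550',1'. Since this differs from traditional MySQL, is there any documentation I can refer to?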
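The XID in the failing XA COMMIT is hex-encoded, which makes it hard to compare with other logs. The sketch below decodes it, assuming the standard MySQL XA syntax `gtrid, bqual, formatID` and that both hex literals are plain ASCII. The helper name `decode_xid` is hypothetical.

```python
import re

# Query text copied from the replication error above.
QUERY = (
    "XA COMMIT "
    "X'647264732d313462343830623166613430313030304062623236613963383163636433386265',"
    "X'5f5f4344435f5f5f53494e474c455f47524f5550',1"
)


def decode_xid(query):
    """Decode the first two X'..' hex literals as (gtrid, bqual) strings."""
    parts = re.findall(r"X'([0-9a-fA-F]+)'", query)
    if len(parts) < 2:
        raise ValueError("expected at least two hex literals (gtrid, bqual)")
    # Assumption: both components are ASCII text; decode() raises otherwise.
    gtrid = bytes.fromhex(parts[0]).decode("ascii")
    bqual = bytes.fromhex(parts[1]).decode("ascii")
    return gtrid, bqual


if __name__ == "__main__":
    gtrid, bqual = decode_xid(QUERY)
    print("gtrid:", gtrid)
    print("bqual:", bqual)
```

The readable form can then be matched against the output of `XA RECOVER CONVERT XID` on each node to check whether the prepared transaction still exists anywhere.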