openark / orchestrator

MySQL replication topology management and HA
Apache License 2.0

How can I clear an old ERROR in orchestrator? #1195

Closed: larry7xia80 closed this issue 4 years ago

larry7xia80 commented 4 years ago

Hi @shlomi-noach, after I reconfigured replication to use SSL certificates, the orchestrator GUI showed a "Worker 1 failed executing" error. The replication chain is:

 (Top Master) GCP --native replication-----> mysql01( master) ---semi-sync--> mysql02(slave)
                                             |--------------------semi-sync--> mysql03(slave).

The error occurred on mysql03 (slave). The orchestrator version is 3.1.4, and the MySQL version is Percona Server 5.7.20.

I fixed it by restarting the slave thread on mysql03, which is now up and running, as shown below:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.#0.0.##0
                  Master_User: repl
                  Master_Port: #####
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000008
          Read_Master_Log_Pos: 333198406
               Relay_Log_File: relay-bin.000009
                Relay_Log_Pos: 5864149
        Relay_Master_Log_File: mysql-bin.000008
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 333198406
              Relay_Log_Space: 5864350
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: Yes
           Master_SSL_CA_File: /####/ca.pem
           Master_SSL_CA_Path: 
              Master_SSL_Cert: /####/mysql-slave.pem
            Master_SSL_Cipher: 
               Master_SSL_Key: /####/mysql-slave.key
        Seconds_Behind_Master: 0
mysql> SELECT * from performance_schema.replication_applier_status_by_worker;
+--------------+-----------+-----------+---------------+----------------------------------------------+-------------------+--------------------+----------------------+
| CHANNEL_NAME | WORKER_ID | THREAD_ID | SERVICE_STATE | LAST_SEEN_TRANSACTION                        | LAST_ERROR_NUMBER | LAST_ERROR_MESSAGE | LAST_ERROR_TIMESTAMP |
+--------------+-----------+-----------+---------------+----------------------------------------------+-------------------+--------------------+----------------------+
|              |         1 |        58 | ON            | 3a121e8e-93b9-11ea-adec-42010a500075:2454048 |                 0 |                    | 0000-00-00 00:00:00  |
|              |         2 |        59 | ON            | 3a121e8e-93b9-11ea-adec-42010a500075:2453811 |                 0 |                    | 0000-00-00 00:00:00  |
|              |         3 |        60 | ON            | 3a121e8e-93b9-11ea-adec-42010a500075:2442188 |                 0 |                    | 0000-00-00 00:00:00  |
|              |         4 |        61 | ON            |                                              |                 0 |                    | 0000-00-00 00:00:00  |
+--------------+-----------+-----------+---------------+----------------------------------------------+-------------------+--------------------+----------------------+

However, the orchestrator GUI still shows the error. How can I clear the ERROR in orchestrator? I tried clicking "skip query", but with no luck: it showed bad connection/command errors, etc.

Thanks. Best regards, Larry

shlomi-noach commented 4 years ago

I’m sorry, I really don’t understand the question. What is worker 1? What is the error? Can you please paste the output of “orchestrator-client -c topology”?

larry7xia80 commented 4 years ago

Sorry about that, @shlomi-noach.

1. The output of the topology command is:
  [root@ ~]# /usr/local/orchestrator/resources/bin/orchestrator-client --auth dbadmin    -c topology -alias pcimicro
Enter host password for user 'dbadmin':
prd-#######-v12-####mysql01:3306   [0s,ok,5.7.20-19-log,rw,ROW,>>,GTID]
+ prd-#######-v12-####mysql02:3306 [0s,ok,5.7.20-19-log,ro,ROW,>>,GTID]
- prd-#######-v12-####mysql03:3306 [unknown,invalid,5.7.20-19-log,ro,ROW,>>,GTID]
2. The error is the "Last SQL error" that the orchestrator GUI shows for mysql03:
    "Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '3a121e8e-93b9-11ea-adec-42010a500075:606536' at master log mysql-bin.000006,end_log_pos 543323203. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any."

    But I have already fixed it in MySQL, and the slave (mysql03) is up and running.
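In the topology output above, mysql01 and mysql02 show `0s,ok` in their brackets while mysql03 shows `unknown,invalid`. The following is a minimal sketch (not part of orchestrator) that scans saved `orchestrator-client -c topology` output and lists instances whose status token is not `ok`. It assumes the bracketed tokens are ordered as in the output above (lag first, status second); the function name `unhealthy_instances` is hypothetical.

```python
import re

# Matches one instance line of `orchestrator-client -c topology` output, e.g.
#   "+ host:3306 [0s,ok,5.7.20-19-log,ro,ROW,>>,GTID]"
# An optional leading "+" or "-" tree marker is allowed; the bracketed part
# holds comma-separated tokens (assumed: lag first, status second).
LINE_RE = re.compile(r"^\s*[+-]?\s*(\S+)\s+\[([^\]]*)\]")


def unhealthy_instances(topology_text):
    """Return (instance, lag, status) for every line whose status is not 'ok'."""
    problems = []
    for line in topology_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip shell prompts, password prompts, blank lines
        instance = match.group(1)
        tokens = match.group(2).split(",")
        lag = tokens[0]
        status = tokens[1] if len(tokens) > 1 else ""
        if status != "ok":
            problems.append((instance, lag, status))
    return problems


# Sample copied from the topology output in this thread.
sample = """\
prd-#######-v12-####mysql01:3306   [0s,ok,5.7.20-19-log,rw,ROW,>>,GTID]
+ prd-#######-v12-####mysql02:3306 [0s,ok,5.7.20-19-log,ro,ROW,>>,GTID]
- prd-#######-v12-####mysql03:3306 [unknown,invalid,5.7.20-19-log,ro,ROW,>>,GTID]
"""

for instance, lag, status in unhealthy_instances(sample):
    print(f"{instance}: lag={lag}, status={status}")
```

On this sample only mysql03 is reported, even though `SHOW SLAVE STATUS` on mysql03 itself looks healthy. That mismatch hints that the problem lies in how orchestrator sees the server rather than in replication.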

If I forget mysql03, will orchestrator rediscover it with the correct state?
(I am assuming that forget will not break replication.)

Thanks.

shlomi-noach commented 4 years ago

The problem is that orchestrator is unable to access the server. Have firewall rules changed? Have accounts changed? Have privileges changed?

Re: forget, you could, but that's not the issue.

larry7xia80 commented 4 years ago

Thanks, @shlomi-noach. I confirmed with DevOps that new firewall rules had been added, which blocked orchestrator's access to mysql03. I had only tested access among the 3 MySQL nodes. Fixed.

Thanks again.
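The lesson here is that testing connectivity among the MySQL nodes does not prove that the orchestrator host can reach them. Below is a minimal sketch of a TCP reachability check meant to run from the orchestrator host itself. The helper `can_connect` is hypothetical and not part of orchestrator. It only exercises the network path (firewall rules, routing, DNS). It does not test the MySQL account or privileges, which were the other two questions above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A True result only shows that the network path is open; it says nothing
    about MySQL credentials or grants.
    """
    try:
        # create_connection resolves the name and tries each address it gets.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts (socket.timeout) and DNS failures
        # (socket.gaierror) are all OSError subclasses.
        return False
```

To use it, call `can_connect(host, 3306)` from the orchestrator host for each node listed in the topology (port 3306 in the output above). Any node that returns False points to a firewall or routing problem rather than a replication one.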

larry7xia80 commented 4 years ago

Closing.