sysown / proxysql

High-performance MySQL proxy with a GPL license.
http://www.proxysql.com
GNU General Public License v3.0

Aurora primary SHUNNED for nonexistent lag #3107

Open jtomaszon opened 4 years ago

jtomaszon commented 4 years ago

Hey team! I ran into an issue with a simple ProxySQL configuration in front of an Aurora cluster. The scenario:

With everything connected, I keep seeing the following in the ProxySQL error log:

2020-10-15 16:12:37 [INFO] MySQL_HostGroups_Manager::commit() locked for 3ms
2020-10-15 16:12:37 [INFO] Dumping current MySQL Servers structures for hostgroup ALL
HID: 100 , address: gcp-clickfunnels-staging.c4gxn5pmgmjd.us-east-1.rds.amazonaws.com , port: 3306 , gtid_port: 0 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 50 , use_ssl: 0 , max_latency_ms: 15000000 , comment: 
HID: 200 , address: gcp-mothership-stg-db-01.c4gxn5pmgmjd.us-east-1.rds.amazonaws.com , port: 3306 , gtid_port: 0 , weight: 1 , status: ONLINE , max_connections: 1000 , max_replication_lag: 50 , use_ssl: 0 , max_latency_ms: 15000000 , comment: 
2020-10-15 16:12:37 [INFO] Dumping mysql_servers: ALL
+-----+-------------------------------------------------------------------+------+------+--------+--------+-----+-----------+---------+-----+---------+---------+-----------------+
| hid | hostname                                                          | port | gtid | weight | status | cmp | max_conns | max_lag | ssl | max_lat | comment | mem_pointer     |
+-----+-------------------------------------------------------------------+------+------+--------+--------+-----+-----------+---------+-----+---------+---------+-----------------+
| 100 | xxx.us-east-1.rds.amazonaws.com | 3306 | 0    | 1      | 0      | 0   | 1000      | 50      | 0   | 15      |         | 139861648882272 |
| 200 | xxx.us-east-1.rds.amazonaws.com | 3306 | 0    | 1      | 0      | 0   | 1000      | 50      | 0   | 15      |         | 139861577245152 |
+-----+-------------------------------------------------------------------+------+------+--------+--------+-----+-----------+---------+-----+---------+---------+-----------------+
2020-10-15 16:12:37 [INFO] Received SAVE MYSQL SERVERS TO DISK command
2020-10-15 17:02:43 MySQL_HostGroups_Manager.cpp:2891:replication_lag_action(): [WARNING] Re-enabling server xxx.us-east-1.rds.amazonaws.com:3306 from HG 100 with replication lag of -2 second
prod ProxySQL> select * from mysql_aws_aurora_hostgroups\G
*************************** 1. row ***************************
     writer_hostgroup: 100
     reader_hostgroup: 200
               active: 1
          aurora_port: 3306
          domain_name: .xxx.us-east-1.rds.amazonaws.com
           max_lag_ms: 20
    check_interval_ms: 100
     check_timeout_ms: 80
writer_is_also_reader: 0
    new_reader_weight: 1
           add_lag_ms: 25
           min_lag_ms: 5
       lag_num_checks: 1
              comment: Aurora Cluster

There are no errors in the mysql_server_aws_aurora_log table. I tried with and without add_lag_ms, but the issue persists.
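For anyone trying to reproduce this, the checks described above can be sketched as the following admin-interface queries. This is only a sketch: it assumes the log table lives in the `monitor` schema and has `error` and `time_start_us` columns, so adjust the names if your ProxySQL version differs.

```sql
-- Sketch only: run against the ProxySQL admin interface.
-- Assumption: the Aurora monitor log is exposed as monitor.mysql_server_aws_aurora_log
-- with `error` and `time_start_us` columns.

-- Most recent Aurora monitor checks that recorded an error (empty in the report above)
SELECT *
FROM monitor.mysql_server_aws_aurora_log
WHERE error IS NOT NULL
ORDER BY time_start_us DESC
LIMIT 10;

-- Current runtime state of the servers in hostgroups 100 and 200
SELECT hostgroup_id, hostname, port, status, max_replication_lag
FROM runtime_mysql_servers
WHERE hostgroup_id IN (100, 200);
```

If the first query returns no rows while the second shows the writer as SHUNNED, that matches the behavior described in this issue.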

ProxySQL ver 2.0.14 Ubuntu package

jtomaszon commented 4 years ago

As a workaround, deleting the mysql_aws_aurora_hostgroups entry and just using the normal mysql_replication_hostgroups keeps things running.
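A minimal sketch of that workaround, assuming the hostgroup IDs from the output above (100 for the writer, 200 for readers). The `check_type` value `innodb_read_only` is my assumption for Aurora and is not taken from the original setup; verify it against your own configuration before applying.

```sql
-- Sketch only: run against the ProxySQL admin interface.

-- Remove the Aurora-specific hostgroup definition
DELETE FROM mysql_aws_aurora_hostgroups WHERE writer_hostgroup = 100;

-- Fall back to a regular replication hostgroup pair.
-- Assumption: 'innodb_read_only' is the appropriate check_type for this cluster.
INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, check_type, comment)
VALUES (100, 200, 'innodb_read_only', 'Aurora Cluster');

-- Apply the change and persist it
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```

The existing mysql_servers rows for hostgroups 100 and 200 stay as they are; only the hostgroup definition changes.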

bangpound commented 4 years ago

We are seeing this on new clusters launched with the 5.6.mysql_aurora.1.23.0 engine version.

whera commented 3 years ago

As a workaround, deleting mysql_aws_aurora_hostgroups and just using the normal mysql_replication_hostgroups keeps things running.

But how can I control the lag of the Aurora cluster replicas?

jtomaszon commented 3 years ago

As a workaround, deleting mysql_aws_aurora_hostgroups and just using the normal mysql_replication_hostgroups keeps things running.

But how can I control the lag of the Aurora cluster replicas?

Sadly, you won't be able to. The key point is that Aurora replicas won't lag by more than a few tens of milliseconds. In my experience (an application with heavy write batches), we have never seen more than that; the maximum lag is around 30-50 ms. If your application can tolerate that kind of lag, you should be fine continuing to use the regular replication hostgroups.

Just a disclaimer: this is me and my experience talking, not a recommendation from AWS or the ProxySQL team.

whera commented 3 years ago

tks!! @jtomaszon