Open kashak88 opened 2 years ago
Is there any news about this issue? I have the same problem; we have 3 instances running in AWS:
mysql> select SERVER_ID,SESSION_ID,REPLICA_LAG_IN_MILLISECONDS from INFORMATION_SCHEMA.REPLICA_HOST_STATUS;
+--------------------------------------------------------------+--------------------------------------+-----------------------------+
| SERVER_ID | SESSION_ID | REPLICA_LAG_IN_MILLISECONDS |
+--------------------------------------------------------------+--------------------------------------+-----------------------------+
| xxxxx-serverless-db-preprod-1 | MASTER_SESSION_ID | 0 |
| application-autoscaling-75f16825-1229-43e6-8f38-7a713f20628b | b591aae9-9b75-4c38-adf8-91f48b8f1e9b | 17 |
| xxxxx-serverless-db-preprod-0 | ca2b450e-595e-4af4-9059-00fb25407425 | 17 |
+--------------------------------------------------------------+--------------------------------------+-----------------------------+
ProxySQL shows 4 backend servers active:
MySQL [(none)]> SELECT * FROM runtime_mysql_servers;
+--------------+----------------------------------------------------------------------------------------------------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname | port | gtid_port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+----------------------------------------------------------------------------------------------------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 0 | xxx-serverless-db-preprod-1.xxx.eu-central-1.rds.amazonaws.com | 3306 | 0 | ONLINE | 1 | 0 | 1000 | 0 | 1 | 0 | |
| 1 | application-autoscaling-4f896800-f48a-4b06-829a-51a074675853.xxx.eu-central-1.rds.amazonaws.com | 3306 | 0 | ONLINE | 1 | 0 | 1000 | 0 | 1 | 0 | |
| 1 | application-autoscaling-75f16825-1229-43e6-8f38-7a713f20628b.xxx.eu-central-1.rds.amazonaws.com | 3306 | 0 | ONLINE | 1 | 0 | 1000 | 0 | 1 | 0 | |
| 1 | application-autoscaling-73c42c9f-d91b-4e85-8a6c-dabe5b1797ad.xxx.eu-central-1.rds.amazonaws.com | 3306 | 0 | SHUNNED | 1 | 0 | 1000 | 0 | 1 | 0 | |
| 1 | xxx-serverless-db-preprod-0.xxx.eu-central-1.rds.amazonaws.com | 3306 | 0 | ONLINE | 1 | 0 | 1000 | 0 | 1 | 0 | |
+--------------+----------------------------------------------------------------------------------------------------------+------+-----------+---------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
The application-autoscaling-4f896800xxxxxxxx host was already removed from the cluster. I'm running ProxySQL version 2.3.2-10-g8cd66cf.
Cheers!
Please attach the full error log, and the output of SELECT * FROM monitor.mysql_server_aws_aurora_log;
Thanks
Hi!
I was only able to get this information from our DEV stage; we have the same issue there. The removed host in this stage is: application-autoscaling-524e6cc0-3de9-4b8e-b659-182d1152248a.xxxx.eu-central-1.rds.amazonaws.com:3306
The information schema on the AWS cluster:
mysql> select SERVER_ID,SESSION_ID,REPLICA_LAG_IN_MILLISECONDS from INFORMATION_SCHEMA.REPLICA_HOST_STATUS;
+--------------------------------------------------------------+--------------------------------------+-----------------------------+
| SERVER_ID | SESSION_ID | REPLICA_LAG_IN_MILLISECONDS |
+--------------------------------------------------------------+--------------------------------------+-----------------------------+
| xxxx-serverless-db-dev-0 | MASTER_SESSION_ID | 0 |
| application-autoscaling-a7e5c2e8-5a9c-4d6c-836d-0d640e50b172 | 22d1dbaa-3cb8-40d9-a8b1-aca28b30aedf | 21 |
| xxxx-serverless-db-dev-1 | d88f11bd-431a-408e-9550-0c8a132e37a7 | 18 |
+--------------------------------------------------------------+--------------------------------------+-----------------------------+
The runtime servers on ProxySQL still include the removed host ....524e6cc0.....:
MySQL [(none)]> SELECT * FROM runtime_mysql_servers;
+--------------+----------------------------------------------------------------------------------------------------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname | port | gtid_port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+----------------------------------------------------------------------------------------------------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 0 | xxxx-serverless-db-dev-0.c9gniix5fy3o.eu-central-1.rds.amazonaws.com | 3306 | 0 | ONLINE | 1 | 0 | 1000 | 0 | 1 | 0 | |
| 1 | application-autoscaling-524e6cc0-3de9-4b8e-b659-182d1152248a.xxxx.eu-central-1.rds.amazonaws.com | 3306 | 0 | ONLINE | 1 | 0 | 1000 | 0 | 1 | 0 | |
| 1 | application-autoscaling-a7e5c2e8-5a9c-4d6c-836d-0d640e50b172.xxxx.eu-central-1.rds.amazonaws.com | 3306 | 0 | ONLINE | 1 | 0 | 1000 | 0 | 1 | 0 | |
| 1 | xxxx-serverless-db-dev-1.c9gniix5fy3o.eu-central-1.rds.amazonaws.com | 3306 | 0 | ONLINE | 1 | 0 | 1000 | 0 | 1 | 0 | |
+--------------+----------------------------------------------------------------------------------------------------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
4 rows in set (0.00 sec)
Please find the ProxySQL and Aurora logs in the attached zip file: proxysql.zip
Cheers, Thomas
Hi @renecannao, we are facing the same issue. It seems this request is a duplicate of the following: https://github.com/sysown/proxysql/issues/2524
A possible workaround is to execute the code block below as a cron job/scheduler, but I'd prefer to avoid it:
SAVE MYSQL SERVERS FROM RUNTIME;
DELETE FROM mysql_servers WHERE hostname IN (SELECT hostname FROM runtime_mysql_servers WHERE status='SHUNNED');
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
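If you do end up scheduling it, ProxySQL's built-in scheduler table can run the cleanup instead of an external cron job. A minimal sketch, assuming a hypothetical script /var/lib/proxysql/prune_shunned.sh that runs the three statements above against the admin interface every 60 seconds:
-- the script path and the interval below are only examples
INSERT INTO scheduler (id, active, interval_ms, filename) VALUES (1, 1, 60000, '/var/lib/proxysql/prune_shunned.sh');
LOAD SCHEDULER TO RUNTIME;
SAVE SCHEDULER TO DISK;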
This bug is reproducible in the latest ProxySQL version. Do you need any additional log files to review it? Please let me know and I'll share them with you!
Thanks in advance.
Same problem here. I've scheduled the cron job to clean these up, but it's certainly not a clean solution IMO.
Same problem here.
Any news on this?
Do you plan to take a look at this before the next release, please? I consider this a serious issue: when a failover occurs, even though the server is shunned (and already removed from the Aurora cluster), I still get an error in the mysql client: ERROR 2005 (HY000) at line 1: Unknown MySQL server host ‘xxxxxxx.ssssss.us-east-2.rds.amazonaws.com’ (-2) 20221129-14:31:30
Same issue here with proxysql 2.4.7
2023-02-07 00:38:39 MySQL_Monitor.cpp:3580:monitor_dns_resolver_thread(): [ERROR] An error occurred while resolving hostname: application-autoscaling-xxx.eu-west-1.rds.amazonaws.com [-2]
2023-02-07 00:38:39 MySQL_Monitor.cpp:3580:monitor_dns_resolver_thread(): [ERROR] An error occurred while resolving hostname: application-autoscaling-xxx39c.xxx.eu-west-1.rds.amazonaws.com [-2]
2023-02-07 00:38:39 MySQL_Monitor.cpp:3580:monitor_dns_resolver_thread(): [ERROR] An error occurred while resolving hostname: application-autoscaling-xxxxcf.xxx.eu-west-1.rds.amazonaws.com [-2]
The issue seems to be that there's no explicit support for detecting a cluster instance that goes away, so the normal health checking behavior is all that's at play here, and it assumes the server's just temporarily unavailable, setting the status to SHUNNED.
I think what's needed here is a new feature in the Aurora monitor that detects when a host is completely gone from the cluster and either removes it or sets it to OFFLINE_HARD.
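Until such a feature exists, an external check could do that reconciliation by hand: list the live instances from the cluster endpoint, then drop (or mark OFFLINE_HARD) any reader-hostgroup entry that no longer matches one of them. A rough sketch; the reader hostgroup id 1 and the stale hostname prefix are taken from the output earlier in this thread and would need to be adapted:
-- against the Aurora cluster (writer) endpoint: which instances still exist
SELECT SERVER_ID FROM INFORMATION_SCHEMA.REPLICA_HOST_STATUS;
-- against the ProxySQL admin interface: remove any entry whose instance id
-- was not returned above (or set its status to 'OFFLINE_HARD' instead)
SAVE MYSQL SERVERS FROM RUNTIME;
DELETE FROM mysql_servers WHERE hostgroup_id=1 AND hostname LIKE 'application-autoscaling-524e6cc0-%';
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;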
@tabacco agreed, and that often happens when autoscaling is enabled on Aurora.
proxysql 2.3.2
5.13.0-1023-aws #25~20.04.1-Ubuntu SMP Mon Apr 25 19:28:27 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
aurora mysql 2.10.1
Hello, we are doing some load testing, and it appears that the autoscaled node is not being removed or set to OFFLINE after being deleted (scaled down), causing log spam that doesn't go away until ProxySQL is restarted.
The instance is cleared from the list after a ProxySQL restart. Please let me know if you need anything else or if it's a misconfiguration on my part.
Regards