Open gercorri opened 3 years ago
Logs attached. proxysql.log.2.masked.txt.zip
Hi @gercorri .
I am sorry to read about your issue.
I reviewed the log, and I have some comments.
First, you are running 2.0.12.
Users have reported several issues with Aurora, which have since been fixed.
For example, https://github.com/sysown/proxysql/issues/3082 is fixed in 2.0.16.
The next release of ProxySQL (not released yet) will include a few more fixes for what seem to be Aurora bugs: https://github.com/sysown/proxysql/pull/3515
An interesting bug is, for example, that during a failover it is possible to see two servers with MASTER_SESSION_ID.
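To make the two-writer case concrete: Aurora reports its topology in information_schema.replica_host_status, where the writer's row carries SESSION_ID = 'MASTER_SESSION_ID'. Below is a minimal Python sketch, assuming the rows have already been fetched as (SERVER_ID, SESSION_ID) pairs. The instance names are invented for illustration, and this is not ProxySQL's actual detection logic.

```python
from typing import Iterable, List, Tuple

# Aurora marks the writer's row in replica_host_status with this SESSION_ID.
WRITER_MARKER = "MASTER_SESSION_ID"


def find_writers(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the SERVER_IDs whose SESSION_ID marks them as the writer."""
    return [server_id for server_id, session_id in rows
            if session_id == WRITER_MARKER]


def is_ambiguous(rows: Iterable[Tuple[str, str]]) -> bool:
    """True when more than one server claims to be the writer."""
    return len(find_writers(rows)) > 1


# Hypothetical rows (invented, not taken from the attached log):
# both instances claim the writer role in the middle of a failover.
rows_during_failover = [
    ("instance-a", "MASTER_SESSION_ID"),
    ("instance-b", "MASTER_SESSION_ID"),
]

if is_ambiguous(rows_during_failover):
    print("More than one writer reported:", find_writers(rows_during_failover))
```

A monitor that trusts the first such row it sees could place the wrong server in the writer hostgroup, which is why this edge case matters.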
I am quite confident that the new ProxySQL release will solve these edge cases. Once it is released, please test it, and then we can close this issue.
Thanks
Hi @gercorri,
ProxySQL v2.3.0 has just been released, and it contains fixes that may solve this issue. Once you have tested it, please let us know whether it resolves the problem for you so that the issue can be closed.
Thanks, Javier.
This is the place to report and seek assistance for what looks like a reproducible bug.
Recently in production the application stopped being able to write to the database, which runs in AWS Aurora (MySQL) and sits behind ProxySQL. The system had been running without any issue for approximately 1 year. Around 4 months ago we upgraded to version 2.0.12-38-g58a909a0, codename Truls, with no issues.
We have been unable to find the root cause and have therefore had to temporarily bypass ProxySQL, which is causing load issues on our database.
On investigation we discovered that the writer instance in AWS was not responding for around 30 seconds, and this somehow caused ProxySQL to reconfigure its hostgroups incorrectly.
The ProxySQL cluster is:
When the issue occurred, the ProxySQL hostgroup config changed as follows:
Therefore, not only has the config been changed so that a reader is in the writer hostgroup, but the config is also not synced between the 3 nodes.
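For anyone wanting to check for the same two symptoms, here is a minimal Python sketch. It assumes the rows come from running SELECT hostgroup_id, hostname FROM runtime_mysql_servers on each node's admin interface. The node names, hostnames, and hostgroup IDs (1 for writer, 2 for reader) are hypothetical and are not the values from our setup.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Row = Tuple[int, str]  # (hostgroup_id, hostname)


def normalize(rows: Iterable[Row]) -> FrozenSet[Row]:
    """Order-independent view of one node's hostgroup assignments."""
    return frozenset(rows)


def nodes_in_sync(snapshots: Dict[str, Iterable[Row]]) -> bool:
    """True when every ProxySQL node reports the same assignments."""
    return len({normalize(rows) for rows in snapshots.values()}) <= 1


def unexpected_writers(rows: Iterable[Row], writer_hostgroup: int,
                       expected_writer: str) -> Set[str]:
    """Hosts in the writer hostgroup other than the expected writer."""
    return {host for hg, host in rows
            if hg == writer_hostgroup and host != expected_writer}


# Hypothetical snapshots (invented for illustration): proxysql-2 has
# moved the reader into the writer hostgroup, the other nodes have not.
snapshots = {
    "proxysql-1": [(1, "writer-db"), (2, "reader-db")],
    "proxysql-2": [(1, "reader-db"), (2, "writer-db")],
    "proxysql-3": [(1, "writer-db"), (2, "reader-db")],
}

print("in sync:", nodes_in_sync(snapshots))
for node, rows in snapshots.items():
    bad = unexpected_writers(rows, writer_hostgroup=1, expected_writer="writer-db")
    if bad:
        print(node, "has unexpected hosts in the writer hostgroup:", sorted(bad))
```

Comparing sets rather than lists keeps the check independent of the order in which each node returns its rows.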
In the AWS database logs at the time of the incident we can see Access denied for user 'monitor' errors.
We tried to replicate this in our staging environment, and although we couldn't get it into the exact same state, we did manage to replicate something very similar.
We tested multiple failovers in AWS by failing the reader over to be the writer and vice versa, and this worked fine every time. However, when we shut down and restarted the writer node, ProxySQL reconfigured its hostgroups incorrectly, lost sync and wasn't able to recover.
The hostgroup configuration got changed to:
In summary, the hostgroups are misconfigured and out of sync between the 3 nodes, and ProxySQL never recovers when the writer instance is available again. It seems that it can handle failovers without issue but can't handle the case where the writer is unavailable, e.g. due to being shut down for maintenance or a network issue.
The setup is as follows:
Please advise if this issue has been seen before, if there are configuration changes we may need to make, or if any other details are required.
Thanks, Gerard.