sysown / proxysql

High-performance MySQL proxy with a GPL license.
http://www.proxysql.com
GNU General Public License v3.0

ProxySQL 2.4.5 crash in MySQL_Session.cpp:3303 #4145

Open calh opened 1 year ago

calh commented 1 year ago

In memory Standard Query Cache (SQC) rev. 1.2.0905 -- Query_Cache.cpp -- Wed Dec 14 12:20:55 2022
Standard MySQL Monitor (StdMyMon) rev. 2.0.1226 -- MySQL_Monitor.cpp -- Wed Dec 14 12:20:55 2022
2023-03-03 21:08:29 [INFO] For information about products and services visit: https://proxysql.com/
2023-03-03 21:08:29 [INFO] For online documentation visit: https://proxysql.com/documentation/
2023-03-03 21:08:29 [INFO] For support visit: https://proxysql.com/services/support/
2023-03-03 21:08:29 [INFO] For consultancy visit: https://proxysql.com/services/consulting/

2023-03-03 21:08:29 MySQL_Session.cpp:4312:handler_minus1_LogErrorDuringQuery(): [WARNING] Error during query on (1,REDACTED,3306,1004501): 1193, Unknown system variable 'transaction_isolation'
2023-03-03 21:08:29 MySQL_Variables.cpp:320:validate_charset(): [WARNING] Server doesn't support collation (255) utf8mb4_0900_ai_ci. Replacing it with the configured default (33) utf8_general_ci. Client REDACTED



- [x] A clear description of the issue

I was previously running MySQL 5.5 with ProxySQL 2.3.2, and migrated our production workload to a new cluster running MySQL 5.6 and ProxySQL 2.4.5. The new cluster had been running fine for about 8 days before this crash happened. Although I don't have full query logging turned on, there's nothing particularly special happening at 8:00pm on a Friday compared to any other time of day.

A crash in itself isn't too horrible as long as ProxySQL can restart itself and continue serving traffic. However, in this case, something wild started happening with the configuration database.

Above in the log file, you can see where it loads the MySQL servers. One has a weight of 100, and the others are 1. My configuration has them all at 100 weight. Also, later on in the file, there are some odd things going on with locale settings or maybe column types that I have never seen before.

When I started investigating this issue, I noticed that my max_connections was set to 2048, but my configuration is supposed to be 8192.  I discovered that, on restart, proxysql re-read my `/etc/proxysql.cnf` file and loaded some configuration settings that I had used to bootstrap the cluster.  This is _not_ supposed to happen, and I guessed that maybe something was corrupt with the `proxysql.db` file.  

I tried this next:

1. Shut down proxysql
2. Deleted the `proxysql.db` and `proxysql.db.bak` files
3. Started proxysql back up
4. Re-loaded my config variables from an SQL file
5. Restarted proxysql

After I restarted proxysql the second time, it again re-read `/etc/proxysql.cnf` and destroyed my config variables that I loaded in step 4.  It seems the only choice I had was to port my SQL config over to the `proxysql.cnf` format just in case it crashes again.

- [x] The steps to reproduce the issue

Currently, all I have to do is change a config setting, save it to disk, and restart proxysql, and the change is gone. It seems to re-read `proxysql.cnf` every time and wipe out the proxysql database. Until this crash happened it was working fine and restarts behaved normally, leaving the `proxysql.cnf` file alone.
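For anyone trying to reproduce this, here is a rough sketch of the save-and-check cycle. The `LOAD ... TO RUNTIME` and `SAVE ... TO DISK` statements are standard ProxySQL admin commands. The helper name `persist_and_verify_sql` is made up for illustration, and the usage line assumes the default admin endpoint (127.0.0.1:6032) and default `admin`/`admin` credentials, which may not match a given install.

```shell
#!/bin/sh
# Hypothetical helper: prints the admin statements that apply and persist the
# mysql_servers table, then reads back the on-disk copy so it can be compared
# before and after a restart.
persist_and_verify_sql() {
    cat <<'SQL'
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
SELECT hostgroup_id, hostname, port, weight, max_connections FROM disk.mysql_servers;
SQL
}

# Usage (assumed default admin endpoint and credentials; adjust as needed):
# persist_and_verify_sql | mysql -u admin -padmin -h 127.0.0.1 -P 6032
```

If the `SELECT` shows the expected weights and `max_connections` before a restart but not after, the disk database is being reset on startup rather than failing to save.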

If this is a crashing bug, please also include:
- [x] The package used to install ProxySQL:  proxysql-2.4.5-1.x86_64  from https://repo.proxysql.com/ProxySQL/proxysql-2.4.x/centos/
- [x] The compressed core dump -- I do have a core dump if needed, but it has a lot of sensitive information in it

Thank you!

calh commented 1 year ago

I believe I have some more findings that make this issue a little more obvious.

Back in December when I bootstrapped this cluster, something must have failed on the first startup of proxysql:

# systemctl status proxysql-initial
● proxysql-initial.service - High Performance Advanced Proxy for MySQL, this service will reset the database and start ProxySQL
   Loaded: loaded (/etc/systemd/system/proxysql-initial.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2022-12-28 10:13:33 CST; 2 months 7 days ago
 Main PID: 15906 (code=exited, status=1/FAILURE)

systemd[1]: Starting High Performance Advanced Proxy for MySQL, this service will reset the database and start ProxySQL...
systemctl[15906]: Job for proxysql.service failed because a timeout was exceeded.
systemctl[15906]: See "systemctl status proxysql.service" and "journalctl -xe" for details.
systemd[1]: proxysql-initial.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: proxysql-initial.service: Failed with result 'exit-code'.
systemd[1]: Failed to start High Performance Advanced Proxy for MySQL, this service will reset the database and start ProxySQL.

That shouldn't have been a big deal. I fixed things and started it up after that. However, since this one-shot service failed, systemd has had this environment variable stuck in it the whole time:

# systemctl show-environment
LANG=en_US.UTF-8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
PROXYSQL_OPTS=--initial

It apparently has been destroying and recreating the database every time I restart proxysql. When I work on this server, I'm also frequently adjusting the SQL config and loading it anyway. It only surfaced and became a problem when a crash happened.
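A minimal sketch of how to check for the leftover flag and clear it. `systemctl show-environment` and `systemctl unset-environment` are real systemctl subcommands; the function name `has_initial_flag` is only for illustration. The live commands are commented out because they need root and change the systemd manager environment.

```shell
#!/bin/sh
# Hypothetical helper: reads an environment listing on stdin (one VAR=value
# per line) and returns 0 if PROXYSQL_OPTS contains --initial.
has_initial_flag() {
    grep -Eq '^PROXYSQL_OPTS=.*--initial'
}

# Usage against the running systemd manager (requires root):
# if systemctl show-environment | has_initial_flag; then
#     systemctl unset-environment PROXYSQL_OPTS
# fi
```

After clearing the variable, a normal `systemctl restart proxysql` should no longer pass `--initial`, assuming nothing else sets `PROXYSQL_OPTS`.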

This one-shot service that destroys and recreates the database is maybe a little dangerous to ship in a deployed package. Is there some better way you can accomplish this? Or maybe a safer implementation of the `--initial` flag that won't destroy an existing proxysql disk database?
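To illustrate the kind of safety net I mean, here is a sketch of a guard that copies an existing disk database aside before anything runs with `--initial`. This is not part of the ProxySQL package: the function name and the `.pre-initial.<timestamp>` suffix are made up, and `/var/lib/proxysql` in the usage line is assumed to be the data directory.

```shell
#!/bin/sh
# Hypothetical guard: backup_before_initial DATADIR
# If DATADIR/proxysql.db exists and is non-empty, copy it to a timestamped
# backup and print the backup path. Prints nothing if there is no database.
# Returns non-zero only if the copy fails.
backup_before_initial() {
    datadir=$1
    db="$datadir/proxysql.db"
    if [ -s "$db" ]; then
        backup="$db.pre-initial.$(date +%Y%m%d%H%M%S)"
        cp -p "$db" "$backup" || return 1
        printf '%s\n' "$backup"
    fi
    return 0
}

# Usage (assumed data directory), run before starting the one-shot service:
# backup_before_initial /var/lib/proxysql
```

A stricter variant could refuse to continue when a database already exists, but a backup at least makes an accidental reset recoverable.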