sysown / proxysql

High-performance MySQL proxy with a GPL license.
http://www.proxysql.com
GNU General Public License v3.0

Bug with multiple servers in group with mixed compress/uncompressed settings #1493

Open Reiner030 opened 6 years ago

Reiner030 commented 6 years ago

Hi,

I re-set up a new PHP + DB slave server on Debian Stretch. Before that, everything was running fine with weighted settings through a local ProxySQL: external master + external slave + local slave - all over uncompressed connections.

Since the compress flag had been working nicely for another database, as mentioned in another issue, I also added the compression setting to the new ProxySQL configuration to save traffic - but only for the external master + external slave. For local connections, compressing the data would not be helpful and would only cost performance.
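Roughly, the mixed setup looks like this in the admin interface (just a sketch - the hostnames here are placeholders, not my real ones; compression is the per-server column in mysql_servers):

-- remote backends: ask for compressed connections
UPDATE mysql_servers SET compression=1 WHERE hostname IN ('external-master', 'external-slave');
-- local slave: keep uncompressed
UPDATE mysql_servers SET compression=0 WHERE hostname='local-slave';
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;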

The first days everything worked nicely, but from time to time, when the backend server had to do many PHP tasks, exceptions were thrown like: SQLSTATE[HY000]: General error: 9001 Max connect timeout reached while reaching hostgroup 501 after 60238ms - without any apparent reason: all servers were up, with not many connections, and especially while these errors occurred the connections to the localhost slave went to zero.

and there was nothing in the logs that gave a good hint - when a server was mentioned at all, it was only complaints about the master (with weight=1):

2018-05-04 07:05:16 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-04 07:15:10 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-04 07:15:13 main.cpp:1086:main(): [ERROR] Watchdog: 40 threads missed a heartbeat
2018-05-04 07:15:16 main.cpp:1086:main(): [ERROR] Watchdog: 40 threads missed a heartbeat
2018-05-04 07:15:19 main.cpp:1086:main(): [ERROR] Watchdog: 40 threads missed a heartbeat
2018-05-04 07:15:21 mysql_connection.cpp:640:handler(): [ERROR] Connect timeout on master:3306 : exceeded by 29251us
2018-05-04 07:15:21 mysql_connection.cpp:604:handler(): [ERROR] Failed to mysql_real_connect() on master:3306 , FD (Conn:1578 , MyDS:1578) , 2013: Lost connection to MySQL server at 'reading authorization packet', system error: 11.
2018-05-04 07:15:21 mysql_connection.cpp:604:handler(): [ERROR] Failed to mysql_real_connect() on master:3306 , FD (Conn:1574 , MyDS:1574) , 2013: Lost connection to MySQL server at 'reading authorization packet', system error: 11.
2018-05-04 07:15:21 MySQL_HostGroups_Manager.cpp:301:connect_error(): [ERROR] Shunning server master:3306 with 5 errors/sec. Shunning for 10 seconds
2018-05-04 07:20:10 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat

It would be great to get some more hints in the wiki on how the process runs and how debugging can be done (=> https://github.com/sysown/proxysql/issues/969 ), because I was lost searching for reasons/solutions until I found both of these issues via the error message... The second one was also mentioned in the changelog:

Automatically processing of uncompressed packets even if wrongly marked as compressed https://github.com/sysown/proxysql/issues/1410

which reminded me to check my previous compression settings on the connections (which were off); I had added them manually to the ProxySQL configuration some months ago, before the ProxySQL reload function was available.

=> After setting ALL connections to either compressed or uncompressed, the errors are gone. => After setting mixed compressed/uncompressed values on the connections, the errors are back again.

==> this must be an "enhanced" version of bug #1410 above? It would be best to fix it for the upcoming 2.0.0 release... :blush:

renecannao commented 6 years ago

This issue is interesting. To clarify, an important question: is your client (PHP) using compression, or not? From what I read, I would assume the answer is no, but please correct me if I am wrong.

Based on my assumption that the client is not using compression, ProxySQL's code for handling compression is not relevant at all (so forget about #1410). If you are using compression for backends, all that ProxySQL does is set the CLIENT_COMPRESS flag, nothing more. Therefore, if a problem exists, it is most likely either in MySQL server or in the embedded MariaDB Client Library. IF the problem is indeed compression, my main suspect right now is the MariaDB Client Library, because this is what ProxySQL uses to communicate with backends.
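As a quick check of which backends ProxySQL will request compression for (i.e. where CLIENT_COMPRESS gets set), you can look at the per-server compression column in the admin interface, for example:

SELECT hostgroup_id, hostname, port, compression FROM mysql_servers;
SELECT hostgroup_id, hostname, port, compression FROM runtime_mysql_servers;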

Your error log indeed gives us some leads for further investigation. Is it possible to have the whole error log, or a few hundred lines? What I find relevant in your error log is:

At this point, I would investigate what causes an apparent lock. It is possible that ProxySQL's worker threads get locked, and they report a connection timeout just because they weren't able to process the connection itself. If that's the case, compression is potentially unrelated to this issue. From the few lines I have from your error log, it looks as if heartbeats are missed at regular intervals (07:05:16, 07:15:10 and 07:20:10): having the whole error log will help us find a pattern. Are you using mysql-threads=40? Unless you have an extremely beefy machine where ProxySQL is allowed to use 40 CPU cores, you should lower this value, especially if you are running ProxySQL locally to your application. How much CPU does ProxySQL normally use?

Action items I would suggest right now: a) drastically reduce mysql-threads based on ProxySQL's CPU utilization (see the sketch below) b) provide the full error log, or a large portion of it
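For a), a minimal sketch from the admin interface (assuming the value is set there; as far as I remember, mysql-threads is only read at startup, so it needs SAVE plus a restart rather than LOAD MYSQL VARIABLES TO RUNTIME):

SELECT variable_name, variable_value FROM global_variables WHERE variable_name='mysql-threads';
UPDATE global_variables SET variable_value='4' WHERE variable_name='mysql-threads';
SAVE MYSQL VARIABLES TO DISK;

Then restart ProxySQL so the new thread count actually takes effect.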

Thanks

Reiner030 commented 6 years ago

Hi, thanks for the quick answer :smile: Yesterday I set all my connections to compressed, so it is already working fine, but this could be a common problem wherever mixed connections are needed.

(EDITED because image links were not working)

renecannao commented 6 years ago

Hi.

You are welcome! :) I still think that having a look at the error log is the best starting point to troubleshoot this issue. Is it possible to have it?

About compression on clients: that is correct, clients can use compression or not, connections to backends can use compression or not, and they do not have to match. For example, a client can use no compression while proxysql uses compression to the backend, and vice versa. My point in the previous comment is that proxysql's code only handles compression with clients: compression with backends is handled by the mariadb client library. On the mariadb client library: proxysql uses an embedded, modified version, thus what you have installed is not relevant.

About mysql-threads: ProxySQL worker threads use async calls, they do not block on network operations. That means that a single thread can handle tens of thousands of connections. That said, the way to tune mysql-threads is not based on the number of connections, but on CPU usage. In your setup it seems that mysql-threads=1 is actually enough, but you can use the default mysql-threads=4. Surely, please stay away from mysql-threads=40 (we need to make the wiki clearer on this).

Finally, the links you posted do seem to work.

Reiner030 commented 6 years ago

Hi,

Reiner030 commented 6 years ago

First the easy parts / found problems - it might be better to split these into separate issues, but I didn't know what should go where, so feel free to reuse this if it makes sense:




Unable to open PDO connection [wrapped: SQLSTATE[HY000] [2002] Connection refused]' in ...
PropelException' with message 'Unable to open PDO connection [wrapped: SQLSTATE[HY000] [1040] Too many connections]' in ...
Unable to execute SELECT statement [SELECT ...] [wrapped: PDOStatement::execute(): MySQL server has gone away] in ...
Lost connection to MySQL server during query

=> which server is gone (if the message came from ProxySQL)?


2018-05-09 12:01:34 main.cpp:1086:main(): [ERROR] Watchdog: 5 threads missed a heartbeat
2018-05-09 12:01:34 main.cpp:1090:main(): [ERROR] Watchdog: reached 10 missed heartbeats. Aborting!
proxysql: main.cpp:1091: int main(int, const char**): Assertion `0' failed.
Error: signal 6:
proxysql(_Z13crash_handleri+0x25)[0x4cf485]
/lib/x86_64-linux-gnu/libc.so.6(+0x350e0)[0x7fe9cec4c0e0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7fe9cec4c067]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7fe9cec4d448]
/lib/x86_64-linux-gnu/libc.so.6(+0x2e266)[0x7fe9cec45266]
/lib/x86_64-linux-gnu/libc.so.6(+0x2e312)[0x7fe9cec45312]
proxysql(main+0x70d)[0x4cc2c4]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fe9cec38b45]
proxysql[0x4c99b4]
2018-05-09 12:01:35 main.cpp:905:ProxySQL_daemonize_phase3(): [ERROR] ProxySQL crashed. Restarting!
2018-05-09 12:01:35 main.cpp:877:ProxySQL_daemonize_phase3(): [INFO] Angel process started ProxySQL process 3852
Standard ProxySQL Cluster rev. 0.1.0702_DEBUG -- ProxySQL_Cluster.cpp -- Thu Feb  1 02:43:15 2018
Standard ProxySQL Statistics rev. 1.4.1027_DEBUG -- ProxySQL_Statistics.cpp -- Thu Feb  1 02:43:15 2018
Standard ProxySQL HTTP Server Handler rev. 1.4.1031_DEBUG -- ProxySQL_HTTP_Server.cpp -- Thu Feb  1 02:43:15 2018
Admin initialized in 0.213958 secs.
Error: signal 11:
proxysql(_Z13crash_handleri+0x1a)[0x55be8b035ffa]
/lib/x86_64-linux-gnu/libc.so.6(+0x33060)[0x7f4b2416d060]
proxysql(_ZN12MySQL_Thread3runEv+0x1460)[0x55be8b0632c0]
proxysql(_Z24mysql_worker_thread_funcPv+0x6c)[0x55be8b03435c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7494)[0x7f4b25aa2494]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f4b24222acf]
2018-05-05 01:25:11 main.cpp:905:ProxySQL_daemonize_phase3(): [ERROR] ProxySQL crashed. Restarting!
2018-05-05 01:25:11 [INFO] Angel process started ProxySQL process 30164
Standard ProxySQL Cluster rev. 0.1.0702 -- ProxySQL_Cluster.cpp -- Tue Apr  3 06:28:45 2018
Standard ProxySQL Statistics rev. 1.4.1027 -- ProxySQL_Statistics.cpp -- Wed Apr  4 13:21:29 2018
Standard ProxySQL HTTP Server Handler rev. 1.4.1031 -- ProxySQL_HTTP_Server.cpp -- Tue Apr  3 06:28:45 2018
Standard ProxySQL Admin rev. 0.2.0902 -- ProxySQL_Admin.cpp -- Tue Apr 10 12:18:22 2018
Standard MySQL Threads Handler rev. 0.2.0902 -- MySQL_Thread.cpp -- Tue Apr  3 06:28:45 2018
Standard MySQL Authentication rev. 0.2.0902 -- MySQL_Authentication.cpp -- Tue Apr  3 06:28:45 2018
...
Standard MySQL Monitor (StdMyMon) rev. 1.2.0723 -- MySQL_Monitor.cpp -- Wed Apr  4 13:21:29 2018
Error: signal 11:
proxysql(_Z13crash_handleri+0x1a)[0x55be8b035ffa]
/lib/x86_64-linux-gnu/libc.so.6(+0x33060)[0x7f4b2416d060]
proxysql(free_root+0x5b)[0x55be8b21afbb]
proxysql(mysql_free_result+0x36)[0x55be8b215676]
proxysql(_ZN16MySQL_Connection7handlerEs+0x3fc)[0x55be8b14870c]
proxysql(_ZN16MySQL_Connection11async_queryEsPcmPP13st_mysql_stmtP23stmt_execute_metadata_t+0xa6)[0x55be8b1494d6]
proxysql(_ZN13MySQL_Session7handlerEv+0x1ba0)[0x55be8b0723b0]
proxysql(_ZN12MySQL_Thread20process_all_sessionsEv+0x2df)[0x55be8b05b00f]
proxysql(_ZN12MySQL_Thread3runEv+0x1740)[0x55be8b0635a0]
proxysql(_Z24mysql_worker_thread_funcPv+0x6c)[0x55be8b03435c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7494)[0x7f4b25aa2494]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f4b24222acf]
2018-05-05 01:30:09 main.cpp:905:ProxySQL_daemonize_phase3(): [ERROR] ProxySQL crashed. Restarting!
2018-05-05 01:30:09 [INFO] Angel process started ProxySQL process 1481
Standard ProxySQL Cluster rev. 0.1.0702 -- ProxySQL_Cluster.cpp -- Tue Apr  3 06:28:45 2018
Standard ProxySQL Statistics rev. 1.4.1027 -- ProxySQL_Statistics.cpp -- Wed Apr  4 13:21:29 2018
Standard ProxySQL HTTP Server Handler rev. 1.4.1031 -- ProxySQL_HTTP_Server.cpp -- Tue Apr  3 06:28:45 2018
Standard ProxySQL Admin rev. 0.2.0902 -- ProxySQL_Admin.cpp -- Tue Apr 10 12:18:22 2018
Standard MySQL Threads Handler rev. 0.2.0902 -- MySQL_Thread.cpp -- Tue Apr  3 06:28:45 2018
Standard MySQL Authentication rev. 0.2.0902 -- MySQL_Authentication.cpp -- Tue Apr  3 06:28:45 2018

Since, even with the debug version, logging is not easy to expand due to missing wiki documentation, I have had 2 more crashes since then (and the running instance is only for staging, which doesn't get enough traffic to reproduce the problem).

From the latest issues it could be similar to #1522, but I get no real crashes.

Here is the log around the time of the last "attempted" production failure after 8:00 UTC:

2018-05-17 06:50:17 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 06:50:20 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:15:12 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:15:15 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:20:21 main.cpp:1086:main(): [ERROR] Watchdog: 4 threads missed a heartbeat
2018-05-17 07:20:27 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:20:30 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:20:33 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:20:36 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:20:38 mysql_connection.cpp:664:handler(): [ERROR] Connect timeout on production-slave-02:3306 : exceeded by 24629us
2018-05-17 07:50:19 main.cpp:1086:main(): [ERROR] Watchdog: 2 threads missed a heartbeat
2018-05-17 07:50:34 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 07:50:37 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 08:00:22 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 08:15:10 main.cpp:1086:main(): [ERROR] Watchdog: 86 threads missed a heartbeat
2018-05-17 08:15:13 main.cpp:1086:main(): [ERROR] Watchdog: 86 threads missed a heartbeat
2018-05-17 08:15:16 main.cpp:1086:main(): [ERROR] Watchdog: 86 threads missed a heartbeat
2018-05-17 08:15:19 main.cpp:1086:main(): [ERROR] Watchdog: 86 threads missed a heartbeat
2018-05-17 08:15:21 mysql_connection.cpp:628:handler(): [ERROR] Failed to mysql_real_connect() on production-slave-02:3306 , FD (Conn:670 , MyDS:670) , 2013: Lost connection to MySQL server at 'reading authorization packet', system error: 115.
2018-05-17 08:15:21 mysql_connection.cpp:664:handler(): [ERROR] Connect timeout on production-slave-02:3306 : exceeded by 29554us
2018-05-17 08:15:21 mysql_connection.cpp:628:handler(): [ERROR] Failed to mysql_real_connect() on production-slave-02:3306 , FD (Conn:667 , MyDS:667) , 2013: Lost connection to MySQL server at 'reading authorization packet', system error: 11.
2018-05-17 08:15:21 mysql_connection.cpp:628:handler(): [ERROR] Failed to mysql_real_connect() on production-master:3306 , FD (Conn:669 , MyDS:669) , 2013: Lost connection to MySQL server at 'reading authorization packet', system error: 11.
2018-05-17 08:15:22 main.cpp:1086:main(): [ERROR] Watchdog: 2 threads missed a heartbeat
2018-05-17 08:15:25 main.cpp:1086:main(): [ERROR] Watchdog: 15 threads missed a heartbeat
2018-05-17 08:15:34 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 08:25:13 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 08:25:16 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 08:25:19 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 08:25:22 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 08:25:23 mysql_connection.cpp:664:handler(): [ERROR] Connect timeout on production-slave-02:3306 : exceeded by 30230us
2018-05-17 08:45:08 main.cpp:1086:main(): [ERROR] Watchdog: 19 threads missed a heartbeat
2018-05-17 09:30:15 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 09:45:15 main.cpp:1086:main(): [ERROR] Watchdog: 84 threads missed a heartbeat
2018-05-17 09:45:18 main.cpp:1086:main(): [ERROR] Watchdog: 84 threads missed a heartbeat
2018-05-17 09:45:21 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 09:57:15 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 11:35:12 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-17 11:50:06 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat

As we can see, the ProxySQL service "actively" disconnects from every MySQL backend server - even the localhost one - leaving them without connections, and the client only gets exceptions like:

[wrapped: SQLSTATE[HY000]: General error: 9001 Max connect timeout reached while reaching hostgroup 501 after 60075ms]' 

My idea is that perhaps deactivated multiplexing could cause such problems (amplified by mixed compressed/uncompressed backend connections)? If this could be the problem: would it be possible to have multiplexing=on by default, and only have the connections with a different INCREMENT_OFFSET use multiplexing=off?
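Just to illustrate the idea (purely a sketch - the rule_id and the match pattern are invented for this example): keep multiplexing enabled globally and disable it only for the sessions that set the offset:

UPDATE global_variables SET variable_value='true' WHERE variable_name='mysql-multiplexing';
LOAD MYSQL VARIABLES TO RUNTIME;
-- disable multiplexing (multiplex=0) only for connections that change the offset
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, multiplex, apply) VALUES (10, 1, '^SET .*increment_offset', 0, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;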


How is debugging done? ... the big question, which is not explained in the wiki and which I cannot work out from the code:

ProxySQL> UPDATE debug_levels SET verbosity=3 WHERE module in ('debug_mysql_server', 'debug_mysql_connection', 'debug_mysql_connpool');
Query OK, 3 rows affected (0.00 sec)

ProxySQL> SELECT * FROM debug_levels;
+-----------------------------+-----------+
| module                      | verbosity |
+-----------------------------+-----------+
| debug_generic               | 0         |
| debug_net                   | 0         |
| debug_pkt_array             | 0         |
| debug_poll                  | 0         |
| debug_mysql_com             | 0         |
| debug_mysql_server          | 3         |
| debug_mysql_connection      | 3         |
| debug_mysql_connpool        | 3         |
| debug_mysql_rw_split        | 0         |
| debug_mysql_auth            | 0         |
| debug_mysql_protocol        | 0         |
| debug_mysql_query_processor | 0         |
| debug_memory                | 0         |
| debug_admin                 | 0         |
| debug_sqlite                | 0         |
| debug_ipc                   | 0         |
| debug_query_cache           | 0         |
| debug_query_statistics      | 0         |
+-----------------------------+-----------+
18 rows in set (0.00 sec)
ProxySQL> LOAD DEBUG LEVELS TO RUNTIME;
ERROR 1045 (#2800): near "LOAD": syntax error
ProxySQL> SAVE DEBUG LEVELS TO DISK;
ERROR 1045 (#2800): near "SAVE": syntax error
ProxySQL>

This is not working as expected, and there are no log entries for these attempts...

Reiner030 commented 6 years ago

New version, new chance ... but sadly no luck:

# wget https://github.com/sysown/proxysql/releases/download/v1.4.9/proxysql_1.4.9-dbg-debian9_amd64.deb
# dpkg -i proxysql_1.4.9-dbg-debian9_amd64.deb

threads set to 20 in /etc/proxysql.cnf

# /etc/init.d/proxysql stop && /etc/init.d/proxysql reload

threads checked: now only 20 instead of 100

For the first hour, without clients, it was looking fine so far.

But after touching all data to get it processed around 18:30 UTC / 20:30 MEST, it took only 7 minutes to bring the new ProxySQL 1.4.9 down :sob:

=> Could this be a problem caused by deactivated multiplexing? If so, would it be safe to activate multiplexing for all client connections which have the same INCREMENT_OFFSET set, so that we only need to deactivate it for the 2 countries where the offset is different?

The PHP errors all look like this exception:

LOG: [COUNTRY] [2018-05-30 20:37:21] executing: ExportProductsSolr [id: 80d426cc-6437-11e8-adc4-4ccc6afcffd9]
<h1>Exception</h1><div style="background: #dadada; padding: 5px; font-weight: bold">PropelException - Unable to execute SELECT statement [SELECT gearman_synchronization.ID_GEARMAN_SYNCHRONIZATION, gearman_synchronization.UNIQUE_KEY, gearman_synchronization.ID_GROUP, gearman_synchronization.STATUS, gearman_synchronization.JOB_NAME, gearman_synchronization.USER, gearman_synchronization.EXCEPTION_MESSAGE, gearman_synchronization.CREATED_AT, gearman_synchronization.UPDATED_AT FROM `gearman_synchronization` WHERE gearman_synchronization.UNIQUE_KEY=:p1 LIMIT 1] [wrapped: SQLSTATE[HY000]: General error: 9001 Max connect timeout reached while reaching hostgroup 401 after 60305ms]</div><br />
in /data/www/staging/releases/20180530-050346/vendor/propel/propel1/runtime/lib/query/ModelCriteria.php (line: 1328)<br />

An interesting bug I found while writing this: ProxySQL is shunning (flapping between ONLINE/SHUNNED) only the master, which is the fallback in the read group:

ProxySQL> SELECT hostgroup_id,hostname, status, weight, compression, max_connections, max_replication_lag, use_ssl, max_latency_ms FROM runtime_mysql_servers WHERE hostgroup_id IN (400,401);
+--------------+----------------+--------+---------+-------------+-----------------+---------------------+---------+----------------+
| hostgroup_id | hostname       | status | weight  | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms |
+--------------+----------------+--------+---------+-------------+-----------------+---------------------+---------+----------------+
| 400          | staging-master | ONLINE | 1       | 1           | 10000           | 0                   | 0       | 0              |
| 401          | 172.16.30.1    | ONLINE | 9999000 | 1           | 10000           | 10                  | 0       | 0              |
| 401          | staging-master | ONLINE | 1       | 1           | 10000           | 0                   | 0       | 0              |
| 401          | staging-slave  | ONLINE | 9999    | 1           | 10000           | 10                  | 0       | 0              |
+--------------+----------------+--------+---------+-------------+-----------------+---------------------+---------+----------------+
4 rows in set (0.00 sec)

Ah, and compression is on again ... (=> next test without compression on all backends).
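For that next test I would simply switch compression off on all backends, roughly like this in the admin interface:

UPDATE mysql_servers SET compression=0;
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;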

Here is the full start log with the first lines after the incident - since I still haven't found any info on how to increase debugging, I can't offer more yet:

2018-05-30 17:39:03 ProxySQL_Admin.cpp:7694:flush_error_log(): [INFO] ProxySQL version v1.4.9-0-gd9fd599_DEBUG
2018-05-30 17:39:03 ProxySQL_Admin.cpp:7697:flush_error_log(): [INFO] Detected OS: Linux gearman-worker-fs1dc12-01 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64
2018-05-30 17:39:03 main.cpp:829:ProxySQL_daemonize_phase2(): [INFO] Starting ProxySQL
2018-05-30 17:39:03 main.cpp:830:ProxySQL_daemonize_phase2(): [INFO] Sucessfully started
Main daemonize phase1 completed in 0.000378 secs.
2018-05-30 17:39:03 main.cpp:877:ProxySQL_daemonize_phase3(): [INFO] Angel process started ProxySQL process 16234
Standard ProxySQL Cluster rev. 0.1.0702_DEBUG -- ProxySQL_Cluster.cpp -- Wed May 30 12:20:00 2018
Standard ProxySQL Statistics rev. 1.4.1027_DEBUG -- ProxySQL_Statistics.cpp -- Wed May 30 12:20:00 2018
Standard ProxySQL HTTP Server Handler rev. 1.4.1031_DEBUG -- ProxySQL_HTTP_Server.cpp -- Wed May 30 12:20:00 2018
2018-05-30 17:39:03 ProxySQL_Admin.cpp:7651:Read_ProxySQL_Servers_from_configfile(): [INFO] Cluster: Adding ProxySQL Servers gearman-worker-fs1dc12-01:6032 from config file
Admin initialized in 0.040146 secs.
Standard ProxySQL Admin rev. 0.2.0902_DEBUG -- ProxySQL_Admin.cpp -- Wed May 30 12:20:00 2018
Standard MySQL Threads Handler rev. 0.2.0902_DEBUG -- MySQL_Thread.cpp -- Wed May 30 12:20:00 2018
Main phase3 : GloMyLogger initialized in 3e-06 secs.
Standard MySQL Authentication rev. 0.2.0902_DEBUG -- MySQL_Authentication.cpp -- Wed May 30 12:20:00 2018
Main init phase2 completed in 0.042859 secs.
Main phase3 : GloMyLogger initialized in 2e-06 secs.
2018-05-30 17:39:03 MySQL_HostGroups_Manager.cpp:918:generate_mysql_servers_table(): [INFO] Dumping current MySQL Servers structures for hostgroup ALL
+--------------+----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+-------------+
| hostgroup_id | hostname | port | weight | status | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment | mem_pointer |
+--------------+----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+-------------+
+--------------+----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+-------------+
2018-05-30 17:39:03 MySQL_HostGroups_Manager.cpp:587:commit(): [INFO] Dumping mysql_servers_incoming
+--------------+--------------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+----------------------------------------------------+
| hostgroup_id | hostname           | port | weight  | status | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment                                            |
+--------------+--------------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+----------------------------------------------------+
| 500          | production-master  | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | production: master                                 |
| 501          | production-master  | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | production: slave                                  |
| 501          | 172.16.30.1        | 3306 | 9999000 | 0      | 1           | 10000           | 10                  | 0       | 0              | production: local slave gearman-worker-fs1dc12-01  |
| 501          | production-slave   | 3306 | 9999    | 0      | 1           | 10000           | 10                  | 0       | 0              | production: slave                                  |
| 400          | staging-master     | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | staging: master                                    |
| 401          | staging-master     | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | staging: slave                                     |
| 401          | 172.16.30.1        | 3306 | 9999000 | 0      | 1           | 10000           | 10                  | 0       | 0              | staging: local slave gearman-worker-fs1dc12-01     |
| 401          | staging-slave      | 3306 | 9999    | 0      | 1           | 10000           | 10                  | 0       | 0              | staging: slave                                     |
+--------------+--------------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+----------------------------------------------------+
2018-05-30 17:39:03 MySQL_HostGroups_Manager.cpp:1026:generate_mysql_replication_hostgroups_table(): [INFO] New mysql_replication_hostgroups table
2018-05-30 17:39:03 MySQL_HostGroups_Manager.cpp:1065:generate_mysql_group_replication_hostgroups_table(): [INFO] New mysql_group_replication_hostgroups table
2018-05-30 17:39:03 MySQL_HostGroups_Manager.cpp:918:generate_mysql_servers_table(): [INFO] Dumping current MySQL Servers structures for hostgroup ALL
HID: 500 , address: production-master , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: production: master
HID: 501 , address: production-master , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: production: slave
HID: 501 , address: 172.16.30.1 , port: 3306 , weight: 9999000 , status: ONLINE , max_connections: 10000 , max_replication_lag: 10 , use_ssl: 0 , max_latency_ms: 0 , comment: production: local slave gearman-worker-fs1dc12-01
HID: 501 , address: production-slave , port: 3306 , weight: 9999 , status: ONLINE , max_connections: 10000 , max_replication_lag: 10 , use_ssl: 0 , max_latency_ms: 0 , comment: production: slave
HID: 400 , address: staging-master , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: staging: master
HID: 401 , address: staging-master , port: 3306 , weight: 1 , status: ONLINE , max_connections: 10000 , max_replication_lag: 0 , use_ssl: 0 , max_latency_ms: 0 , comment: staging: slave
HID: 401 , address: 172.16.30.1 , port: 3306 , weight: 9999000 , status: ONLINE , max_connections: 10000 , max_replication_lag: 10 , use_ssl: 0 , max_latency_ms: 0 , comment: staging: local slave gearman-worker-fs1dc12-01
HID: 401 , address: staging-slave , port: 3306 , weight: 9999 , status: ONLINE , max_connections: 10000 , max_replication_lag: 10 , use_ssl: 0 , max_latency_ms: 0 , comment: staging: slave
2018-05-30 17:39:03 MySQL_HostGroups_Manager.cpp:1014:generate_mysql_servers_table(): [INFO] Dumping mysql_servers
+--------------+--------------------------------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+-------------------------------------------------------+-----------------+
| hostgroup_id | hostname                             | port | weight  | status | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment                                               | mem_pointer     |
+--------------+--------------------------------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+-------------------------------------------------------+-----------------+
| 500          | production-master       | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | production: master                                | 140221738338944 |
| 401          | staging-slave    | 3306 | 9999    | 0      | 1           | 10000           | 10                  | 0       | 0              | staging: slave                                    | 140221738338048 |
| 401          | 172.16.30.1                          | 3306 | 9999000 | 0      | 1           | 10000           | 10                  | 0       | 0              | staging: local slave gearman-worker-fs1dc12-01    | 140221738338176 |
| 401          | staging-master          | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | staging: slave                                    | 140221738338304 |
| 400          | staging-master          | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | staging: master                                   | 140221738338432 |
| 501          | production-slave | 3306 | 9999    | 0      | 1           | 10000           | 10                  | 0       | 0              | production: slave                                 | 140221738338560 |
| 501          | 172.16.30.1                          | 3306 | 9999000 | 0      | 1           | 10000           | 10                  | 0       | 0              | production: local slave gearman-worker-fs1dc12-01 | 140221738338688 |
| 501          | production-master       | 3306 | 1       | 0      | 1           | 10000           | 0                   | 0       | 0              | production: slave                                 | 140221738338816 |
+--------------+--------------------------------------+------+---------+--------+-------------+-----------------+---------------------+---------+----------------+-------------------------------------------------------+-----------------+
2018-05-30 17:39:03 ProxySQL_Cluster.cpp:247:ProxySQL_Node_Entry(): [INFO] Created new Cluster Node Entry for host gearman-worker-fs1dc12-01:6032
Main phase3 : GloAdmin initialized in 0.003388 secs.
Standard Query Processor rev. 0.2.0902_DEBUG -- Query_Processor.cpp -- Wed May 30 12:20:00 2018
2018-05-30 17:39:03 ProxySQL_Cluster.cpp:60:ProxySQL_Cluster_Monitor_thread(): [INFO] Cluster: starting thread for peer gearman-worker-fs1dc12-01:6032
Main phase3 : Query Processor initialized in 0.000514 secs.
2018-05-30 17:39:03 ProxySQL_Cluster.cpp:180:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer gearman-worker-fs1dc12-01:6032 . Error: Can't connect to MySQL server on 'gearman-worker-fs1dc12-01' (107)
Main phase3 : MySQL Threads Handler initialized in 0.001844 secs.
In memory Standard Query Cache (SQC) rev. 1.2.0905_DEBUG -- Query_Cache.cpp -- Wed May 30 12:20:00 2018
Main phase3 : Query Cache initialized in 0.000174 secs.
2018-05-30 17:39:03 ProxySQL_Cluster.cpp:180:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer gearman-worker-fs1dc12-01:6032 . Error: Can't connect to MySQL server on 'gearman-worker-fs1dc12-01' (107)
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:180:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer gearman-worker-fs1dc12-01:6032 . Error: Can't connect to MySQL server on 'gearman-worker-fs1dc12-01' (107)
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:334:set_checksums(): [INFO] Cluster: detected a new checksum for mysql_query_rules from peer gearman-worker-fs1dc12-01:6032, version 1, epoch 1527701943, checksum 0x92405DB72E48C230 . Not syncing yet ...
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:336:set_checksums(): [INFO] Cluster: checksum for mysql_query_rules from peer gearman-worker-fs1dc12-01:6032 matches with local checksum 0x92405DB72E48C230 , we won't sync.
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:354:set_checksums(): [INFO] Cluster: detected a new checksum for mysql_servers from peer gearman-worker-fs1dc12-01:6032, version 1, epoch 1527701943, checksum 0x1128419592D44BE4 . Not syncing yet ...
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:356:set_checksums(): [INFO] Cluster: checksum for mysql_servers from peer gearman-worker-fs1dc12-01:6032 matches with local checksum 0x1128419592D44BE4 , we won't sync.
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:374:set_checksums(): [INFO] Cluster: detected a new checksum for mysql_users from peer gearman-worker-fs1dc12-01:6032, version 1, epoch 1527701943, checksum 0x1F26AE2C9089A939 . Not syncing yet ...
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:376:set_checksums(): [INFO] Cluster: checksum for mysql_users from peer gearman-worker-fs1dc12-01:6032 matches with local checksum 0x1F26AE2C9089A939 , we won't sync.
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:410:set_checksums(): [INFO] Cluster: detected a new checksum for proxysql_servers from peer gearman-worker-fs1dc12-01:6032, version 1, epoch 1527701943, checksum 0x85B2851AFCC5981D . Not syncing yet ...
2018-05-30 17:39:04 ProxySQL_Cluster.cpp:412:set_checksums(): [INFO] Cluster: checksum for proxysql_servers from peer gearman-worker-fs1dc12-01:6032 matches with local checksum 0x85B2851AFCC5981D , we won't sync.
Main phase3 : MySQL Threads Handler listeners started in 4.00432 secs.
Standard MySQL Monitor (StdMyMon) rev. 1.2.0723_DEBUG -- MySQL_Monitor.cpp -- Wed May 30 12:20:00 2018
Main phase3 : MySQL Monitor initialized in 7.8e-05 secs.
Main init phase3 completed in 4.01297 secs.
2018-05-30 18:07:08 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-30 18:07:11 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-30 18:30:12 main.cpp:1086:main(): [ERROR] Watchdog: 2 threads missed a heartbeat
2018-05-30 18:30:15 main.cpp:1086:main(): [ERROR] Watchdog: 3 threads missed a heartbeat
2018-05-30 18:31:09 main.cpp:1086:main(): [ERROR] Watchdog: 3 threads missed a heartbeat
2018-05-30 18:31:12 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-30 18:31:15 main.cpp:1086:main(): [ERROR] Watchdog: 2 threads missed a heartbeat
2018-05-30 18:32:09 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-30 18:32:12 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-30 18:34:15 main.cpp:1086:main(): [ERROR] Watchdog: 1 threads missed a heartbeat
2018-05-30 18:37:17 mysql_connection.cpp:629:handler(): [ERROR] Failed to mysql_real_connect() on staging-master:3306 , FD (Conn:5410 , MyDS:5410) , 1040: Too many connections.
2018-05-30 18:37:17 mysql_connection.cpp:629:handler(): [ERROR] Failed to mysql_real_connect() on staging-master:3306 , FD (Conn:5411 , MyDS:5411) , 1040: Too many connections.
2018-05-30 18:37:17 mysql_connection.cpp:629:handler(): [ERROR] Failed to mysql_real_connect() on staging-master:3306 , FD (Conn:5410 , MyDS:5410) , 1040: Too many connections.
2018-05-30 18:37:17 MySQL_HostGroups_Manager.cpp:302:connect_error(): [ERROR] Shunning server staging-master:3306 with 5 errors/sec. Shunning for 10 seconds
2018-05-30 18:37:25 mysql_connection.cpp:629:handler(): [ERROR] Failed to mysql_real_connect() on staging-master:3306 , FD (Conn:5603 , MyDS:5603) , 1040: Too many connections.
2018-05-30 18:37:25 mysql_connection.cpp:629:handler(): [ERROR] Failed to mysql_real_connect() on staging-master:3306 , FD (Conn:5603 , MyDS:5603) , 1040: Too many connections.
2018-05-30 18:37:25 mysql_connection.cpp:629:handler(): [ERROR] Failed to mysql_real_connect() on staging-master:3306 , FD (Conn:5603 , MyDS:5603) , 1040: Too many connections.
2018-05-30 18:37:25 MySQL_HostGroups_Manager.cpp:302:connect_error(): [ERROR] Shunning server staging-master:3306 with 5 errors/sec. Shunning for 10 seconds

Reiner030 commented 6 years ago

Sadly, no cause has been found yet. I tried many possible changes, each time testing the behavior 3x with 30-45 minutes of runtime ... I guess you can imagine how much fun I had in the last weeks, because there is no documentation on how to debug ProxySQL correctly.

Because many tests were failing, I took the packaged config (with threads=4) and added only the urgently needed parts from our config - servers, users, rules, and maxsize/timeout values - to the default one.

In my tests (with multiplexing deactivated on our side because of the INDEX_OFFSET=... initialization) it was working last Thursday/Friday like this:

From these tests it looks like it is the compression that commonly breaks our pools.

But today the production run failed again, even using the previously tested, successful settings, and so did staging, which means no "real" reproducible "experimental" set-up is possible. :sob:

We can modify the setup/routine for this special case, but we have to keep in mind that connections running over ProxySQL could fail without a recognizable cause in many other high-throughput situations, so ProxySQL might have to be replaced somehow, which I have tried to avoid; it is also nicely included in solutions like the one from Severalnines, which could be a later enhancement.

Here is the failed staging run from today with the complete ProxySQL stats page: https://drive.google.com/open?id=1c1CpSDVR2mE-CksEwt9UdGsz5V3h7nfl And here, ignoring the server/client connections and Questions, to show the other values: https://drive.google.com/open?id=1bIdkkRNQGNp3GX63YUQYEwxg_HWt_Gv1 The other values are always 0.

I also tried it with free_connections_pct=0 instead of the default 10, but it failed as well.

If you are interested, I could also mail/share all the other test images, but since it still seems to be non-reproducible within several days, I guess that doesn't make much sense.

joelhock commented 4 years ago

I have also run across an issue with mixed compressed/uncompressed clients, with an easy way to reproduce it. If an uncompressed client's backend response gets cached, then the cached copy can be used correctly by either a future uncompressed or compressed client. However, if a compressed client's request to the backend gets cached, then when a future uncompressed client sends the same query, it seems the compressed cached data gets sent to it, and it drops the connection quickly.
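Roughly how I reproduce it (just a sketch - the rule_id, the digest pattern and the TTL are made-up examples; the key ingredients are a caching query rule plus one compressed and one uncompressed client sending the same query):

-- cache matching SELECTs for 60 seconds (cache_ttl is in milliseconds)
INSERT INTO mysql_query_rules (rule_id, active, match_digest, cache_ttl, apply) VALUES (20, 1, '^SELECT .* FROM test_table', 60000, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;

Then send the same SELECT once from a client connected with compression (e.g. mysql --compress) and once from a client connected without it: if the compressed client runs first and populates the cache, the uncompressed client gets disconnected.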

renecannao commented 4 years ago

Thank you @joelhock , perhaps you provided the missing information to get this worked on!

joelhock commented 4 years ago

One more piece of information: in addition to my prior comment about compressed cached results not being usable by uncompressed clients, it also seems that if the client that triggers a new connection to a database backend is a compressed client, then that connection to the backend stays compressed forever, and future requests from uncompressed clients that are fulfilled by that backend connection will cause the uncompressed client to disconnect.

renecannao commented 4 years ago

Reopening, this isn't fixed in 2.1.0.

joelhock commented 4 years ago

My test cases now work with 2.0.13. Thank you!!