Closed ayder closed 6 years ago
Hi Sinan,
Thank you for the report; it seems like a bug: it is spinning somewhere. Can you please:
- check /var/lib/proxysql/proxysql.log
- sudo pt-pmp --pid 3554 --iterations 30
- sudo perf record -p 3554
Once perf record is completed, please run the following and attach report.txt:
sudo perf report > report.txt
Thanks
I haven't noticed any errors in proxysql.log, only some user activity logged.
Report file is attached.
Thanks.
Could you please also attach the output of pt-pmp?
From the output of perf it seems it is spinning on poll().
If the two connections in use stop sending traffic, does the CPU usage go to zero?
Sorry, I have already restarted the application.
Now pt-pmp is rapidly returning:
Unhandled dwarf expression opcode 0xf3
I'll try dropping the connections when this occurs again.
I'm seeing similar with the latest 1.3.1 release. Attached is the result from a run under low load. I'll rerun this later under our full query load where we see two things happening:
CPU usage seems to creep up over time, starting at around 200-300% reported by top, to 700% or more hours later under the same query load
Once CPU usage is up we also notice a large increase in memory use by the proxysql process, which stays constantly low for many hours after a fresh start, until something happens and it balloons
Some stats: under full query load we have 20-40k qps, generating ~700MBit/s+ of db traffic. We recently switched to compressing mysql traffic to keep things in check until we optimize our db usage to reduce the amount of data we push into mysql and read out of it. It is a good test bed for proxysql performance.
Attached is a report you can look at in the meantime; we'll rerun it later once more load is introduced.
I met the same problem with the 1.3.1 release. After proxysql had been running for about 2 hours, I found CPU usage was very high.
proxysql.log includes some error messages:
2016-12-29 10:30:01 sqlite3db.cpp:56:execute(): [ERROR] SQLITE error: cannot start a transaction within a transaction --- BEGIN
2016-12-29 10:30:08 sqlite3db.cpp:56:execute(): [ERROR] SQLITE error: cannot commit - no transaction is active --- COMMIT
2016-12-29 10:30:08 sqlite3db.cpp:56:execute(): [ERROR] SQLITE error: cannot start a transaction within a transaction --- BEGIN
2016-12-29 10:30:17 sqlite3db.cpp:56:execute(): [ERROR] SQLITE error: cannot commit - no transaction is active --- COMMIT
2016-12-29 10:39:01 sqlite3db.cpp:56:execute(): [ERROR] SQLITE error: cannot start a transaction within a transaction --- BEGIN
2016-12-29 10:39:02 sqlite3db.cpp:56:execute(): [ERROR] SQLITE error: cannot commit - no transaction is active --- COMMIT
Attached is the perf report file.
This CPU usage seems to be a spike caused by some spinloop. @taylor840326 : are you also graphing ProxySQL's metrics? It would be interesting to find a correlation between that spike and other metrics.
Thanks
I use telegraf+grafana to monitor CPU load.
@renecannao version 1.2.6 is ok.
@taylor840326 : which version is not ok?
When it is not ok, do you have any profiling from perf or pt-pmp?
@renecannao I have tested v1.2.6/v1.3.1/v1.3.2, and all versions have the same problem. I'm not sure when the problem happened; maybe after 2 hours, or 6 hours, or 12 hours.
Attached is the perf report file: perf_1.2.6.txt
@taylor840326 : this perf report doesn't seem related to what was reported in #870 .
Can you please run pt-pmp against proxysql?
Also, could you please share your configuration (mysql_servers, global_variables and mysql_query_rules)?
@renecannao ok, I will use the pt-pmp command next time.
proxysql-1.3.3 rpm running on RHEL 6.8 has the same issues
My problem occurs mainly when there is lag between the proxysql and mysql servers. (In our case proxysql is in a branch office, and mysql is at the production site.)
# Samples: 3 of event 'cpu-clock'
# Event count (approx.): 750000
#
# Overhead Command Shared Object Symbol
# ........ ........ ................. ...........
#
100.00% proxysql [kernel.kallsyms] [k] do_wait
@ayder , can you please run pt-pmp against proxysql?
When running ps you will notice two proxysql processes: one is the parent and the other is the child.
Assuming 12345 is the pid of the child process, you need to run something like:
pt-pmp --iterations 30 -p 12345
I really want to close this issue, but I need to understand where it is using a lot of CPU. Thank you for your help.
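For illustration, the parent/child lookup can be scripted with pgrep -P (list processes by parent PID). This is a hypothetical sketch that uses a throwaway background job in place of proxysql, so it can be run anywhere; with a real deployment the parent PID would instead come from something like pgrep -o -x proxysql.

```shell
# Demo of finding a child PID via its parent, using a throwaway
# process tree (a background subshell that forks a sleep) instead
# of proxysql itself.
( sleep 30 & wait ) &
PARENT=$!
sleep 1                        # give the subshell time to fork
CHILD=$(pgrep -P "$PARENT")    # PIDs whose parent is $PARENT
echo "parent=$PARENT child=$CHILD"
kill "$CHILD" "$PARENT" 2>/dev/null || true
```

The same pattern applied to proxysql would yield the child PID to pass to pt-pmp.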
The output of the pt-pmp command against the child process of proxysql is: Unhandled dwarf expression opcode 0xf3
240 pthread_cond_wait,remove,ConsumerThread::run,start_thread(libpthread.so.0),clone(libc.so.6)
120 epoll_wait(libc.so.6),MySQL_Thread::run,mysql_worker_thread_func_idles,start_thread(libpthread.so.0),clone(libc.so.6)
118 poll(libc.so.6),MySQL_Thread::run,mysql_worker_thread_func,start_thread(libpthread.so.0),clone(libc.so.6)
30 poll(libc.so.6),admin_main_loop,start_thread(libpthread.so.0),clone(libc.so.6)
30 nanosleep(libc.so.6),usleep(libc.so.6),Query_Cache::purgeHash_thread,mysql_shared_query_cache_funct,start_thread(libpthread.so.0),clone(libc.so.6)
30 nanosleep(libc.so.6),usleep(libc.so.6),MySQL_Monitor::run,libstdc++::??(libstdc++.so.6),start_thread(libpthread.so.0),clone(libc.so.6)
30 nanosleep(libc.so.6),usleep(libc.so.6),MySQL_Monitor::monitor_replication_lag,monitor_replication_lag_pthread,start_thread(libpthread.so.0),clone(libc.so.6)
30 nanosleep(libc.so.6),usleep(libc.so.6),MySQL_Monitor::monitor_read_only,monitor_read_only_pthread,start_thread(libpthread.so.0),clone(libc.so.6)
30 nanosleep(libc.so.6),usleep(libc.so.6),MySQL_Monitor::monitor_ping,monitor_ping_pthread,start_thread(libpthread.so.0),clone(libc.so.6)
30 nanosleep(libc.so.6),usleep(libc.so.6),MySQL_Monitor::monitor_connect,monitor_connect_pthread,start_thread(libpthread.so.0),clone(libc.so.6)
30 nanosleep(libc.so.6),usleep(libc.so.6),main
2 ::??,::??,::??
@ayder , thank you very much!
From the output it seems that all the threads are waiting:
The threads in MySQL_Monitor are sleeping;
The Admin thread is waiting on poll();
The 4 main MySQL_Thread are waiting on poll();
The 4 auxiliary MySQL_Thread are waiting on epoll();
The 8 ConsumerThread are waiting in the queue.
I am not sure what is causing the CPU spike: from the above everything seems quiet, very idle.
Maybe it is possible that your system isn't able to manage so many threads. Can you please reconfigure proxysql to use just one thread? Thanks
set mysql-threads=1;
save mysql variables to disk;
proxysql restart;
After setting the mysql-threads=1 variable, CPU usage is now 100% of 1 core on a 2-core VM. (It was 200% before, utilizing all the cores.)
Without restarting proxysql, can you please try to run pt-pmp again, maybe with an increased number of iterations:
pt-pmp --iterations 60 -p 12345
Thank you.
Oh, the output this time is:
/usr/bin/proxysql (deleted): No such file or directory. Cannot access memory at address 0x687fa290
60 ::??
I have not deleted the binary nor changed it.
[root@XXDBAPPSL01 ~]# ls -l /usr/bin/proxysql
-rwxr-xr-x 1 root root 13087538 Jan 21 00:10 /usr/bin/proxysql
@ayder : this probably means that proxysql was upgraded. You can verify this by comparing the output of these two commands:
/usr/bin/proxysql --version
mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "SELECT @@version" -NB
Well, the versions are consistent; no upgrade has been done. I'll restart the application and wait again.
[salyuruk@XXDBAPPSL01 ~]$ mysql --defaults-file=.proxysql.cfg -p -e "SELECT @@version" -NB
Enter password:
1.3.3-0-gf2b677a
[salyuruk@OSLDBAPPSL01 ~]$ proxysql --version
ProxySQL version 1.3.3-0-gf2b677a, codename Truls
This is getting really odd. Another hypothesis is that there are two proxies running. Can you please run this:
ps aux | grep proxysql
[root@XXDBAPPSL01 ~]# ps aux | grep proxysql
root 19223 0.0 0.2 51260 5124 ? S Feb10 0:00 proxysql -c /etc/proxysql.cnf -D /var/lib/proxysql
root 19224 16.9 1.6 172348 31804 ? Sl Feb10 1217:10 proxysql -c /etc/proxysql.cnf -D /var/lib/proxysql
root 60320 0.0 0.0 103324 848 pts/1 S+ 15:18 0:00 grep proxysql
Nope
Honestly, I am out of ideas for now.
The previous output of pt-pmp doesn't show anything odd, and now /usr/bin/proxysql (deleted): No such file or directory is very odd too.
Is there anything unusual reported in dmesg, syslog, or the proxysql log file?
What about lsof -n -p 19224 ?
[root@XXDBAPPSL01 ~]# lsof -n -p 19224
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
proxysql 19224 root cwd DIR 253,0 4096 139114 /var/lib/proxysql
proxysql 19224 root rtd DIR 253,0 4096 2 /
proxysql 19224 root txt REG 253,0 12982906 390056 /usr/bin/proxysql (deleted)
proxysql 19224 root mem REG 253,0 1969952 395943 /usr/lib64/libcrypto.so.1.0.1e
proxysql 19224 root mem REG 253,0 448344 392447 /usr/lib64/libssl.so.1.0.1e
proxysql 19224 root mem REG 253,0 161704 396570 /lib64/ld-2.12.so
proxysql 19224 root mem REG 253,0 1930416 398992 /lib64/libc-2.12.so
proxysql 19224 root mem REG 253,0 146592 399382 /lib64/libpthread-2.12.so
proxysql 19224 root mem REG 253,0 23088 399347 /lib64/libdl-2.12.so
proxysql 19224 root mem REG 253,0 47760 399391 /lib64/librt-2.12.so
proxysql 19224 root mem REG 253,0 600048 399989 /lib64/libm-2.12.so
proxysql 19224 root mem REG 253,0 91096 392580 /lib64/libz.so.1.2.3
proxysql 19224 root mem REG 253,0 124640 392013 /lib64/libselinux.so.1
proxysql 19224 root mem REG 253,0 114496 399177 /lib64/libresolv-2.12.so
proxysql 19224 root mem REG 253,0 93320 391716 /lib64/libgcc_s-4.4.7-20120601.so.1
proxysql 19224 root mem REG 253,0 989840 400216 /usr/lib64/libstdc++.so.6.0.13
proxysql 19224 root mem REG 253,0 17256 392093 /lib64/libcom_err.so.2.1
proxysql 19224 root mem REG 253,0 12592 406828 /lib64/libkeyutils.so.1.3
proxysql 19224 root mem REG 253,0 177520 392097 /lib64/libk5crypto.so.3.1
proxysql 19224 root mem REG 253,0 46368 392717 /lib64/libkrb5support.so.0.1
proxysql 19224 root mem REG 253,0 280520 392095 /lib64/libgssapi_krb5.so.2.2
proxysql 19224 root mem REG 253,0 948840 392087 /lib64/libkrb5.so.3.3
proxysql 19224 root mem REG 253,0 27896 391297 /lib64/libnss_dns-2.12.so
proxysql 19224 root mem REG 253,0 66432 391299 /lib64/libnss_files-2.12.so
proxysql 19224 root 0r CHR 1,3 0t0 4217 /dev/null
proxysql 19224 root 1w REG 253,0 584020 129989 /var/lib/proxysql/proxysql.log
proxysql 19224 root 2w REG 253,0 584020 129989 /var/lib/proxysql/proxysql.log
proxysql 19224 root 3r FIFO 0,8 0t0 20510499 pipe
proxysql 19224 root 4w FIFO 0,8 0t0 20510499 pipe
proxysql 19224 root 5u REG 253,0 33792 139105 /var/lib/proxysql/proxysql.db
proxysql 19224 root 6u REG 253,0 33792 139105 /var/lib/proxysql/proxysql.db
proxysql 19224 root 7r FIFO 0,8 0t0 20510502 pipe
proxysql 19224 root 8w FIFO 0,8 0t0 20510502 pipe
proxysql 19224 root 9r FIFO 0,8 0t0 20510503 pipe
proxysql 19224 root 10w FIFO 0,8 0t0 20510503 pipe
proxysql 19224 root 11u IPv4 20510505 0t0 TCP *:6033 (LISTEN)
proxysql 19224 root 12u REG 0,9 0 4213 [eventpoll]
proxysql 19224 root 13u unix 0xffff88007d1c4540 0t0 20510506 /tmp/proxysql.sock
proxysql 19224 root 14u IPv4 20510518 0t0 TCP 127.0.0.1:6032 (LISTEN)
proxysql 19224 root 15u unix 0xffff88007d006800 0t0 20510519 /tmp/proxysql_admin.sock
proxysql 19224 root 16u IPv4 21257635 0t0 TCP 10.35.100.197:36666->10.35.78.78:mysql (ESTABLISHED)
proxysql 19224 root 17u IPv4 21274282 0t0 TCP 10.35.100.197:59106->10.35.71.23:mysql (ESTABLISHED)
proxysql 19224 root 18u IPv4 21235705 0t0 TCP 10.35.100.197:6033->10.238.236.15:49559 (ESTABLISHED)
proxysql 19224 root 19u sock 0,6 0t0 21235708 can't identify protocol
proxysql 19224 root 20u IPv4 21269066 0t0 TCP 10.35.100.197:6033->10.238.236.15:56218 (ESTABLISHED)
proxysql 19224 root 21u IPv4 21269069 0t0 TCP 10.35.100.197:37218->10.35.78.78:mysql (ESTABLISHED)
Hi, today we put proxysql on our production server and we are facing the same issue. We haven't loaded any query rules yet.
> select @@version
1.3.4-0-gd158707
select * from global_variables
global.txt
@alejg: can you please run pt-pmp as described above?
Also, would it be possible to generate a core dump of the running process using gcore?
Thanks
Yes: proxysql-pt-pmp.txt. I'll send the core dump to an email (at sysown.com), ok?
Thanks!
Thank you @alejg . My email is rene.cannao@gmail.com . One more question: which package are you using? Which OS?
But before sending the core...
Looking at the output of pt-pmp, I have the feeling that proxysql is not spinning in a loop without reason, but actually serving a lot of traffic:
From the screenshot I can also deduce that the mysqld running on the same box is quite busy too.
Therefore it is possible that the high CPU utilization you are seeing in ProxySQL is not because of a bug, but because it is really doing a lot of work.
Can you please also run these commands and let me know the output?
mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "SELECT * FROM stats_mysql_connection_pool; SHOW MYSQL STATUS;"
sleep 30
mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "SELECT * FROM stats_mysql_connection_pool; SHOW MYSQL STATUS;"
Thanks
Thank you @renecannao!!
CentOS Linux release 7.2.1511
proxysql-1.3.4-1-centos7.x86_64.rpm
You are right, it is a very busy server which spends most of its time doing IO. It worked fine for several hours; now the CPU has pushed the load average above 18.
Thanks!
Continuing this thread via email. I will update the issue with further findings.
I am also having this problem intermittently. My server is idle with:
Tasks: 126 total, 1 running, 125 sleeping, 0 stopped, 0 zombie
Cpu(s): 38.9%us, 58.8%sy, 0.0%ni, 1.0%id, 0.0%wa, 0.0%hi, 0.1%si, 1.2%st
Mem: 8059196k total, 7313936k used, 745260k free, 140376k buffers
Swap: 0k total, 0k used, 0k free, 974752k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30931 proxysql 20 0 318m 86m 2624 S 363.7 1.1 120:20.48 proxysql
19023 root 20 0 325m 57m 4832 S 14.3 0.7 89:14.53 mysqld_exporter
28096 mysql 20 0 14.1g 1.2g 33m S 9.6 15.2 5:55.45 mysqld
1372 root 20 0 343m 24m 3288 S 8.0 0.3 797:42.80 node_exporter
# sudo strace -c -p 30931
Process 30931 attached
^CProcess 30931 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
98.72 4.380782 273799 16 nanosleep
1.28 0.056584 56584 1 restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00 4.437366 17 total
restarting proxysql solves the issue temporarily
@doddys , if you run strace without the -f flag, strace will only trace the main thread and nothing more: this is exactly what you got, a strace of the main thread doing nothing but nanosleep().
How frequently does it happen? Would it be possible to generate a core dump of the running process using gcore ? From the output above it seems it uses really little memory, so a dump should take a very small amount of time (possibly a fraction of a second). Thanks
let me try to get a core dump and get back to you
@renecannao this is the dump that I get from
gcore -o proxysql.dump 30931
please rename to tgz and extract
@doddys : thanks a lot. To confirm: was this dump generated while CPU usage was high, or idle? It needs to be while CPU usage is high. Also, can you please let me know exactly which ProxySQL package you have installed? Thanks
@renecannao Yes, it was generated while CPU was at ~300%, as in the top output above.
# rpm -qa | grep proxy
proxysql-1.3.2-1.1.x86_64
# uname -a
Linux centos-8gb-sgp1-01 2.6.32-642.6.2.el6.x86_64 #1 SMP Wed Oct 26 06:52:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
It has happened a few times already, but I never paid attention to when and how. Let me know if you need further info. Thanks
Perfect! I started looking into it, but it took me some time to figure out that the installed version is from the Percona repository and not from GitHub. What version of CentOS are you using exactly? I tried CentOS 6.7 but it seems not to be the right one. Thank you
@renecannao yes it is from percona repo. I am using
# cat /etc/redhat-release
CentOS release 6.8 (Final)
I just noticed that the scheduler is invoking duplicate processes, like below:
proxysql 7864 30931 1 13:52 ? 00:00:00 /bin/bash /var/lib/proxysql/proxysql_galera_checker_v2.sh 3 2 1 1 /var/lib/proxysql/galera_hg3.log
proxysql 7866 30931 1 13:52 ? 00:00:00 /bin/bash /var/lib/proxysql/proxysql_galera_checker_v2.sh 4 2 1 1 /var/lib/proxysql/galera_hg4.log
proxysql 7949 7866 0 13:52 ? 00:00:00 /bin/bash /var/lib/proxysql/proxysql_galera_checker_v2.sh 4 2 1 1 /var/lib/proxysql/galera_hg4.log
proxysql 7953 7864 0 13:52 ? 00:00:00 /bin/bash /var/lib/proxysql/proxysql_galera_checker_v2.sh 3 2 1 1 /var/lib/proxysql/galera_hg3.log
proxysql 7955 7949 0 13:52 ? 00:00:00 /bin/bash /var/lib/proxysql/proxysql_galera_checker_v2.sh 4 2 1 1 /var/lib/proxysql/galera_hg4.log
proxysql 7956 7955 0 13:52 ? 00:00:00 timeout 10 mysql -nNE -uproxysql -h 10.130.34.56 -P 3306 -e SHOW STATUS LIKE 'wsrep_local_state'
proxysql 7957 7955 0 13:52 ? 00:00:00 tail -1
proxysql 7958 7956 5 13:52 ? 00:00:00 mysql -nNE -uproxysql -h 10.130.34.56 -P 3306 -e SHOW STATUS LIKE 'wsrep_local_state'
proxysql 7960 7953 0 13:52 ? 00:00:00 /bin/bash /var/lib/proxysql/proxysql_galera_checker_v2.sh 3 2 1 1 /var/lib/proxysql/galera_hg3.log
proxysql 7961 7960 0 13:52 ? 00:00:00 timeout 10 mysql -nNE -uproxysql -h 10.130.34.56 -P 3306 -e SHOW STATUS LIKE 'wsrep_local_state'
proxysql 7962 7960 0 13:52 ? 00:00:00 tail -1
proxysql 7963 7961 4 13:52 ? 00:00:00 mysql -nNE -uproxysql -h 10.130.34.56 -P 3306 -e SHOW STATUS LIKE 'wsrep_local_state'
root 7965 16029 0 13:52 pts/0 00:00:00 grep proxysql
proxysql 30930 1 0 11:23 ? 00:00:00 /usr/bin/proxysql -c /etc/proxysql.cnf -D /var/lib/proxysql
proxysql 30931 30930 99 11:23 ? 03:56:50 /usr/bin/proxysql -c /etc/proxysql.cnf -D /var/lib/proxysql
Is this normal?
btw, for whatever reason the cpu dropped back to normal now. It lasted for about 1 hour.
Ok, I will start looking into the dump using 6.8.
With regards to the duplicate processes: there is no mechanism to limit the number of processes started by the scheduler.
What's the output of SELECT * FROM scheduler ?
Below is the output:
mysql> select * from scheduler;
+----+--------+-------------+-------------------------------------------------+------+------+------+------+-----------------------------------------------+---------+
| id | active | interval_ms | filename | arg1 | arg2 | arg3 | arg4 | arg5 | comment |
+----+--------+-------------+-------------------------------------------------+------+------+------+------+-----------------------------------------------+---------+
| 1 | 0 | 10000 | /usr/bin/proxysql_galera_checker | 0 | 0 | 3 | 1 | /var/lib/proxysql/proxysql_galera_checker.log | |
| 2 | 0 | 10000 | /usr/bin/proxysql_galera_checker | 2 | 2 | 3 | 1 | /var/lib/proxysql/galera_hg2.log | |
| 3 | 1 | 10000 | /var/lib/proxysql/proxysql_galera_checker_v2.sh | 3 | 2 | 1 | 1 | /var/lib/proxysql/galera_hg3.log | |
| 4 | 1 | 10000 | /var/lib/proxysql/proxysql_galera_checker_v2.sh | 4 | 2 | 1 | 1 | /var/lib/proxysql/galera_hg4.log | |
+----+--------+-------------+-------------------------------------------------+------+------+------+------+-----------------------------------------------+---------+
4 rows in set (0.00 sec)
mysql> select * from runtime_scheduler;
+----+--------+-------------+-------------------------------------------------+------+------+------+------+-----------------------------------------------+---------+
| id | active | interval_ms | filename | arg1 | arg2 | arg3 | arg4 | arg5 | comment |
+----+--------+-------------+-------------------------------------------------+------+------+------+------+-----------------------------------------------+---------+
| 1 | 0 | 10000 | /usr/bin/proxysql_galera_checker | 0 | 0 | 3 | 1 | /var/lib/proxysql/proxysql_galera_checker.log | |
| 2 | 0 | 10000 | /usr/bin/proxysql_galera_checker | 2 | 2 | 3 | 1 | /var/lib/proxysql/galera_hg2.log | |
| 3 | 1 | 10000 | /var/lib/proxysql/proxysql_galera_checker_v2.sh | 3 | 2 | 1 | 1 | /var/lib/proxysql/galera_hg3.log | |
| 4 | 1 | 10000 | /var/lib/proxysql/proxysql_galera_checker_v2.sh | 4 | 2 | 1 | 1 | /var/lib/proxysql/galera_hg4.log | |
+----+--------+-------------+-------------------------------------------------+------+------+------+------+-----------------------------------------------+---------+
4 rows in set (0.00 sec)
As you can see I have 2 active schedulers, but it spawns more than 1 process per scheduler at the same time.
Maybe it is not spawning more than 1 process per scheduler at the same time.
It should spawn a process every 10 seconds to check the backends.
The fact that in ps you can see two checks against 10.130.34.56 is curious: a check should be very, very fast, so if you manage to catch two checks against the same host, maybe the checks are being slow and a scheduled process takes longer than 10 seconds to complete, and therefore there is an overlap.
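If overlapping runs were the cause, a common guard (shown here as an illustration; it is not something the stock checker script does) is a non-blocking flock, so a new invocation exits immediately when the previous one is still running. The lock path is a hypothetical example.

```shell
# Sketch of an overlap guard for a scheduler-launched script:
# take an exclusive, non-blocking lock on fd 9; if another
# instance already holds it, bail out instead of piling up.
LOCK=/tmp/galera_checker.lock   # hypothetical lock path
exec 9>"$LOCK"
if ! flock -n 9; then
  echo "previous check still running; exiting"
  exit 0
fi
echo "lock acquired; running checks"
# ... the actual wsrep_local_state checks would go here ...
```

The lock is released automatically when the script exits and fd 9 is closed.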
The check is fast; the script finishes within 1 second. The weird thing is that even though two processes are running the same script, I can only see a single set of log entries:
Tue Mar 14 14:22:04 UTC 2017 ###### proxysql_galera_checker.sh SUMMARY ######
Tue Mar 14 14:22:04 UTC 2017 Hostgroup writers 4
Tue Mar 14 14:22:04 UTC 2017 Hostgroup readers 2
Tue Mar 14 14:22:04 UTC 2017 Number of writers 1
Tue Mar 14 14:22:04 UTC 2017 Writers are readers 1
Tue Mar 14 14:22:04 UTC 2017 log file /var/lib/proxysql/galera_hg4.log
Tue Mar 14 14:22:05 UTC 2017 ###### HANDLE WRITER NODES ######
Tue Mar 14 14:22:05 UTC 2017 --> Checking WRITE server 4:10.130.34.56:3306, current status ONLINE, wsrep_local_state 4
Tue Mar 14 14:22:05 UTC 2017 server 4:10.130.34.56:3306 is already ONLINE: 1 of 1 write nodes
Tue Mar 14 14:22:05 UTC 2017 --> Checking WRITE server 4:10.130.33.225:3306, current status OFFLINE_SOFT, wsrep_local_state 4
Tue Mar 14 14:22:05 UTC 2017 server 4:10.130.33.225:3306 is already OFFLINE_SOFT, max write nodes reached (1)
Tue Mar 14 14:22:05 UTC 2017 --> Checking WRITE server 4:10.130.33.238:3306, current status OFFLINE_SOFT, wsrep_local_state 4
Tue Mar 14 14:22:05 UTC 2017 server 4:10.130.33.238:3306 is already OFFLINE_SOFT, max write nodes reached (1)
Tue Mar 14 14:22:05 UTC 2017 ###### HANDLE READER NODES ######
Tue Mar 14 14:22:05 UTC 2017 --> Checking READ server 2:10.130.34.56:3306, current status ONLINE, wsrep_local_state 4
Tue Mar 14 14:22:05 UTC 2017 server 2:10.130.34.56:3306 is already ONLINE
Tue Mar 14 14:22:05 UTC 2017 --> Checking READ server 2:10.130.33.225:3306, current status ONLINE, wsrep_local_state 4
Tue Mar 14 14:22:05 UTC 2017 server 2:10.130.33.225:3306 is already ONLINE
Tue Mar 14 14:22:05 UTC 2017 --> Checking READ server 2:10.130.33.238:3306, current status ONLINE, wsrep_local_state 4
Tue Mar 14 14:22:05 UTC 2017 server 2:10.130.33.238:3306 is already ONLINE
Tue Mar 14 14:22:05 UTC 2017 ###### SUMMARY ######
Tue Mar 14 14:22:05 UTC 2017 --> Number of writers that are 'ONLINE': 1 : hostgroup: 4
Tue Mar 14 14:22:05 UTC 2017 --> Number of readers that are 'ONLINE': 3 : hostgroup: 2
Tue Mar 14 14:22:05 UTC 2017 ###### Loading mysql_servers config into runtime ######
Tue Mar 14 14:22:15 UTC 2017 ###### proxysql_galera_checker.sh SUMMARY ######
Tue Mar 14 14:22:15 UTC 2017 Hostgroup writers 4
Tue Mar 14 14:22:15 UTC 2017 Hostgroup readers 2
Tue Mar 14 14:22:15 UTC 2017 Number of writers 1
Tue Mar 14 14:22:15 UTC 2017 Writers are readers 1
Tue Mar 14 14:22:15 UTC 2017 log file /var/lib/proxysql/galera_hg4.log
Tue Mar 14 14:22:15 UTC 2017 ###### HANDLE WRITER NODES ######
Tue Mar 14 14:22:15 UTC 2017 --> Checking WRITE server 4:10.130.34.56:3306, current status ONLINE, wsrep_local_state 4
Tue Mar 14 14:22:15 UTC 2017 server 4:10.130.34.56:3306 is already ONLINE: 1 of 1 write nodes
Tue Mar 14 14:22:15 UTC 2017 --> Checking WRITE server 4:10.130.33.225:3306, current status OFFLINE_SOFT, wsrep_local_state 4
Tue Mar 14 14:22:15 UTC 2017 server 4:10.130.33.225:3306 is already OFFLINE_SOFT, max write nodes reached (1)
Tue Mar 14 14:22:15 UTC 2017 --> Checking WRITE server 4:10.130.33.238:3306, current status OFFLINE_SOFT, wsrep_local_state 4
Tue Mar 14 14:22:15 UTC 2017 server 4:10.130.33.238:3306 is already OFFLINE_SOFT, max write nodes reached (1)
Tue Mar 14 14:22:15 UTC 2017 ###### HANDLE READER NODES ######
Tue Mar 14 14:22:15 UTC 2017 --> Checking READ server 2:10.130.34.56:3306, current status ONLINE, wsrep_local_state 4
Tue Mar 14 14:22:15 UTC 2017 server 2:10.130.34.56:3306 is already ONLINE
Tue Mar 14 14:22:15 UTC 2017 --> Checking READ server 2:10.130.33.225:3306, current status ONLINE, wsrep_local_state 4
Tue Mar 14 14:22:15 UTC 2017 server 2:10.130.33.225:3306 is already ONLINE
Tue Mar 14 14:22:15 UTC 2017 --> Checking READ server 2:10.130.33.238:3306, current status ONLINE, wsrep_local_state 4
Tue Mar 14 14:22:15 UTC 2017 server 2:10.130.33.238:3306 is already ONLINE
Tue Mar 14 14:22:15 UTC 2017 ###### SUMMARY ######
Tue Mar 14 14:22:15 UTC 2017 --> Number of writers that are 'ONLINE': 1 : hostgroup: 4
Tue Mar 14 14:22:15 UTC 2017 --> Number of readers that are 'ONLINE': 3 : hostgroup: 2
Tue Mar 14 14:22:15 UTC 2017 ###### Loading mysql_servers config into runtime ######
@doddys : I couldn't find anything odd in the dump yet. It seems that the instance is indeed idle.
What bugs me is servers_table_version = 2544: that should mean that LOAD MYSQL SERVERS TO RUNTIME was executed 2543 times, or about once every 2.72 seconds.
Can you please share proxysql's error log (/var/lib/proxysql/proxysql.log)? I would like to dig into it.
Thank you
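As a quick sanity check of the arithmetic above: 2543 reloads at one every 2.72 seconds corresponds to roughly 6917 seconds, i.e. just under two hours of uptime.

```shell
# Verify: 2543 executions * 2.72 s/execution ~= total uptime in seconds
awk 'BEGIN { printf "%.0f\n", 2543 * 2.72 }'
# prints 6917
```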
@renecannao I did notice that there are a lot of reloads in proxysql.log, but I am not sure whether that is normal or not.
please rename to tgz and extract.
@doddys : there is a bug in proxysql_galera_checker.sh, because LOAD MYSQL SERVERS TO RUNTIME is always called even if no changes are performed.
Speaking with @geotro a few days ago, I think we have a solution for this.
Calling LOAD MYSQL SERVERS TO RUNTIME has some cost, and it is possible that calling it too often has unexpected side effects. I will investigate more.
Thank you for the file!
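The fix being discussed can be sketched like this (a hypothetical skeleton, not the actual patched script): track whether any server status was changed during the pass, and only reload when something did change. CHANGES and mark_change are illustrative names.

```shell
# Skeleton of a change-aware checker: only run the (expensive)
# LOAD MYSQL SERVERS TO RUNTIME when this pass actually modified
# mysql_servers.
CHANGES=0
mark_change() { CHANGES=$((CHANGES + 1)); }

# ... each per-node check would call mark_change after an UPDATE ...
mark_change   # pretend one node changed state in this pass

if [ "$CHANGES" -gt 0 ]; then
  echo "LOAD MYSQL SERVERS TO RUNTIME;"   # would be sent to the admin port
else
  echo "no changes; skipping reload"
fi
```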
@doddys , interesting finding!
Number of LOAD MYSQL SERVERS per hour:
$ egrep '^2017-03-14 .*LOAD MYSQL SERVERS' proxysql.log | cut -b 1-13 | sort | uniq -c
714 2017-03-14 00
706 2017-03-14 01
704 2017-03-14 02
704 2017-03-14 03
706 2017-03-14 04
702 2017-03-14 05
704 2017-03-14 06
704 2017-03-14 07
842 2017-03-14 08
1408 2017-03-14 09
1416 2017-03-14 10
1408 2017-03-14 11
1350 2017-03-14 12
1011 2017-03-14 13
708 2017-03-14 14
706 2017-03-14 15
710 2017-03-14 16
710 2017-03-14 17
714 2017-03-14 18
712 2017-03-14 19
712 2017-03-14 20
714 2017-03-14 21
714 2017-03-14 22
74 2017-03-14 23
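As an aside, the per-hour counting pipeline used here can be sanity-checked on a toy log (the file path and timestamps below are made up):

```shell
# Build a tiny fake log, then count matching lines per hour by
# grouping on the first 13 characters (date plus hour).
printf '%s\n' \
  '2017-03-14 08:01:02 [INFO] Received LOAD MYSQL SERVERS TO RUNTIME command' \
  '2017-03-14 08:05:09 [INFO] Received LOAD MYSQL SERVERS TO RUNTIME command' \
  '2017-03-14 09:00:00 [INFO] Received LOAD MYSQL SERVERS TO RUNTIME command' \
  > /tmp/demo_proxysql.log
grep 'LOAD MYSQL SERVERS' /tmp/demo_proxysql.log | cut -b 1-13 | sort | uniq -c
# prints a count of 2 for hour 08 and 1 for hour 09
```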
From the above, it seems that something happens between 8am and 9am, and something changed again between 1pm and 2pm. This is indeed the case:
$ egrep -v '^HID' proxysql.log | egrep -v '(Received LOAD MYSQL SERVERS TO RUNTIME command|New mysql_replication_hostgroups table)' | egrep '^2017-03-14.*INFO'
2017-03-14 08:47:18 [INFO] Received save scheduler from mem command
2017-03-14 08:47:25 [INFO] Received load scheduler from mem command
2017-03-14 08:47:36 [INFO] Received save mysql servers from mem command
2017-03-14 08:47:47 [INFO] Received load mysql servers from mem command
2017-03-14 08:49:46 [INFO] Received save mysql users from mem command
2017-03-14 08:49:59 [INFO] Received load mysql users from mem command
2017-03-14 08:51:18 [INFO] Received load mysql query rules from mem command
2017-03-14 08:51:29 [INFO] Received save mysql query rules from mem command
2017-03-14 08:54:19 [INFO] Received load scheduler from mem command
2017-03-14 08:54:24 [INFO] Received save scheduler from mem command
2017-03-14 08:56:42 [INFO] Received load scheduler from mem command
2017-03-14 08:56:47 [INFO] Received save scheduler from mem command
2017-03-14 08:59:38 [INFO] Received load scheduler from mem command
2017-03-14 08:59:41 [INFO] Received save scheduler from mem command
2017-03-14 11:23:31 [INFO] Starting shutdown...
2017-03-14 11:23:32 [INFO] Shutdown completed!
2017-03-14 11:23:32 [INFO] Exiting...
2017-03-14 11:23:32 [INFO] Shutdown angel process
2017-03-14 11:23:32 [INFO] Starting ProxySQL
2017-03-14 11:23:32 [INFO] Sucessfully started
2017-03-14 11:23:32 [INFO] Angel process started ProxySQL process 30931
2017-03-14 12:51:55 [INFO] Received load scheduler from mem command
2017-03-14 12:52:08 [INFO] Received save scheduler to mem command
2017-03-14 13:52:14 [INFO] Received load scheduler from mem command
2017-03-14 13:52:19 [INFO] Received save scheduler from mem command
Scheduler entries changed:
ProxySQL was restarted at 11:23:32, which means that the core dump you sent was generated at 13:19:02 (6930 seconds after 11:23:32).
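The timestamp arithmetic can be double-checked like this (assumes GNU date, as found on the CentOS/RHEL systems in this thread):

```shell
# 11:23:32 plus 6930 seconds should land on 13:19:02.
date -u -d '2017-03-14 11:23:32 UTC + 6930 seconds' '+%H:%M:%S'
# prints 13:19:02
```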
In https://github.com/sysown/proxysql/issues/826#issuecomment-286429201 you commented that the CPU spike lasted for about 1 hour.
This is actually very interesting, because commands related to the scheduler were executed at 12:52ish and 13:52ish: exactly one hour apart.
Do you recall:
a) what did you change in the scheduler at these times?
b) did you run ps before or after running the commands on the scheduler?
Do you mind sharing the files /var/lib/proxysql/galera_hg*.log ? I would like to see if I can find anything else interesting during this time window.
Looking at the number of LOAD MYSQL SERVERS per hour I actually missed a detail; I should have checked with more granularity, using:
egrep '^2017-03-14 1[1234].*LOAD MYSQL SERVERS' proxysql.log | cut -b 1-16 | sort | uniq -c
The number of LOAD MYSQL SERVERS per minute:
@doddys : can you please try this modified version of proxysql_galera_checker.sh ? proxysql_galera_checker-v2.sh.zip
@taylor840326 , @aydancoskun : are you also using proxysql_galera_checker.sh ?
@renecannao I did make some config changes after the proxysql restart at 11ish.
I applied the script that you provided. I noticed a few things that differ from the previous behavior:
mysql: [Warning] Using a password on the command line interface can be insecure.
# watch -n 1 'mysql --login-path=proxysql -t -e "select * from stats_mysql_connection_pool where hostgroup in (2,3,4,94,92,93) order by hostgroup,srv_host ;" -e " select hostgroup_id,hostname,status,weight,comment from mysql_servers where hostgroup_id in (2,3,4) order by hostgroup_id,hostname ;"'
Every 1.0s: mysql --login-path=proxysql -t -e "select * from stats_mysql_connection_pool where hostgroup in (2,3,4,94,92,93) order by hostgrou... Wed Mar 15 02:23:21 2017
+-----------+---------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+---------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 2 | 10.130.33.225 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 205 |
| 2 | 10.130.33.238 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1384 |
| 2 | 10.130.34.56 | 3306 | ONLINE | 0 | 13 | 13 | 0 | 19678 | 519672 | 186768 | 447 |
| 3 | 10.130.33.225 | 3306 | OFFLINE_SOFT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 205 |
| 3 | 10.130.33.238 | 3306 | ONLINE | 200 | 0 | 594 | 0 | 3806 | 460314 | 2670117 | 1384 |
| 3 | 10.130.34.56 | 3306 | OFFLINE_SOFT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 447 |
| 4 | 10.130.33.225 | 3306 | OFFLINE_SOFT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 205 |
| 4 | 10.130.33.238 | 3306 | OFFLINE_SOFT | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1384 |
| 4 | 10.130.34.56 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 447 |
+-----------+---------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
+--------------+---------------+--------------+------------+-----------+
| hostgroup_id | hostname | status | weight | comment |
+--------------+---------------+--------------+------------+-----------+
| 2 | 10.130.33.225 | ONLINE | 1 | PRIMARY |
| 2 | 10.130.33.238 | ONLINE | 1 | BACKUP |
| 2 | 10.130.34.56 | ONLINE | 1000000000 | BACKUP |
| 3 | 10.130.33.225 | ONLINE | 1 | BACKUP |
| 3 | 10.130.33.238 | OFFLINE_SOFT | 1000000000 | PRIMARY |
| 3 | 10.130.34.56 | OFFLINE_SOFT | 1 | BACKUP |
| 4 | 10.130.33.225 | ONLINE | 100000 | SECONDARY |
| 4 | 10.130.33.238 | OFFLINE_SOFT | 1 | BACKUP |
| 4 | 10.130.34.56 | OFFLINE_SOFT | 100000000 | PRIMARY |
+--------------+---------------+--------------+------------+-----------+
attached is the galera_hg logs
Anyway, if the cpu spikes again, is there anything you would like me to capture?
ProxySQL CPU usage greatly increases within 24 hours with only 2 connections active. The server is RHEL 6.8 running the 2.6.32-642.4.2.el6.x86_64 kernel with proxysql-1.3.0h-1.x86_64.