sysown / proxysql

High-performance MySQL proxy with a GPL license.
http://www.proxysql.com
GNU General Public License v3.0

ProxySQL crashed while querying stats_mysql_query_digest_reset table #3689

Closed bc-ashokmahajan closed 2 years ago

bc-ashokmahajan commented 2 years ago

Today one of our production ProxySQL nodes crashed while we were querying stats_mysql_query_digest_reset, which we had not queried for a long time. The crash caused errors for end users.

2021-11-09 23:42:44 MySQL_Thread.cpp:3751:process_all_sessions(): [WARNING] Closing unhealthy client connection 10.128.1.49:36614
2021-11-09 23:42:58 main.cpp:1541:main(): [ERROR] Watchdog: 22 threads missed a heartbeat
2021-11-09 23:43:01 main.cpp:1541:main(): [ERROR] Watchdog: 22 threads missed a heartbeat
Tue Nov  9 23:43:03 UTC 2021: update_mysql_servers.sh: Unable to acquire lock, Aborting.
2021-11-09 23:43:04 main.cpp:1541:main(): [ERROR] Watchdog: 22 threads missed a heartbeat
2021-11-09 23:43:07 main.cpp:1541:main(): [ERROR] Watchdog: 22 threads missed a heartbeat
Tue Nov  9 23:43:08 UTC 2021: update_mysql_servers.sh: Unable to acquire lock, Aborting.
Terminated
Tue Nov  9 23:43:10 UTC 2021: update_mysql_servers.sh: SIGTERM (15) received due to timeout, terminated.
2021-11-09 23:43:10 main.cpp:1541:main(): [ERROR] Watchdog: 22 threads missed a heartbeat
2021-11-09 23:43:13 main.cpp:1541:main(): [ERROR] Watchdog: 22 threads missed a heartbeat
2021-11-09 23:43:16 main.cpp:1541:main(): [ERROR] Watchdog: 22 threads missed a heartbeat
Tue Nov  9 23:43:18 UTC 2021: update_mysql_servers.sh: Unable to acquire lock, Aborting.
2021-11-09 23:43:19 main.cpp:1541:main(): [ERROR] Watchdog: 22 threads missed a heartbeat
2021-11-09 23:43:22 main.cpp:1541:main(): [ERROR] Watchdog: 22 threads missed a heartbeat
Tue Nov  9 23:43:24 UTC 2021: update_mysql_servers.sh: Unable to acquire lock, Aborting.
Terminated
Tue Nov  9 23:43:25 UTC 2021: update_mysql_servers.sh: SIGTERM (15) received due to timeout, terminated.
2021-11-09 23:43:25 main.cpp:1541:main(): [ERROR] Watchdog: 22 threads missed a heartbeat
2021-11-09 23:43:25 main.cpp:1545:main(): [ERROR] Watchdog: reached 10 missed heartbeats. Aborting!
2021-11-09 23:43:25 main.cpp:1546:main(): [ERROR] Watchdog: see details at https://github.com/sysown/proxysql/wiki/Watchdog
proxysql: main.cpp:1547: int main(int, const char**): Assertion `0' failed.
Error: signal 6:
/usr/bin/proxysql(_Z13crash_handleri+0x2a)[0x564d86da5e1a]
/lib/x86_64-linux-gnu/libc.so.6(+0x37840)[0x7fe4ff84d840]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x7fe4ff84d7bb]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x121)[0x7fe4ff838535]
/lib/x86_64-linux-gnu/libc.so.6(+0x2240f)[0x7fe4ff83840f]
/lib/x86_64-linux-gnu/libc.so.6(+0x30102)[0x7fe4ff846102]
/usr/bin/proxysql(main+0xed7)[0x564d86d89e17]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7fe4ff83a09b]
/usr/bin/proxysql(_start+0x2a)[0x564d86d9c82a]
 ---- /usr/bin/proxysql(_Z13crash_handleri+0x2a) [0x564d86da5e1a] : crash_handler(int)
2021-11-09 23:43:35 main.cpp:1245:ProxySQL_daemonize_phase3(): [ERROR] ProxySQL crashed. Restarting!
2021-11-09 23:43:35 [INFO] ProxySQL version 2.3.2-10-g8cd66cf
2021-11-09 23:43:35 [INFO] ProxySQL SHA1 checksum: edcf1550b51dc1f5443955ece1828cfbe7e53a82
2021-11-09 23:43:35 [INFO] Angel process started ProxySQL process 9380
2021-11-09 23:43:35 [INFO] Loaded built-in SQLite3
Standard ProxySQL MySQL Logger rev. 2.0.0714 -- MySQL_Logger.cpp -- Thu Sep 30 21:22:46 2021
Standard ProxySQL Cluster rev. 0.4.0906 -- ProxySQL_Cluster.cpp -- Thu Sep 30 21:22:46 2021
Standard ProxySQL Statistics rev. 1.4.1027 -- ProxySQL_Statistics.cpp -- Thu Sep 30 21:22:46 2021
Standard ProxySQL HTTP Server Handler rev. 1.4.1031 -- ProxySQL_HTTP_Server.cpp -- Thu Sep 30 21:22:46 2021
Standard ProxySQL Admin rev. 2.0.6.0805 -- ProxySQL_Admin.cpp -- Thu Sep 30 21:22:46 2021
2021-11-09 23:43:35 [INFO] ProxySQL SHA1 checksum: edcf1550b51dc1f5443955ece1828cfbe7e53a82
Standard MySQL Threads Handler rev. 0.2.0902 -- MySQL_Thread.cpp -- Thu Sep 30 21:22:46 2021
Standard MySQL Authentication rev. 0.2.0902 -- MySQL_Authentication.cpp -- Thu Sep 30 21:22:46 2021

(The messages related to update_mysql_servers.sh above are from our scheduler job, so please ignore them.)

ProxySQL version:

2.3.2-10-g8cd66cf

The package used to install ProxySQL

We installed it from proxysql repo: https://repo.proxysql.com/ProxySQL/proxysql-2.3.x/buster/

Detail of OS:

Linux 4.19.0-17-cloud-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18) x86_64 GNU/Linux

Core dump, proxysql.log and systems.log:

We would like to share these securely, so I sent a Dropbox link to ProxySQL support.

H/W details:

Complete hardware information is also included in the Dropbox folder.


lshw
    description: Computer
    product: Google Compute Engine
    vendor: Google
    serial: GoogleCloud-94004F78FB3A85604241557E17EBA5D7
    width: 64 bits
    capabilities: smbios-2.4 dmi-2.4 smp vsyscall32
    configuration: boot=normal uuid=94004F78-FB3A-8560-4241-557E17EBA5D7
  *-core
       description: Motherboard
       product: Google Compute Engine
       vendor: Google
       physical id: 0
       serial: Board-GoogleCloud-94004F78FB3A85604241557E17EBA5D7
     *-firmware
          description: BIOS
          vendor: Google
          physical id: 0
          version: Google
          date: 01/01/2011
          size: 96KiB
     *-cpu:0
          description: CPU
          product: Intel(R) Xeon(R) CPU @ 2.30GHz
          vendor: Intel Corp.
          physical id: 1001
          bus info: cpu@0
          slot: CPU 1
          size: 2GHz
          capacity: 2GHz
          width: 64 bits
          capabilities: fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp x86-64 constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat md_clear arch_capabilities
     *-cpu:1
          description: CPU
          vendor: Google
          physical id: 1002
          bus info: cpu@1
          slot: CPU 2
          size: 2GHz
          capacity: 2GHz
     *-cpu:2
          description: CPU
          vendor: Google
          physical id: 1003
          bus info: cpu@2
          slot: CPU 3
          size: 2GHz
          capacity: 2GHz
     *-cpu:3
          description: CPU
          vendor: Google
          physical id: 1004
          bus info: cpu@3
          slot: CPU 4
          size: 2GHz
          capacity: 2GHz
     *-cpu:4
          description: CPU
          vendor: Google
          physical id: 1005
          bus info: cpu@4
          slot: CPU 5
          size: 2GHz
          capacity: 2GHz
     *-cpu:5
          description: CPU
          vendor: Google
          physical id: 1006
          bus info: cpu@5
          slot: CPU 6
          size: 2GHz
          capacity: 2GHz
     *-cpu:6
          description: CPU
          vendor: Google
          physical id: 1007
          bus info: cpu@6
          slot: CPU 7
          size: 2GHz
          capacity: 2GHz
     *-cpu:7
          description: CPU
          vendor: Google
          physical id: 1008
          bus info: cpu@7
          slot: CPU 8
          size: 2GHz
          capacity: 2GHz
     *-cpu:8
          description: CPU
          vendor: Google
          physical id: 1009
          bus info: cpu@8
          slot: CPU 9
          size: 2GHz
          capacity: 2GHz
     *-cpu:9
          description: CPU
          vendor: Google
          physical id: 100a
          bus info: cpu@9
          slot: CPU a
          size: 2GHz
          capacity: 2GHz
     *-cpu:10
          description: CPU
          vendor: Google
          physical id: 100b
          bus info: cpu@10
          slot: CPU b
          size: 2GHz
          capacity: 2GHz
     *-cpu:11
          description: CPU
          vendor: Google
          physical id: 100c
          bus info: cpu@11
          slot: CPU c
          size: 2GHz
          capacity: 2GHz
     *-cpu:12
          description: CPU
          vendor: Google
          physical id: 100d
          bus info: cpu@12
          slot: CPU d
          size: 2GHz
          capacity: 2GHz
     *-cpu:13
          description: CPU
          vendor: Google
          physical id: 100e
          bus info: cpu@13
          slot: CPU e
          size: 2GHz
          capacity: 2GHz
     *-cpu:14
          description: CPU
          vendor: Google
          physical id: 100f
          bus info: cpu@14
          slot: CPU f
          size: 2GHz
          capacity: 2GHz
     *-cpu:15
          description: CPU
          vendor: Google
          physical id: 1010
          bus info: cpu@15
          slot: CPU10
          size: 2GHz
          capacity: 2GHz
     *-cpu:16
          description: CPU
          vendor: Google
          physical id: 1011
          bus info: cpu@16
          slot: CPU11
          size: 2GHz
          capacity: 2GHz
     *-cpu:17
          description: CPU
          vendor: Google
          physical id: 1012
          bus info: cpu@17
          slot: CPU12
          size: 2GHz
          capacity: 2GHz
     *-cpu:18
          description: CPU
          vendor: Google
          physical id: 1013
          bus info: cpu@18
          slot: CPU13
          size: 2GHz
          capacity: 2GHz
     *-cpu:19
          description: CPU
          vendor: Google
          physical id: 1014
          bus info: cpu@19
          slot: CPU14
          size: 2GHz
          capacity: 2GHz
     *-cpu:20
          description: CPU
          vendor: Google
          physical id: 1015
          bus info: cpu@20
          slot: CPU15
          size: 2GHz
          capacity: 2GHz
     *-cpu:21
          description: CPU
          vendor: Google
          physical id: 1016
          bus info: cpu@21
          slot: CPU16
          size: 2GHz
          capacity: 2GHz
     *-cpu:22
          description: CPU
          vendor: Google
          physical id: 1017
          bus info: cpu@22
          slot: CPU17
          size: 2GHz
          capacity: 2GHz
     *-cpu:23
          description: CPU
          vendor: Google
          physical id: 1018
          bus info: cpu@23
          slot: CPU18
          size: 2GHz
          capacity: 2GHz
     *-cpu:24
          description: CPU
          vendor: Google
          physical id: 1019
          bus info: cpu@24
          slot: CPU19
          size: 2GHz
          capacity: 2GHz
     *-cpu:25
          description: CPU
          vendor: Google
          physical id: 101a
          bus info: cpu@25
          slot: CPU1a
          size: 2GHz
          capacity: 2GHz
     *-cpu:26
          description: CPU
          vendor: Google
          physical id: 101b
          bus info: cpu@26
          slot: CPU1b
          size: 2GHz
          capacity: 2GHz
     *-cpu:27
          description: CPU
          vendor: Google
          physical id: 101c
          bus info: cpu@27
          slot: CPU1c
          size: 2GHz
          capacity: 2GHz
     *-cpu:28
          description: CPU
          vendor: Google
          physical id: 101d
          bus info: cpu@28
          slot: CPU1d
          size: 2GHz
          capacity: 2GHz
     *-cpu:29
          description: CPU
          vendor: Google
          physical id: 101e
          bus info: cpu@29
          slot: CPU1e
          size: 2GHz
          capacity: 2GHz
     *-cpu:30
          description: CPU
          vendor: Google
          physical id: 101f
          bus info: cpu@30
          slot: CPU1f
          size: 2GHz
          capacity: 2GHz
     *-cpu:31
          description: CPU
          vendor: Google
          physical id: 1020
          bus info: cpu@31
          slot: CPU20
          size: 2GHz
          capacity: 2GHz
     *-memory
          description: System Memory
          physical id: 200
          size: 120GiB
          capabilities: ecc
          configuration: errordetection=multi-bit-ecc
        *-bank:0
             description: DIMM RAM Synchronous
             physical id: 0
             slot: DIMM 0
             size: 16GiB
             width: 64 bits
        *-bank:1
             description: DIMM RAM Synchronous
             physical id: 1
             slot: DIMM 1
             size: 16GiB
             width: 64 bits
        *-bank:2
             description: DIMM RAM Synchronous
             physical id: 2
             slot: DIMM 2
             size: 16GiB
             width: 64 bits
        *-bank:3
             description: DIMM RAM Synchronous
             physical id: 3
             slot: DIMM 3
             size: 16GiB
             width: 64 bits
        *-bank:4
             description: DIMM RAM Synchronous
             physical id: 4
             slot: DIMM 4
             size: 16GiB
             width: 64 bits
        *-bank:5
             description: DIMM RAM Synchronous
             physical id: 5
             slot: DIMM 5
             size: 16GiB
             width: 64 bits
        *-bank:6
             description: DIMM RAM Synchronous
             physical id: 6
             slot: DIMM 6
             size: 16GiB
             width: 64 bits
        *-bank:7
             description: DIMM RAM Synchronous
             physical id: 7
             slot: DIMM 7
             size: 8GiB
             width: 64 bits
     *-pci
          description: Host bridge
          product: 440FX - 82441FX PMC [Natoma]
          vendor: Intel Corporation
          physical id: 100
          bus info: pci@0000:00:00.0
          version: 02
          width: 32 bits
          clock: 33MHz
        *-isa
             description: ISA bridge
             product: 82371AB/EB/MB PIIX4 ISA
             vendor: Intel Corporation
             physical id: 1
             bus info: pci@0000:00:01.0
             version: 03
             width: 32 bits
             clock: 33MHz
             capabilities: isa bus_master
             configuration: latency=0
        *-bridge UNCLAIMED
             description: Bridge
             product: 82371AB/EB/MB PIIX4 ACPI
             vendor: Intel Corporation
             physical id: 1.3
             bus info: pci@0000:00:01.3
             version: 03
             width: 32 bits
             clock: 33MHz
             capabilities: bridge bus_master
             configuration: latency=0
        *-generic:0
             description: Non-VGA unclassified device
             product: Virtio SCSI
             vendor: Red Hat, Inc
             physical id: 3
             bus info: pci@0000:00:03.0
             version: 00
             width: 32 bits
             clock: 33MHz
             capabilities: msix bus_master cap_list
             configuration: driver=virtio-pci latency=0
             resources: irq:11 ioport:c040(size=64) memory:c0001000-c000107f
           *-virtio0
                description: Virtual I/O device
                physical id: 0
                bus info: virtio@0
                logical name: scsi0
                configuration: driver=virtio_scsi
              *-disk
                   description: SCSI Disk
                   product: PersistentDisk
                   vendor: Google
                   physical id: 0.1.0
                   bus info: scsi@0:0.1.0
                   logical name: /dev/sda
                   version: 1
                   size: 150GiB (161GB)
                   capabilities: gpt-1.00 partitioned partitioned:gpt
                   configuration: ansiversion=6 guid=86f27954-a350-8744-a5ad-a7238ad50853 logicalsectorsize=512 sectorsize=4096
                 *-volume:0
                      description: EXT4 volume
                      vendor: Linux
                      physical id: 1
                      bus info: scsi@0:0.1.0,1
                      logical name: /dev/sda1
                      logical name: /
                      version: 1.0
                      serial: ece608be-d123-4838-8ffb-ddd0a9d4bfd5
                      size: 149GiB
                      capacity: 149GiB
                      capabilities: journaled extended_attributes large_files huge_files dir_nlink recover 64bit extents ext4 ext2 initialized
                      configuration: created=2021-05-11 20:43:07 filesystem=ext4 lastmountpoint=/ modified=2021-09-23 18:28:37 mount.fstype=ext4 mount.options=rw,relatime,discard,errors=remount-ro mounted=2021-09-23 18:28:44 state=mounted
                 *-volume:1
                      description: BIOS Boot partition
                      vendor: EFI
                      physical id: e
                      bus info: scsi@0:0.1.0,14
                      logical name: /dev/sda14
                      serial: f4160e31-891e-1440-a2d9-9ee32faa2747
                      capacity: 3071KiB
                      capabilities: nofs
                 *-volume:2 UNCLAIMED
                      description: Windows FAT volume
                      vendor: mkfs.fat
                      physical id: f
                      bus info: scsi@0:0.1.0,15
                      version: FAT16
                      serial: c5bb-3d66
                      size: 123MiB
                      capacity: 123MiB
                      capabilities: boot fat initialized
                      configuration: FATs=2 filesystem=fat
        *-network
             description: Ethernet controller
             product: Virtio network device
             vendor: Red Hat, Inc
             physical id: 4
             bus info: pci@0000:00:04.0
             version: 00
             width: 32 bits
             clock: 33MHz
             capabilities: msix bus_master cap_list
             configuration: driver=virtio-pci latency=0
             resources: irq:10 ioport:c000(size=64) memory:c0000000-c00007ff
           *-virtio1
                description: Ethernet interface
                physical id: 0
                bus info: virtio@1
                logical name: ens4
                serial: 42:01:0a:80:00:06
                capabilities: ethernet physical
                configuration: autonegotiation=off broadcast=yes driver=virtio_net driverversion=1.0.0 ip=10.128.0.6 link=yes multicast=yes
        *-generic:1
             description: Unclassified device
             product: Virtio RNG
             vendor: Red Hat, Inc
             physical id: 5
             bus info: pci@0000:00:05.0
             version: 00
             width: 32 bits
             clock: 33MHz
             capabilities: msix bus_master cap_list
             configuration: driver=virtio-pci latency=0
             resources: irq:10 ioport:c080(size=32) memory:c0002000-c000203f
           *-virtio2 UNCLAIMED
                description: Virtual I/O device
                physical id: 0
                bus info: virtio@2
                configuration: driver=virtio_rng
     *-pnp00:00
          product: PnP device PNP0b00
          physical id: 1
          capabilities: pnp
          configuration: driver=rtc_cmos
     *-pnp00:01
          product: PnP device PNP0303
          physical id: 2
          capabilities: pnp
          configuration: driver=i8042 kbd
     *-pnp00:02
          product: PnP device PNP0f13
          physical id: 3
          capabilities: pnp
          configuration: driver=i8042 aux
     *-pnp00:03
          product: PnP device PNP0501
          physical id: 4
          capabilities: pnp
          configuration: driver=serial
     *-pnp00:04
          product: PnP device PNP0501
          physical id: 5
          capabilities: pnp
          configuration: driver=serial
     *-pnp00:05
          product: PnP device PNP0501
          physical id: 6
          capabilities: pnp
          configuration: driver=serial
     *-pnp00:06
          product: PnP device PNP0501
          physical id: 7
          capabilities: pnp
          configuration: driver=serial
bc-ashokmahajan commented 2 years ago

I am not sure whether this is directly related to the issue above, but we have consistently seen the memory footprint of the proxysql process increase gradually over time; it drops only when the ProxySQL instance is restarted. The following Grafana graphs show the two memory-related metrics for the last 10 days. (Image: Screen Shot 2021-11-09 at 4 12 30 PM)

bc-ashokmahajan commented 2 years ago

The same graphs for the last hour after the restart: (Image: Screen Shot 2021-11-09 at 4 25 21 PM)

renecannao commented 2 years ago

Your application seems to generate a lot of unique queries, so the memory structure behind stats_mysql_query_digest is really large. Querying that table requires a lot of processing. ProxySQL aborted because it spent a very long time processing that data, and it is configured to abort if threads make no progress for 30 seconds. In other words, it aborted as configured.
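
As a rough cross-check against the log in the report, the watchdog messages can be counted and timed. This is only an illustrative sketch: it assumes the log line format shown in the report, and the helper name `watchdog_misses` is hypothetical, not part of ProxySQL.

```python
from datetime import datetime

# Watchdog lines copied from the log in the report above.
LOG = """\
2021-11-09 23:42:58 main.cpp:1541:main(): [ERROR] Watchdog: 22 threads missed a heartbeat
2021-11-09 23:43:01 main.cpp:1541:main(): [ERROR] Watchdog: 22 threads missed a heartbeat
2021-11-09 23:43:04 main.cpp:1541:main(): [ERROR] Watchdog: 22 threads missed a heartbeat
2021-11-09 23:43:07 main.cpp:1541:main(): [ERROR] Watchdog: 22 threads missed a heartbeat
2021-11-09 23:43:10 main.cpp:1541:main(): [ERROR] Watchdog: 22 threads missed a heartbeat
2021-11-09 23:43:13 main.cpp:1541:main(): [ERROR] Watchdog: 22 threads missed a heartbeat
2021-11-09 23:43:16 main.cpp:1541:main(): [ERROR] Watchdog: 22 threads missed a heartbeat
2021-11-09 23:43:19 main.cpp:1541:main(): [ERROR] Watchdog: 22 threads missed a heartbeat
2021-11-09 23:43:22 main.cpp:1541:main(): [ERROR] Watchdog: 22 threads missed a heartbeat
2021-11-09 23:43:25 main.cpp:1541:main(): [ERROR] Watchdog: 22 threads missed a heartbeat
2021-11-09 23:43:25 main.cpp:1545:main(): [ERROR] Watchdog: reached 10 missed heartbeats. Aborting!
"""


def watchdog_misses(log_text):
    """Return the timestamp of every 'missed a heartbeat' line."""
    stamps = []
    for line in log_text.splitlines():
        if "threads missed a heartbeat" in line:
            # The first 19 characters hold 'YYYY-MM-DD HH:MM:SS'.
            stamps.append(datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S"))
    return stamps


stamps = watchdog_misses(LOG)
span_seconds = (stamps[-1] - stamps[0]).total_seconds()
print(f"{len(stamps)} missed-heartbeat checks over {span_seconds:.0f} seconds")
```

On the log above this reports ten checks spaced about three seconds apart, which is consistent with the roughly 30-second limit described and with the "reached 10 missed heartbeats" message.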