prometheus / mysqld_exporter

Exporter for MySQL server metrics
http://prometheus.io/
Apache License 2.0
2.12k stars 745 forks source link

Scrape queries are logged to slow log - add long_query_time start parameter #775

Open BoemmLA opened 11 months ago

BoemmLA commented 11 months ago

In our setup we are running several databases with quite fast queries and mostly very less slow queries from the application. Thats why we setted the long_query_time for all (150) database (clusters) to 0.01 which is a fine value for our usual work load.

Unfortunately the mysqld_exporter runs queries which takes longer then 0.01 second, sometime more then 1 seconds and so we have slow logs logged for those scrape queries, sometimes the exporter queries are the only slows in the log. This is really annoying since this gives us a not really good monitoring of slow queries where we only would like to see statistics about slow queries coming from application users but not from metrics scraping.

There is the flag exporter.log_slow_filter, but this one only filters out the "non tmp disk table" queries. All queries writing tmp disk tables are still logged to the slow log.

And this tmp disk table queries are more then the non tmp disk table queries, so the existing flag has some effect, but not reaches the goal to avoid half of the queries to be logged.

Long story short, a dynamic setting for long_query_time as a exporter parameter would solve this problem. When it would be possible to set the long_query_time up to 10 seconds for the exporter sessions, this problem should be fixed.

Can you please add such a parameter to define a long_query_time value, which is then used for the mysql session the exporter uses?

The would be great!

Some code examples: Here you see a half day slow query log ... there are only scrape queries in (at least the top 4):

# Overall: 11.12k total, 41 unique, 0.28 QPS, 0.01x concurrency __________
# Time range: 2023-10-12T00:00:13 to 2023-10-12T11:12:02
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time           356s    10ms   190ms    32ms    68ms    20ms    23ms
# Lock time            35s       0    95ms     3ms    21ms     8ms    98us
# Rows sent          1.70M       0     648  160.09  381.65  167.65   97.36
# Rows examine      12.41M       0   3.11k   1.14k   3.04k   1.28k  381.65
# Rows affecte           0       0       0       0       0       0       0
# Bytes sent        98.64M      11  53.75k   9.08k  19.40k   9.15k   6.96k
# Merge passes           0       0       0       0       0       0       0
# Tmp tables         1.63M       0     384  153.39  381.65  158.39  107.34
# Tmp disk tbl     322.99k       0      73   29.74   72.65   30.03   23.65
# Tmp tbl size       2.59G       0   3.21M 243.92k 590.13k 341.98k 106.99k
# Query size         3.88M      16   1.04k  365.80  537.02  195.23  487.09
# InnoDB:
# IO r bytes             0       0       0       0       0       0       0
# IO r ops               0       0       0       0       0       0       0
# IO r wait              0       0       0       0       0       0       0
# pages distin     133.81k       3      40   19.12   34.95   14.99    5.75
# queue wait             0       0       0       0       0       0       0
# rec lock wai           0       0       0       0       0       0       0
# Boolean:
# Filesort       4% yes,  95% no
# Full join     34% yes,  65% no
# Full scan     99% yes,   0% no
# Tmp table     67% yes,  32% no
# Tmp table on  64% yes,  35% no

# Profile
# Rank Query ID                            Response time  Calls R/Call V/M
# ==== =================================== ============== ===== ====== ===
#    1 0xE7F53B26FE0D399A2DBAFBAD1F732B72  196.0669 55.0%  3359 0.0584  0.00 SELECT information_schema.columns
#    2 0x3A86802FFCCBF26D88901A51719CB50B   75.8887 21.3%  3154 0.0241  0.00 SELECT information_schema.tables
#    3 0x6E905E29D0B3D1EA1A2193E75E153007   64.6393 18.1%  3281 0.0197  0.00 SELECT performance_schema.events_waits_summary_global_by_event_name
#    4 0xC25C941D3A5CD2EC72775047662073FB    5.8551  1.6%   429 0.0136  0.00 SELECT information_schema.SCHEMATA information_schema.TABLES
# MISC 0xMISC                               13.9295  3.9%   897 0.0155   0.0 <37 ITEMS>

The 1st, 2nd and 4th are tmp disk table queries, the 3rd can be avoided when setting exporter.log_slow_filter

# Query 1: 0.08 QPS, 0.00x concurrency, ID 0xE7F53B26FE0D399A2DBAFBAD1F732B72 at byte 147682
# This item is included in the report because it matches --limit.
# Scores: V/M = 0.00
# Time range: 2023-10-12T00:00:13 to 2023-10-12T11:32:43
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         30    3463
# Exec time     55    202s    35ms   190ms    58ms    82ms    14ms    56ms
# Lock time     70     25s   132us    95ms     7ms    31ms    12ms   287us
# Rows sent      0  13.53k       4       4       4       4       0       4
# Rows examine  82  10.51M   3.11k   3.11k   3.11k   3.11k       0   3.11k
# Rows affecte   0       0       0       0       0       0       0       0
# Bytes sent     2   2.08M     631     631     631     631       0     631
# Merge passes   0       0       0       0       0       0       0       0
# Tmp tables    75   1.27M     384     384     384     384       0     384
# Tmp disk tbl  74 246.87k      73      73      73      73       0      73
# Tmp tbl size  73   1.96G 592.00k 592.00k 592.00k 592.00k       0 592.00k
# Query size    44   1.79M     542     542     542     542       0     542
# InnoDB:
# IO r bytes     0       0       0       0       0       0       0       0
# IO r ops       0       0       0       0       0       0       0       0
# IO r wait      0       0       0       0       0       0       0       0
# pages distin  86 118.62k      34      40   35.08   34.95    0.15   34.95
# queue wait     0       0       0       0       0       0       0       0
# rec lock wai   0       0       0       0       0       0       0       0
# Boolean:
# Full join    100% yes,   0% no
# Full scan    100% yes,   0% no
# Tmp table    100% yes,   0% no
# Tmp table on 100% yes,   0% no
# String:
# Databases    information_schema
# Hosts        localhost
# InnoDB trxID 0
# Users        exporter
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms  ################################################################
# 100ms  #
#    1s
#  10s+
# Tables
#    SHOW TABLE STATUS FROM `information_schema` LIKE 'columns'\G
#    SHOW CREATE TABLE `information_schema`.`columns`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT t.table_schema, t.table_name, column_name, auto_increment,
                  pow(2, case data_type
                    when 'tinyint'   then 7
                    when 'smallint'  then 15
                    when 'mediumint' then 23
                    when 'int'       then 31
                    when 'bigint'    then 63
                    end+(column_type like '% unsigned'))-1 as max_int
                  FROM information_schema.columns c
                  STRAIGHT_JOIN information_schema.tables t
                    ON BINARY t.table_schema = c.table_schema AND BINARY t.table_name = c.table_name
                  WHERE c.extra = 'auto_increment' AND t.auto_increment IS NOT NULL\G

# Query 2: 0.08 QPS, 0.00x concurrency, ID 0x3A86802FFCCBF26D88901A51719CB50B at byte 145890
# This item is included in the report because it matches --limit.
# Scores: V/M = 0.00
# Time range: 2023-10-12T00:00:13 to 2023-10-12T11:32:43
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         28    3248
# Exec time     21     78s    11ms   145ms    24ms    40ms     9ms    22ms
# Lock time     23      8s    55us    49ms     3ms    13ms     5ms    93us
# Rows sent     18 320.02k      14     101  100.89   97.36    2.93   97.36
# Rows examine   2 320.02k      14     101  100.89   97.36    2.93   97.36
# Rows affecte   0       0       0       0       0       0       0       0
# Bytes sent    22  22.07M   1.86k   6.96k   6.96k   6.96k  186.24   6.96k
# Merge passes   0       0       0       0       0       0       0       0
# Tmp tables    20 354.82k       2     112  111.86  107.34    3.70  107.34
# Tmp disk tbl  22  76.04k       1      24   23.97   23.65    0.79   23.65
# Tmp tbl size  13 354.81M       0 112.00k 111.86k 106.99k   3.75k 106.99k
# Query size    38   1.53M     494     502  494.01  487.09       0  487.09
# InnoDB:
# IO r bytes     0       0       0       0       0       0       0       0
# IO r ops       0       0       0       0       0       0       0       0
# IO r wait      0       0       0       0       0       0       0       0
# pages distin  12  16.81k       5       8    5.30    5.75    0.47    4.96
# queue wait     0       0       0       0       0       0       0       0
# rec lock wai   0       0       0       0       0       0       0       0
# Boolean:
# Full scan    100% yes,   0% no
# Tmp table    100% yes,   0% no
# Tmp table on  99% yes,   0% no
# String:
# Databases    information_schema
# Hosts        localhost
# InnoDB trxID 0
# Users        exporter
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms  ################################################################
# 100ms  #
#    1s
#  10s+
# Tables
#    SHOW TABLE STATUS FROM `information_schema` LIKE 'tables'\G
#    SHOW CREATE TABLE `information_schema`.`tables`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT
                    TABLE_SCHEMA,
                    TABLE_NAME,
                    TABLE_TYPE,
                    ifnull(ENGINE, 'NONE') as ENGINE,
                    ifnull(VERSION, '0') as VERSION,
                    ifnull(ROW_FORMAT, 'NONE') as ROW_FORMAT,
                    ifnull(TABLE_ROWS, '0') as TABLE_ROWS,
                    ifnull(DATA_LENGTH, '0') as DATA_LENGTH,
                    ifnull(INDEX_LENGTH, '0') as INDEX_LENGTH,
                    ifnull(DATA_FREE, '0') as DATA_FREE,
                    ifnull(CREATE_OPTIONS, 'NONE') as CREATE_OPTIONS
                  FROM information_schema.tables
                  WHERE TABLE_SCHEMA = 'sys'\G

# Query 3: 0.08 QPS, 0.00x concurrency, ID 0x6E905E29D0B3D1EA1A2193E75E153007 at byte 4997524
# This item is included in the report because it matches --limit.
# Scores: V/M = 0.00
# Time range: 2023-10-12T00:00:13 to 2023-10-12T10:56:32
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         28    3281
# Exec time     17     65s    11ms    77ms    20ms    33ms     7ms    17ms
# Lock time      0   283ms    17us    10ms    86us    93us   426us    44us
# Rows sent     72   1.25M     398     398     398     398       0     398
# Rows examine   9   1.25M     398     398     398     398       0     398
# Rows affecte   0       0       0       0       0       0       0       0
# Bytes sent    64  64.57M  20.15k  20.15k  20.15k  20.15k       0  20.15k
# Merge passes   0       0       0       0       0       0       0       0
# Tmp tables     0       0       0       0       0       0       0       0
# Tmp disk tbl   0       0       0       0       0       0       0       0
# Tmp tbl size   0       0       0       0       0       0       0       0
# Query size     8 365.27k     114     114     114     114       0     114
# Boolean:
# Full scan    100% yes,   0% no
# String:
# Databases    information_schema
# Hosts        localhost
# Users        exporter
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms  ################################################################
# 100ms
#    1s
#  10s+
# Tables
#    SHOW TABLE STATUS FROM `performance_schema` LIKE 'events_waits_summary_global_by_event_name'\G
#    SHOW CREATE TABLE `performance_schema`.`events_waits_summary_global_by_event_name`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT
          FROM performance_schema.events_waits_summary_global_by_event_name\G

# Query 4: 0.01 QPS, 0.00x concurrency, ID 0xC25C941D3A5CD2EC72775047662073FB at byte 2972268
# This item is included in the report because it matches --limit.
# Scores: V/M = 0.00
# Time range: 2023-10-12T00:02:02 to 2023-10-12T11:32:39
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count          3     441
# Exec time      1      6s    10ms    32ms    14ms    20ms     4ms    12ms
# Lock time      0    79ms    65us    13ms   178us   224us   691us   103us
# Rows sent      0     882       2       2       2       2       0       2
# Rows examine   1 149.01k     346     346     346     346       0     346
# Rows affecte   0       0       0       0       0       0       0       0
# Bytes sent     0  62.45k     145     145     145     145       0     145
# Merge passes   0       0       0       0       0       0       0       0
# Tmp tables     2  35.31k      82      82      82      82       0      82
# Tmp disk tbl   1   4.74k      11      11      11      11       0      11
# Tmp tbl size   4 129.21M 300.03k 300.03k 300.03k 300.03k       0 300.03k
# Query size     3 138.67k     322     322     322     322       0     322
# InnoDB:
# IO r bytes     0       0       0       0       0       0       0       0
# IO r ops       0       0       0       0       0       0       0       0
# IO r wait      0       0       0       0       0       0       0       0
# pages distin   1   1.40k       3       6    3.25    3.89    0.55    2.90
# queue wait     0       0       0       0       0       0       0       0
# rec lock wai   0       0       0       0       0       0       0       0
# Boolean:
# Filesort     100% yes,   0% no
# Full join    100% yes,   0% no
# Full scan    100% yes,   0% no
# Tmp table    100% yes,   0% no
# Tmp table on 100% yes,   0% no
# String:
# Databases    information_schema
# Hosts        localhost
# InnoDB trxID 0
# Users        exporter
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms  ################################################################
# 100ms
#    1s
#  10s+
# Tables
#    SHOW TABLE STATUS FROM `information_schema` LIKE 'SCHEMATA'\G
#    SHOW CREATE TABLE `information_schema`.`SCHEMATA`\G
#    SHOW TABLE STATUS FROM `information_schema` LIKE 'TABLES'\G
#    SHOW CREATE TABLE `information_schema`.`TABLES`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT S.SCHEMA_NAME AS 'schema', COUNT(T.TABLE_NAME) AS tables_count FROM information_schema.SCHEMATA AS S LEFT JOIN information_schema.TABLES AS T ON S.SCHEMA_NAME = T.TABLE_SCHEMA AND T.TABLE_TYPE = 'BASE TABLE' WHERE S.SCHEMA_NAME NOT IN ('information_schema','mysql','performance_schema','sys') GROUP BY 1 WITH ROLLUP\G
SuperQ commented 11 months ago

Sure, adding long_query_time to the session params is a great idea. It should be simple enough to do. Would you be up to adding this feature? You can see the params and related variables here: https://github.com/prometheus/mysqld_exporter/blob/main/collector/exporter.go#L45