ikzelf / zbxdb

Zabbix database monitoring, the easy and extendable way
GNU General Public License v3.0
94 stars 45 forks source link

Zbxdb plugin leading to performance issues in RDS #102

Closed yatharth27 closed 2 months ago

yatharth27 commented 1 year ago

Certain SQL queries in the zbxdb plugin are consuming upto 80-90% memory inside the RDS as reported by my team. Is there a way we can figure out why these certain queries take up such huge chunk of memory n RDS?

ikzelf commented 1 year ago

Normal performance analysis should be done. First find the query and check it with a regular client. Make sure that the object statistics are up to date.

yatharth27 commented 1 year ago

So we have identififed the queries that are taking up 80-90% of the memory. These are the ones below :

SELECT concat('p_ts[',table_schema,',,filesize]'), sum( data_length + index_length + data_free) FROM information_schema.TABLES GROUP BY table_schema union all SELECT concat('p_ts[',table_schema,',,usedbytes]'), sum( data_length + index_length) FROM information_schema.TABLES GROUP BY table_schema

And

SELECT performance_schema . events_waits_summary_global_by_event_name . EVENT_NAME AS events , performance_schema . events_waits_summary_global_by_event_name . COUNT_STAR AS total , performance_schema . events_waits_summary_global_by_event_name . SUM_TIMER_WAIT AS total_latency , performance_schema . events_waits_summary_global_by_event_name . AVG_TIMER_WAIT AS avg_latency , performance_schema . events_waits_summary_global_by_event_name . MAX_TIMER_WAIT AS `

So, can we try to modify these queries and can you confirm that the plugin is going to work the in the same manner it previously did?

ikzelf commented 1 year ago

As long as you make sure that the queries report their results in the same way as before you can use any query that returns the metrics you want. As far as I can see this is for mysql? You can check by running the SQL in a regular client. I found https://bugs.mysql.com/bug.php?id=97935 Does the query allocate all the memory immediately or does it grow after repeating the query?

zbxdb normally tries to use and re-use a single connection since connecting and disconnecting for many databases is a very expensive operation. If a memory leak is playing a role, that leak should be fixed but before that it should be found. The zbxdb.py code itself does not have a leak, after hundreds of thousands iterations on an oracle of postgres database there is no growth of the memory allocation. The own memory size is also monitored by zbxdb and it is in the template so you can check that too.

yatharth27 commented 1 year ago

how to find backend mysql query for the metric name in zabbix_out folder for specific mysql Arora DB ?

ikzelf commented 1 year ago

check the etc/zbxdb_checks/mysql/ folder for the db version you need. The queries are in there.

yatharth27 commented 1 year ago

So, after having a word with the team, the below query is running every 1 hour and that is leading to high memory usage. So, can we schedule this query to run once in 5 days or on the weekends (Staurday/Sunday)?

SELECT concat('p_ts[',table_schema,',,filesize]'), sum( data_length + index_length + data_free) FROM information_schema.TABLES GROUP BY table_schema union all SELECT concat('p_ts[',table_schema,',,usedbytes]'), sum( data_length + index_length) FROM information_schema.TABLES GROUP BY table_schema

ikzelf commented 1 year ago

I am not very familiar with mysql but I think a better solution is to check the optimiser statistics for the information_schema. You can set the interval for the query to a longer time. To do that you add a new chapter to the cfg file for your database make and version where you put that query in. Just read the file and adjust to your likings. Currently it is in: [checks_60m] minutes: 60 p_ts: SELECT concat('p_ts[',table_schema,',,filesize]'), sum( data_length + index_length + data_free) FROM information_schema.TABLES GROUP BY table_schema union all SELECT concat('p_ts[',table_schema,',,usedbytes]'), sum( data_length + index_length) FROM information_schema.TABLES GROUP BY table_schema

nothing prevents you from adding/changing a longer interval. The chapter name is just a name, the minutes: XXXX directive is the one that controls the repeat interval.