home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
71.58k stars 29.91k forks source link

High CPU load in MariaDB #70309

Closed igogold closed 2 years ago

igogold commented 2 years ago

The problem

Yesterday after daily recorder database purge CPU load of MariaDB go high and stay so more than a day.

MariaDB installed as Supervisor add-on. I connected to database and call "show processlist;". Every time one of process execute SQL command begins with:

SELECT DISTINCT states.attributes_id 
FROM states 
WHERE states.attributes_id IN (563, 564, 565, 566

where numbers in the parentheses every time are different. mysqld process uses about 50% of CPU most of time.

I upgrade HA to 2022.4.х more than a week ago, so I think it's not database conversion after migration from 2022.3.x.

What version of Home Assistant Core has the issue?

core-2022.4.5

What was the last working version of Home Assistant Core?

core-2022.4.4

What type of installation are you running?

Home Assistant OS

Integration causing the issue

recorder

Link to integration documentation on our website

No response

Diagnostics information

No response

Example YAML snippet

# Use MariaDB as internal database
# Limit storage age and num of entities
recorder:
  purge_keep_days: 7
  db_url: mysql://homeassistant:password@core-mariadb/homeassistant?charset=utf8

Anything in the logs that might be useful for us?

No response

Additional information

CPU usage graph.

grafana graph of CPU usage
bdraco commented 2 years ago

Can you provide a dump of the schema? It seems like you are missing an index on the attributes_id

igogold commented 2 years ago

@bdraco yes, no problem. Schema much smaller than full dump :)

-- MariaDB dump 10.19  Distrib 10.6.4-MariaDB, for Linux (x86_64)
--
-- Host: core-mariadb    Database: homeassistant
-- ------------------------------------------------------
-- Server version   10.4.19-MariaDB

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8mb4 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;

--
-- Table structure for table `events`
--

DROP TABLE IF EXISTS `events`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `events` (
  `event_id` int(11) NOT NULL AUTO_INCREMENT,
  `event_type` varchar(64) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `event_data` longtext COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `origin` varchar(32) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `time_fired` datetime(6) DEFAULT NULL,
  `created` datetime(6) DEFAULT NULL,
  `context_id` varchar(36) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `context_user_id` varchar(36) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `context_parent_id` char(36) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`event_id`),
  KEY `ix_events_time_fired` (`time_fired`),
  KEY `ix_events_context_id` (`context_id`),
  KEY `ix_events_context_user_id` (`context_user_id`),
  KEY `ix_events_context_parent_id` (`context_parent_id`),
  KEY `ix_events_event_type_time_fired` (`event_type`,`time_fired`)
) ENGINE=InnoDB AUTO_INCREMENT=834897735 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Table structure for table `recorder_runs`
--

DROP TABLE IF EXISTS `recorder_runs`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `recorder_runs` (
  `run_id` int(11) NOT NULL AUTO_INCREMENT,
  `start` datetime DEFAULT NULL,
  `end` datetime DEFAULT NULL,
  `closed_incorrect` tinyint(1) DEFAULT NULL,
  `created` datetime DEFAULT NULL,
  PRIMARY KEY (`run_id`),
  KEY `ix_recorder_runs_start_end` (`start`,`end`),
  CONSTRAINT `CONSTRAINT_1` CHECK (`closed_incorrect` in (0,1))
) ENGINE=InnoDB AUTO_INCREMENT=310 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Table structure for table `schema_changes`
--

DROP TABLE IF EXISTS `schema_changes`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `schema_changes` (
  `change_id` int(11) NOT NULL AUTO_INCREMENT,
  `schema_version` int(11) DEFAULT NULL,
  `changed` datetime DEFAULT NULL,
  PRIMARY KEY (`change_id`)
) ENGINE=InnoDB AUTO_INCREMENT=20 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Table structure for table `state_attributes`
--

DROP TABLE IF EXISTS `state_attributes`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `state_attributes` (
  `attributes_id` int(11) NOT NULL AUTO_INCREMENT,
  `hash` bigint(20) DEFAULT NULL,
  `shared_attrs` longtext COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`attributes_id`),
  KEY `ix_state_attributes_hash` (`hash`)
) ENGINE=InnoDB AUTO_INCREMENT=246153 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Table structure for table `states`
--

DROP TABLE IF EXISTS `states`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `states` (
  `state_id` int(11) NOT NULL AUTO_INCREMENT,
  `domain` varchar(64) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `entity_id` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `state` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `attributes` longtext COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `event_id` int(11) DEFAULT NULL,
  `last_changed` datetime(6) DEFAULT NULL,
  `last_updated` datetime(6) DEFAULT NULL,
  `created` datetime(6) DEFAULT NULL,
  `context_id` varchar(36) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `context_user_id` varchar(36) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `old_state_id` int(11) DEFAULT NULL,
  `attributes_id` int(20) DEFAULT NULL,
  PRIMARY KEY (`state_id`),
  KEY `ix_states_last_updated` (`last_updated`),
  KEY `ix_states_event_id` (`event_id`),
  KEY `ix_states_entity_id_last_updated` (`entity_id`,`last_updated`),
  KEY `ix_states_old_state_id` (`old_state_id`),
  KEY `ix_states_attributes_id` (`attributes_id`),
  CONSTRAINT `states_ibfk_1` FOREIGN KEY (`event_id`) REFERENCES `events` (`event_id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=831902611 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Table structure for table `statistics`
--

DROP TABLE IF EXISTS `statistics`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `statistics` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `created` datetime(6) DEFAULT NULL,
  `metadata_id` int(11) DEFAULT NULL,
  `start` datetime(6) DEFAULT NULL,
  `mean` double DEFAULT NULL,
  `min` double DEFAULT NULL,
  `max` double DEFAULT NULL,
  `last_reset` datetime(6) DEFAULT NULL,
  `state` double DEFAULT NULL,
  `sum` double DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `ix_statistics_statistic_id_start` (`metadata_id`,`start`),
  KEY `ix_statistics_start` (`start`),
  KEY `ix_statistics_metadata_id` (`metadata_id`),
  CONSTRAINT `statistics_ibfk_1` FOREIGN KEY (`metadata_id`) REFERENCES `statistics_meta` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1297433 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Table structure for table `statistics_meta`
--

DROP TABLE IF EXISTS `statistics_meta`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `statistics_meta` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `statistic_id` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `source` varchar(32) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `unit_of_measurement` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `has_mean` tinyint(1) DEFAULT NULL,
  `has_sum` tinyint(1) DEFAULT NULL,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `ix_statistics_meta_statistic_id` (`statistic_id`)
) ENGINE=InnoDB AUTO_INCREMENT=306 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Table structure for table `statistics_runs`
--

DROP TABLE IF EXISTS `statistics_runs`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `statistics_runs` (
  `run_id` int(11) NOT NULL AUTO_INCREMENT,
  `start` datetime DEFAULT NULL,
  PRIMARY KEY (`run_id`)
) ENGINE=InnoDB AUTO_INCREMENT=58358 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Table structure for table `statistics_short_term`
--

DROP TABLE IF EXISTS `statistics_short_term`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `statistics_short_term` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `created` datetime(6) DEFAULT NULL,
  `start` datetime(6) DEFAULT NULL,
  `mean` double DEFAULT NULL,
  `min` double DEFAULT NULL,
  `max` double DEFAULT NULL,
  `last_reset` datetime(6) DEFAULT NULL,
  `state` double DEFAULT NULL,
  `sum` double DEFAULT NULL,
  `metadata_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `ix_statistics_short_term_statistic_id_start` (`metadata_id`,`start`),
  KEY `ix_statistics_short_term_start` (`start`),
  KEY `ix_statistics_short_term_metadata_id` (`metadata_id`),
  CONSTRAINT `statistics_short_term_ibfk_1` FOREIGN KEY (`metadata_id`) REFERENCES `statistics_meta` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=13866140 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
/*!40101 SET character_set_client = @saved_cs_client */;
/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;

/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;

-- Dump completed on 2022-04-25  9:47:51
bdraco commented 2 years ago

Looks like the index is there.

That query is very fast on sqlite

sqlite> explain query plan SELECT DISTINCT states.attributes_id FROM states WHERE states.attributes_id IN (1,2,3,4,5);
0|0|0|SEARCH TABLE states USING COVERING INDEX ix_states_attributes_id (attributes_id=?)
0|0|0|EXECUTE LIST SUBQUERY 1
sqlite> SELECT DISTINCT states.attributes_id FROM states WHERE states.attributes_id IN (1,2,3,4,5);
1
2
3
4
5
Run Time: real 0.014 user 0.014000 sys 0.001000
sqlite> 

I wonder if MariaDB doesn't optimize it very well

bdraco commented 2 years ago

Can you run an explain on the query? It should look something like this:

MariaDB [homeassistant]> explain  SELECT DISTINCT states.attributes_id FROM states WHERE states.attributes_id IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,39,450000000,9999999,1000000,65,70,71,72,73,74,274,275,276,277,278,279,280,281,890,2606,338,339,340);
+------+-------------+--------+-------+-------------------------+-------------------------+---------+------+------+--------------------------+
| id   | select_type | table  | type  | possible_keys           | key                     | key_len | ref  | rows | Extra                    |
+------+-------------+--------+-------+-------------------------+-------------------------+---------+------+------+--------------------------+
|    1 | SIMPLE      | states | range | ix_states_attributes_id | ix_states_attributes_id | 5       | NULL | 3901 | Using where; Using index |
+------+-------------+--------+-------+-------------------------+-------------------------+---------+------+------+--------------------------+
1 row in set (0.001 sec)

I tried a few combinations but all the results were very fast for me so I'm wondering what the difference is on your system

MariaDB [homeassistant]>  SELECT DISTINCT states.attributes_id FROM states WHERE states.attributes_id IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,39,450000000,9999999,1000000,65,70,71,72,73,74,274,275,276,277,278,279,280,281,890,2606,338,339,340);
+---------------+
| attributes_id |
+---------------+
|            65 |
|           274 |
|           338 |
+---------------+
3 rows in set (0.003 sec)
igogold commented 2 years ago

Here is an output of explain:

MariaDB [homeassistant]> explain  SELECT DISTINCT states.attributes_id FROM states WHERE states.attributes_id IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,39,450000000,9999999,1000000,65,70,71,72,73,74,274,275,276,277,278,279,280,281,890,2606,338,339,340);
+------+-------------+--------+-------+-------------------------+-------------------------+---------+------+-------+--------------------------+
| id   | select_type | table  | type  | possible_keys           | key                     | key_len | ref  | rows  | Extra                    |
+------+-------------+--------+-------+-------------------------+-------------------------+---------+------+-------+--------------------------+
|    1 | SIMPLE      | states | range | ix_states_attributes_id | ix_states_attributes_id | 5       | NULL | 28317 | Using where; Using index |
+------+-------------+--------+-------+-------------------------+-------------------------+---------+------+-------+--------------------------+
1 row in set (0.026 sec)

Also from SELECT

MariaDB [homeassistant]> SELECT DISTINCT states.attributes_id FROM states WHERE states.attributes_id IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,39,450000000,9999999,1000000,65,70,71,72,73,74,274,275,276,277,278,279,280,281,890,2606,338,339,340);
+---------------+
| attributes_id |
+---------------+
|             1 |
|             2 |
|             3 |
|             4 |
|             5 |
|             6 |
|             7 |
|             8 |
|             9 |
|            17 |
|            20 |
|            21 |
|            22 |
|            23 |
|            24 |
|            25 |
|            26 |
|            27 |
|            28 |
|            65 |
|           274 |
|           275 |
|           276 |
|           278 |
|           279 |
|           280 |
|           338 |
|           339 |
|           340 |
|           890 |
+---------------+
30 rows in set (0.064 sec)

These requests are quite fast.

igogold commented 2 years ago

Also output of my "show processlist" now:

MariaDB [homeassistant]> show processlist;
+-----+---------------+-------------------+---------------+---------+------+--------------+------------------------------------------------------------------------------------------------------+----------+
| Id  | User          | Host              | db            | Command | Time | State        | Info                                                                                                 | Progress |
+-----+---------------+-------------------+---------------+---------+------+--------------+------------------------------------------------------------------------------------------------------+----------+
|  95 | homeassistant | 172.30.32.1:41590 | homeassistant | Sleep   |   66 |              | NULL                                                                                                 |    0.000 |
|  96 | homeassistant | 172.30.32.1:42354 | homeassistant | Sleep   |   66 |              | NULL                                                                                                 |    0.000 |
|  97 | homeassistant | 172.30.32.1:42356 | homeassistant | Sleep   |   66 |              | NULL                                                                                                 |    0.000 |
|  98 | homeassistant | 172.30.32.1:42358 | homeassistant | Query   |   66 | Sending data | SELECT DISTINCT states.attributes_id 
FROM states 
WHERE states.attributes_id IN (143, 123, 39962, 3 |    0.000 |
| 104 | homeassistant | 172.30.32.1:38286 | homeassistant | Query   |    0 | Init         | show processlist                                                                                     |    0.000 |
+-----+---------------+-------------------+---------------+---------+------+--------------+------------------------------------------------------------------------------------------------------+----------+
5 rows in set (0.001 sec)
bdraco commented 2 years ago

Please run show full processlist; and then explain on the query of the one thats been sending data for a long time (143, 123, 39962, 3 ....

igogold commented 2 years ago

Thank you for show full processlist;, I don't know mysql well, mostly postgresql. Here is an output of explain:

MariaDB [homeassistant]> explain SELECT DISTINCT states.attributes_id 
    -> FROM states 
    -> WHERE states.attributes_id IN (2586, 127, 4138, 39980, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 40014, 591, 592, 40017, 593, 594, 595, 596, 40018, 590, 598, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 121, 634, 636, 637, 638, 635, 124, 125, 126, 643, 644, 645, 646, 647, 136, 137, 138, 139, 648, 649, 654, 655, 656, 657, 658, 659, 651, 652, 141, 142, 143, 144, 664, 665, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 1222, 711, 650, 713, 714, 715, 639, 716, 717, 718, 719, 720, 721, 722, 723, 724, 653, 727, 728, 729, 730, 725, 732, 731, 734, 735, 736, 737, 228, 741, 131, 745, 746, 640, 749, 660, 661, 132, 1278, 641, 662, 128, 663, 666, 133, 667, 642, 813, 814, 815, 134, 332, 333, 335, 351, 352, 362, 363, 374, 375, 381, 382, 383, 384, 129, 390, 391, 40015, 2630, 597, 599, 600, 996, 997, 999, 199, 1004, 200, 503, 504, 202, 39992);
+------+-------------+--------+-------+-------------------------+-------------------------+---------+------+-------+--------------------------+
| id   | select_type | table  | type  | possible_keys           | key                     | key_len | ref  | rows  | Extra                    |
+------+-------------+--------+-------+-------------------------+-------------------------+---------+------+-------+--------------------------+
|    1 | SIMPLE      | states | range | ix_states_attributes_id | ix_states_attributes_id | 5       | NULL | 58674 | Using where; Using index |
+------+-------------+--------+-------+-------------------------+-------------------------+---------+------+-------+--------------------------+
1 row in set (0.002 sec)

And from SELECT:

MariaDB [homeassistant]> SELECT DISTINCT states.attributes_id 
    -> FROM states 
    -> WHERE states.attributes_id IN (2586, 127, 4138, 39980, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 40014, 591, 592, 40017, 593, 594, 595, 596, 40018, 590, 598, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 121, 634, 636, 637, 638, 635, 124, 125, 126, 643, 644, 645, 646, 647, 136, 137, 138, 139, 648, 649, 654, 655, 656, 657, 658, 659, 651, 652, 141, 142, 143, 144, 664, 665, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 1222, 711, 650, 713, 714, 715, 639, 716, 717, 718, 719, 720, 721, 722, 723, 724, 653, 727, 728, 729, 730, 725, 732, 731, 734, 735, 736, 737, 228, 741, 131, 745, 746, 640, 749, 660, 661, 132, 1278, 641, 662, 128, 663, 666, 133, 667, 642, 813, 814, 815, 134, 332, 333, 335, 351, 352, 362, 363, 374, 375, 381, 382, 383, 384, 129, 390, 391, 40015, 2630, 597, 599, 600, 996, 997, 999, 199, 1004, 200, 503, 504, 202, 39992);
+---------------+
| attributes_id |
+---------------+
|           121 |
|           124 |
|           125 |
|           126 |
|           127 |
|           128 |
|           129 |
|           131 |
|           132 |
|           133 |
|           134 |
|           136 |
|           137 |
|           138 |
|           139 |
|           141 |
|           142 |
|           143 |
|           144 |
|           199 |
|           200 |
|           202 |
|           228 |
|           332 |
|           333 |
|           335 |
|           351 |
|           352 |
|           362 |
|           363 |
|           374 |
|           375 |
|           381 |
|           382 |
|           383 |
|           384 |
|           390 |
|           391 |
|           503 |
|           504 |
|           563 |
|           564 |
|           565 |
|           566 |
|           567 |
|           568 |
|           569 |
|           570 |
|           571 |
|           572 |
|           573 |
|           574 |
|           575 |
|           576 |
|           577 |
|           578 |
|           579 |
|           580 |
|           581 |
|           582 |
|           583 |
|           584 |
|           585 |
|           586 |
|           587 |
|           588 |
|           589 |
|           590 |
|           591 |
|           592 |
|           593 |
|           594 |
|           595 |
|           596 |
|           597 |
|           598 |
|           599 |
|           600 |
|           601 |
|           602 |
|           603 |
|           604 |
|           605 |
|           606 |
|           607 |
|           608 |
|           609 |
|           610 |
|           611 |
|           612 |
|           613 |
|           614 |
|           615 |
|           616 |
|           617 |
|           618 |
|           619 |
|           620 |
|           621 |
|           622 |
|           623 |
|           624 |
|           625 |
|           626 |
|           627 |
|           628 |
|           629 |
|           630 |
|           631 |
|           632 |
|           633 |
|           634 |
|           635 |
|           636 |
|           637 |
|           638 |
|           639 |
|           640 |
|           641 |
|           642 |
|           643 |
|           644 |
|           645 |
|           646 |
|           647 |
|           648 |
|           649 |
|           650 |
|           651 |
|           652 |
|           653 |
|           654 |
|           655 |
|           656 |
|           657 |
|           658 |
|           659 |
|           660 |
|           661 |
|           662 |
|           663 |
|           664 |
|           665 |
|           666 |
|           667 |
|           668 |
|           669 |
|           670 |
|           671 |
|           672 |
|           673 |
|           674 |
|           675 |
|           676 |
|           677 |
|           678 |
|           679 |
|           680 |
|           681 |
|           682 |
|           683 |
|           684 |
|           685 |
|           686 |
|           687 |
|           688 |
|           689 |
|           690 |
|           691 |
|           692 |
|           693 |
|           694 |
|           695 |
|           696 |
|           697 |
|           698 |
|           699 |
|           700 |
|           701 |
|           702 |
|           703 |
|           704 |
|           705 |
|           706 |
|           707 |
|           711 |
|           713 |
|           714 |
|           715 |
|           716 |
|           717 |
|           718 |
|           719 |
|           720 |
|           721 |
|           722 |
|           723 |
|           724 |
|           725 |
|           727 |
|           728 |
|           729 |
|           730 |
|           731 |
|           732 |
|           734 |
|           735 |
|           736 |
|           737 |
|           741 |
|           745 |
|           746 |
|           749 |
|           813 |
|           814 |
|           815 |
|           996 |
|           997 |
|           999 |
|          1004 |
|          1222 |
|          1278 |
|          2586 |
|          2630 |
|          4138 |
|         39980 |
|         39992 |
|         40014 |
+---------------+
228 rows in set (1 min 16.996 sec)
bdraco commented 2 years ago

That's definitely a problem. Can you run an explain on it?

EXPLAIN SELECT DISTINCT states.attributes_id FROM states WHERE states.attributes_id IN (2586, 127, 4138, 39980, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 40014, 591, 592, 40017, 593, 594, 595, 596, 40018, 590, 598, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 121, 634, 636, 637, 638, 635, 124, 125, 126, 643, 644, 645, 646, 647, 136, 137, 138, 139, 648, 649, 654, 655, 656, 657, 658, 659, 651, 652, 141, 142, 143, 144, 664, 665, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 1222, 711, 650, 713, 714, 715, 639, 716, 717, 718, 719, 720, 721, 722, 723, 724, 653, 727, 728, 729, 730, 725, 732, 731, 734, 735, 736, 737, 228, 741, 131, 745, 746, 640, 749, 660, 661, 132, 1278, 641, 662, 128, 663, 666, 133, 667, 642, 813, 814, 815, 134, 332, 333, 335, 351, 352, 362, 363, 374, 375, 381, 382, 383, 384, 129, 390, 391, 40015, 2630, 597, 599, 600, 996, 997, 999, 199, 1004, 200, 503, 504, 202, 39992);

bdraco commented 2 years ago

Never-mind you posted the explain. I just didn't see it on my phone

bdraco commented 2 years ago

If we re-write the query as SELECT states.attributes_id FROM states WHERE states.attributes_id IN (2586, 127, 4138, 39980, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 40014, 591, 592, 40017, 593, 594, 595, 596, 40018, 590, 598, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 121, 634, 636, 637, 638, 635, 124, 125, 126, 643, 644, 645, 646, 647, 136, 137, 138, 139, 648, 649, 654, 655, 656, 657, 658, 659, 651, 652, 141, 142, 143, 144, 664, 665, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 1222, 711, 650, 713, 714, 715, 639, 716, 717, 718, 719, 720, 721, 722, 723, 724, 653, 727, 728, 729, 730, 725, 732, 731, 734, 735, 736, 737, 228, 741, 131, 745, 746, 640, 749, 660, 661, 132, 1278, 641, 662, 128, 663, 666, 133, 667, 642, 813, 814, 815, 134, 332, 333, 335, 351, 352, 362, 363, 374, 375, 381, 382, 383, 384, 129, 390, 391, 40015, 2630, 597, 599, 600, 996, 997, 999, 199, 1004, 200, 503, 504, 202, 39992) group by states.attributes_id;

Is it any faster?

igogold commented 2 years ago

No, not faster.

226 rows in set (1 min 19.481 sec)
bdraco commented 2 years ago
MariaDB [homeassistant]> SELECT states.attributes_id FROM states WHERE states.attributes_id IN (2586, 127, 4138, 39980, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 40014, 591, 592, 40017, 593, 594, 595, 596, 40018, 590, 598, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 121, 634, 636, 637, 638, 635, 124, 125, 126, 643, 644, 645, 646, 647, 136, 137, 138, 139, 648, 649, 654, 655, 656, 657, 658, 659, 651, 652, 141, 142, 143, 144, 664, 665, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 1222, 711, 650, 713, 714, 715, 639, 716, 717, 718, 719, 720, 721, 722, 723, 724, 653, 727, 728, 729, 730, 725, 732, 731, 734, 735, 736, 737, 228, 741, 131, 745, 746, 640, 749, 660, 661, 132, 1278, 641, 662, 128, 663, 666, 133, 667, 642, 813, 814, 815, 134, 332, 333, 335, 351, 352, 362, 363, 374, 375, 381, 382, 383, 384, 129, 390, 391, 40015, 2630, 597, 599, 600, 996, 997, 999, 199, 1004, 200, 503, 504, 202, 39992) group by states.attributes_id;
+---------------+
| attributes_id |
+---------------+
|           332 |
|           335 |
+---------------+
2 rows in set (0.003 sec)

MariaDB [homeassistant]> explain SELECT states.attributes_id FROM states WHERE states.attributes_id IN (2586, 127, 4138, 39980, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 40014, 591, 592, 40017, 593, 594, 595, 596, 40018, 590, 598, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 121, 634, 636, 637, 638, 635, 124, 125, 126, 643, 644, 645, 646, 647, 136, 137, 138, 139, 648, 649, 654, 655, 656, 657, 658, 659, 651, 652, 141, 142, 143, 144, 664, 665, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 1222, 711, 650, 713, 714, 715, 639, 716, 717, 718, 719, 720, 721, 722, 723, 724, 653, 727, 728, 729, 730, 725, 732, 731, 734, 735, 736, 737, 228, 741, 131, 745, 746, 640, 749, 660, 661, 132, 1278, 641, 662, 128, 663, 666, 133, 667, 642, 813, 814, 815, 134, 332, 333, 335, 351, 352, 362, 363, 374, 375, 381, 382, 383, 384, 129, 390, 391, 40015, 2630, 597, 599, 600, 996, 997, 999, 199, 1004, 200, 503, 504, 202, 39992) group by states.attributes_id;
+------+-------------+--------+-------+-------------------------+-------------------------+---------+------+------+--------------------------+
| id   | select_type | table  | type  | possible_keys           | key                     | key_len | ref  | rows | Extra                    |
+------+-------------+--------+-------+-------------------------+-------------------------+---------+------+------+--------------------------+
|    1 | SIMPLE      | states | range | ix_states_attributes_id | ix_states_attributes_id | 5       | NULL |  386 | Using where; Using index |
+------+-------------+--------+-------+-------------------------+-------------------------+---------+------+------+--------------------------+
1 row in set (0.003 sec)

Yours is examining 58674 rows but even then it shouldn't take minutes.

How large is your database?

igogold commented 2 years ago

Text SQL dump with data about 10GB. And found command in the Internet:

MariaDB [homeassistant]> SELECT table_schema "DB Name",
    ->         ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) "DB Size in MB" 
    -> FROM information_schema.tables 
    -> GROUP BY table_schema; 
+--------------------+---------------+
| DB Name            | DB Size in MB |
+--------------------+---------------+
| homeassistant      |       18813.8 |
| information_schema |           0.2 |
+--------------------+---------------+
bdraco commented 2 years ago

How many states are in the table?

select count(*) from states;

igogold commented 2 years ago

Yes, I already run this command before your post :)

MariaDB [homeassistant]> select count(*) from states;
+----------+
| count(*) |
+----------+
| 36285570 |
+----------+
1 row in set (54.448 sec)
bdraco commented 2 years ago

Is this query any faster? select state_attributes.attributes_id from state_attributes inner join states on (state_attributes.attributes_id=states.attributes_id) where state_attributes.attributes_id IN (2586, 127, 4138, 39980, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 40014, 591, 592, 40017, 593, 594, 595, 596, 40018, 590, 598, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 121, 634, 636, 637, 638, 635, 124, 125, 126, 643, 644, 645, 646, 647, 136, 137, 138, 139, 648, 649, 654, 655, 656, 657, 658, 659, 651, 652, 141, 142, 143, 144, 664, 665, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 1222, 711, 650, 713, 714, 715, 639, 716, 717, 718, 719, 720, 721, 722, 723, 724, 653, 727, 728, 729, 730, 725, 732, 731, 734, 735, 736, 737, 228, 741, 131, 745, 746, 640, 749, 660, 661, 132, 1278, 641, 662, 128, 663, 666, 133, 667, 642, 813, 814, 815, 134, 332, 333, 335, 351, 352, 362, 363, 374, 375, 381, 382, 383, 384, 129, 390, 391, 40015, 2630, 597, 599, 600, 996, 997, 999, 199, 1004, 200, 503, 504, 202, 39992) group by state_attributes.attributes_id;

igogold commented 2 years ago

It slightly faster

226 rows in set (1 min 0.006 sec)

not 1 min 19 sec as before.

bdraco commented 2 years ago

It seems like every query is unexpectedly slow.

Is there by chance another process besides MariaDB that is using a lot of cpu time?

Can you successfully make a dump of the database using mysqldump without an error?

igogold commented 2 years ago

Some CPU load presents all time but as you can see from https://github.com/home-assistant/core/issues/70309#issue-1209175010 total average load increased from about 50% to 70-80% and when this "SELECT DISTINCT" command executed CPU load for mysqld process is about 50%.

I run every night at 2:30 mariadb-dump to backup homeassistant database and it finished well, last time with

-- Dump completed on 2022-04-25  2:52:32

The CPU is "Intel(R) Celeron(R) CPU N2820 @ 2.13GHz" Disk is 240GB Samsung 2,5" SATA SSD (don't remember exact model).

bdraco commented 2 years ago

Is the cpu increase any less with the other query? https://github.com/home-assistant/core/issues/70309#issuecomment-1108129538

Honestly I'm not sure how much more we can optimize it short of actually testing queries on your database to find one that is more efficient.

igogold commented 2 years ago

No, both queries create cpu load between 35-50% while running. In the idle mysqld loads CPU at 3-5% only.

avbor commented 2 years ago

Apparently I have a similar problem - https://github.com/home-assistant/core/issues/70677

bdraco commented 2 years ago

I setup a test database with 2 million at states and wasn't able to replicate the slowness.

As I'm not likely going to be able to replicate this issue with random data, I'm going to need a database backup file from someone with the issue so I can restore it in my test environment and see if I can replicate the slowness, then figure out how to optimize for it.

If you are willing to share -> nick@koston.org

igogold commented 2 years ago

@bdraco I'm compressing (xz) mysql dump (10GB), after ~2h I'll upload it to cloud and send you a mail with a link.

bdraco commented 2 years ago

Thanks @igogold

Quick review of your sql looks like it be fixed in tomorrow's 2022.5 beta.

I'll do some more testing later tonight if I have time

bdraco commented 2 years ago

Also you win the award for the most states in a database I've seen yet.

MariaDB [homeassistant]> select count(*) from states;
+----------+
| count(*) |
+----------+
| 38133201 |
+----------+
1 row in set (7.008 sec)

MariaDB [homeassistant]> select count(*) from state_attributes;
+----------+
| count(*) |
+----------+
|   220066 |
+----------+

I'll see if I can come up with some suggestions for getting the data a bit more manageable.

bdraco commented 2 years ago

Based on the above it looks like we can avoid storing 99.4% of full attributes in the database. But there are the some that frequently generate a lot of unique attributes that can't be deduplicated

Do you know which integrations provides these entities as they generate a lot of attribute changes?

sensor.map_sine_pro sensor.map_dominator_ups air_quality.sensor_vozdukha_1 air_quality.sensor_vozdukha_2 binary_sensor.54ef442d537c_gateway sensor.sredniaia_temperatura_abk_tsokol climate.obogrev_tualet_levyi sensor.sredniaia_temperatura_zh_d_vagon_pom_g

The following generate a lot of state changes since they don't round values sensor.energomera_modul_podval_moshchnost_faza_b sensor.energomera_modul_podval_tok_faza_b sensor.energomera_abk_energiia_reaktivnaia_sgenerirovannaia ... more of the same ...

bdraco commented 2 years ago

Also I forgot to mention, the purging is dramatically faster now

igogold commented 2 years ago

Thank you for your observation!

Most entity data also stored in influxdb for longtime statistic and ability to create graphs in grafana, so attributes with helpful values are needed.

Do you know which integrations provides these entities as they generate a lot of attribute changes?

sensor.map_sine_pro sensor.map_dominator_ups

- platform: command_line Handmade script to get info from UPS. Mostly for influxdb storage. I can create template sensors for current states and exclude these two entities from history.

air_quality.sensor_vozdukha_1 air_quality.sensor_vozdukha_2

Xiaomi miio Air quality monitor - old style multi attributes entities. Also needed for influxdb, can be excluded from history database.

binary_sensor.54ef442d537c_gateway

Custom integration Xiaomi Gateway 3. It can be excluded from history at all, binary sensor state not interesting, attributes are some technical info.

sensor.sredniaia_temperatura_abk_tsokol sensor.sredniaia_temperatura_zh_d_vagon_pom_g

Min/max integration. Attributes are side effect. :)

climate.obogrev_tualet_levyi

Custom integration Smart Thermostat (PID). Attributes are also mostly technical.

The following generate a lot of state changes since they don't round values sensor.energomera_modul_podval_moshchnost_faza_b sensor.energomera_modul_podval_tok_faza_b sensor.energomera_abk_energiia_reaktivnaia_sgenerirovannaia ... more of the same ...

These are values from energy meters got via MQTT.

In common history information of entities are helpful for quick analyze via built-in HA cards/history graphics.

bdraco commented 2 years ago

sensor.map_sine_pro sensor.map_dominator_ups

- platform: command_line Handmade script to get info from UPS. Mostly for influxdb storage. I can create template sensors for current states and exclude these two entities from history.

👍

air_quality.sensor_vozdukha_1 air_quality.sensor_vozdukha_2

Xiaomi miio Air quality monitor - old style multi attributes entities. Also needed for influxdb, can be excluded from history database.

That one is a bit legacy. We don't even have a recorder platform for reducing what is stored in the db for the base entity. I'll see if we can clean that up a bit.

binary_sensor.54ef442d537c_gateway

Custom integration Xiaomi Gateway 3. It can be excluded from history at all, binary sensor state not interesting, attributes are some technical info.

Might be nice to open an issue for this one in the chance they can be optimized at some point

sensor.sredniaia_temperatura_abk_tsokol sensor.sredniaia_temperatura_zh_d_vagon_pom_g

Min/max integration. Attributes are side effect. :)

The min/max attributes will be solved in 2022.5 via #70142

climate.obogrev_tualet_levyi

Custom integration Smart Thermostat (PID). Attributes are also mostly technical.

https://github.com/ScratMan/HASmartThermostat/blob/master/custom_components/smart_thermostat/climate.py#L638

Wow, that one is going to fill up anyone's database quick! Most of these should probably be sensor entities instead

See https://developers.home-assistant.io/docs/core/entity/?_highlight=extra_state_attributes

WARNING Entities that generate a significant amount of state changes can quickly increase the size of the database when the extra_state_attributes also change frequently. Minimize the number of extra_state_attributes for these entities by removing non-critical attributes or creating additional sensor entities.

The following generate a lot of state changes since they don't round values sensor.energomera_modul_podval_moshchnost_faza_b sensor.energomera_modul_podval_tok_faza_b sensor.energomera_abk_energiia_reaktivnaia_sgenerirovannaia ... more of the same ... These are values from energy meters got via MQTT.

Not sure if it is possible to get the source to round the values before injecting them as I don't think MQTT will do that.

igogold commented 2 years ago

Thank you @bdraco for your suggestions! I'll try to open issues in these custom integrations for attribute optimization according to current HA guideline.

igogold commented 2 years ago

@bdraco, some new information about my big history database.

After yesterday upgrade HA to 2022.5.0 and tonight start of database purging I have slowly decreasing numbers of 'states' table records. I run

SELECT min(last_updated) FROM states;

and got 2022-04-19 08:51:24.280995, so my database was not cleaned many days because of previous slow purging algorithm. It's set to keep only last 7 days, so it must be 2022-04-29, not 19th. Also, my total count of records in 'states' table about 48M (when I sent you DB dump it was 38M). And it's after 5 hours of purging!

I also checked output of 'show processlist;' and there are no long running SELECT anymore, only 1-2 sec queries. So I'm waiting finish of purging process but it can takes several days.

I'll inform you when it finished and how many time the purging will take after that.

bdraco commented 2 years ago

Thanks for the update. I did some work for 2022.6 already that should reduce the size of the data stored by roughly 30% and make it a bit faster to purge.

I've started planning additional improvements for 2022.7. Once you have been running 2022.6 for a few days it would be great to another database dump to see if there is anything I can do to further improve the situation.

igogold commented 2 years ago

As promised there is an update about performance of database purging.

First purging after upgrade took 16h 25min, after that every next purge took about 1 hour, CPU load is the same as before no more no less. No problem with a memory leak.

Current database counters for 7 days history:

Great work, @bdraco, thank you!

Hope I can help in optimization for 2022.7 release.

bdraco commented 2 years ago

2022.6 beta is available with the changes