nextcloud / server

☁️ Nextcloud server, a safe home for all your data
https://nextcloud.com
GNU Affero General Public License v3.0
27.35k stars 4.06k forks source link

Some queries have a possible_index but are not using it with too huge IN(…) #30042

Open nickvergessen opened 2 years ago

nickvergessen commented 2 years ago

Getting user status for many users (e.g. the talk room list is loading the status of all one-to-one conversations)

 EXPLAIN SELECT * FROM `oc_user_status` WHERE `user_id` IN ('…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…');
+------+-------------+----------------+------+--------------------+------+---------+------+------+-------------+
| id   | select_type | table          | type | possible_keys      | key  | key_len | ref  | rows | Extra       |
+------+-------------+----------------+------+--------------------+------+---------+------+------+-------------+
|    1 | SIMPLE      | oc_user_status | ALL  | user_status_uid_ix | NULL | NULL    | NULL |  227 | Using where |
+------+-------------+----------------+------+--------------------+------+---------+------+------+-------------+

If the count is below 21 the index is used:

EXPLAIN SELECT * FROM `oc_user_status` WHERE `user_id` IN ('…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…', '…');
+------+-------------+----------------+-------+--------------------+--------------------+---------+------+------+-----------------------+
| id   | select_type | table          | type  | possible_keys      | key                | key_len | ref  | rows | Extra                 |
+------+-------------+----------------+-------+--------------------+--------------------+---------+------+------+-----------------------+
|    1 | SIMPLE      | oc_user_status | range | user_status_uid_ix | user_status_uid_ix | 1022    | NULL |   19 | Using index condition |
+------+-------------+----------------+-------+--------------------+--------------------+---------+------+------+-----------------------+

Similar issue with getting calendars

EXPLAIN SELECT `displayname`, `description`, `timezone`, `calendarorder`, `calendarcolor`, `deleted_at`, `a`.`id`, `a`.`uri`, `a`.`synctoken`, `a`.`components`, `a`.`principaluri`, `a`.`transparent`, `s`.`access` FROM `oc_dav_shares` `s` INNER JOIN `oc_calendars` `a` ON `s`.`resourceid` = `a`.`id` WHERE (`s`.`principaluri` IN ('…', '…', '…', '…', '…', '…', '…', '…', '…')) AND (`s`.`type` = 'calendar');
+------+-------------+-------+--------+------------------+---------+---------+-----------------+------+-------------+
| id   | select_type | table | type   | possible_keys    | key     | key_len | ref             | rows | Extra       |
+------+-------------+-------+--------+------------------+---------+---------+-----------------+------+-------------+
|    1 | SIMPLE      | s     | ALL    | dav_shares_index | NULL    | NULL    | NULL            |   96 | Using where |
|    1 | SIMPLE      | a     | eq_ref | PRIMARY          | PRIMARY | 8       | oc.s.resourceid |    1 |             |
+------+-------------+-------+--------+------------------+---------+---------+-----------------+------+-------------+
2 rows in set (0.004 sec)

again fixed when using a smaller IN:

EXPLAIN SELECT `displayname`, `description`, `timezone`, `calendarorder`, `calendarcolor`, `deleted_at`, `a`.`id`, `a`.`uri`, `a`.`synctoken`, `a`.`components`, `a`.`principaluri`, `a`.`transparent`, `s`.`access` FROM `oc_dav_shares` `s` INNER JOIN `oc_calendars` `a` ON `s`.`resourceid` = `a`.`id` WHERE (`s`.`principaluri` IN ('…', '…', '…')) AND (`s`.`type` = 'calendar');
+------+-------------+-------+--------+------------------+------------------+---------+-----------------+------+-----------------------+
| id   | select_type | table | type   | possible_keys    | key              | key_len | ref             | rows | Extra                 |
+------+-------------+-------+--------+------------------+------------------+---------+-----------------+------+-----------------------+
|    1 | SIMPLE      | s     | range  | dav_shares_index | dav_shares_index | 1023    | NULL            |    8 | Using index condition |
|    1 | SIMPLE      | a     | eq_ref | PRIMARY          | PRIMARY          | 8       | oc.s.resourceid |    1 |                       |
+------+-------------+-------+--------+------------------+------------------+---------+-----------------+------+-----------------------+
2 rows in set (0.005 sec)
nickvergessen commented 2 years ago

cc @ChristophWurst @CarlSchwan @skjnldsv

kinimodmeyer commented 2 years ago

how many rows does the table oc_user_status have and are you really see performance problems? i believe mysql do full table scan because it is more efficent in that case and that would not be a bad thing at all.

https://stackoverflow.com/a/586834

nickvergessen commented 2 years ago

how many rows does the table oc_user_status have and are you really see performance problems?

As per Explain there are around 227 entries. so it seems it's around the 10% mark where they stop using it. We are not really observing slowness directly, we just enabled the slow query log with "queries that don't use an index" and those are getting reported.

icewind1991 commented 2 years ago

At 10% it seems reasonable that doing a full table scan will be faster than walking the index multiple times.