nextcloud / server

☁️ Nextcloud server, a safe home for all your data
https://nextcloud.com
GNU Affero General Public License v3.0
26.64k stars 3.99k forks source link

Feature/fix `files:scan-app-data` database optimization #29139

Open piotr-sikora-v opened 2 years ago

piotr-sikora-v commented 2 years ago

I found something that really speed up files:scan-app-data cron

function getUsersToScan in ./lib/BackgroundJob/ScanFiles.php generarate this SQL:

SELECT MAX(`user_id`) FROM `oc_filecache` `f` INNER JOIN `oc_mounts` `m`  ON `storage_id` = `storage` WHERE `size` < 0 GROUP BY `storage_id`;

but it use fs_size index on oc_filecache, and I think is wrong

Better will work to use fs_storege_size index

For my instance it took about 10 minutes on fs_size index, but on fs_storage_size it takes 2-3 seconds!

So my suggestion is change function to:

private function getUsersToScan(): array {
                $query = $this->connection->getQueryBuilder();
                $query->select($query->func()->max('user_id'))
                        ->from('filecache FORCE INDEX(`fs_storage_size`)', 'f')
                        ->innerJoin('f', 'mounts', 'm', $query->expr()->eq('storage_id', 'storage'))
                        ->where($query->expr()->lt('size', $query->createNamedParameter(0, IQueryBuilder::PARAM_INT)))
                        ->groupBy('storage_id')
                        ->setMaxResults(self::USERS_PER_SESSION);

                return $query->execute()->fetchAll(\PDO::FETCH_COLUMN);
        }
DanScharon commented 2 years ago

you may want to open a pull request for this

piotr-sikora-v commented 2 years ago

I'am not good in Symfony so I don't think is good code ;) Maybe better is this:

private function getUsersToScan(): array {

                $sql = 'SELECT MAX(`user_id`) FROM `oc_filecache` `f` FORCE INDEX(`fs_storage_size`) 
                INNER JOIN `oc_mounts` `m` ON `storage_id` = `storage` WHERE `size` < 0  GROUP BY `storage_id` 
                LIMIT'.self::USERS_PER_SESSION;
                $em = $this->connection()->prepare($sql);
                $stmt->execute();
                return $stmt->fetchAll();
        }
ruedigerkupper commented 2 years ago

Did you file a pull request for this?

fulax commented 1 year ago

Hello, I can confirm the force index helps a lot, and I am also unable to translate that into Symfony's query builder.

This is on nextcloud 25.0.6 with MariaDB 10.3.38.

Here are the raw explain with timings :

MariaDB [cloud]> ANALYZE SELECT `user_id` FROM `oc_filecache` `f` INNER JOIN `oc_mounts` `m` ON `storage_id` = `storage` WHERE (`size` < 0) AND (`parent` > -1) LIMIT 1;
+------+-------------+-------+-------+-------------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------+--------------------+------+-----------+----------+------------+-------------+
| id   | select_type | table | type  | possible_keys                                                                                                                             | key                    | key_len | ref                | rows | r_rows    | filtered | r_filtered | Extra       |
+------+-------------+-------+-------+-------------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------+--------------------+------+-----------+----------+------------+-------------+
|    1 | SIMPLE      | m     | index | mounts_storage_index,mount_user_storage                                                                                                   | mount_user_storage     | 266     | NULL               | 2248 |   2300.00 |   100.00 |     100.00 | Using index |
|    1 | SIMPLE      | f     | ref   | fs_storage_path_hash,fs_parent_name_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,fs_size,fs_storage_path_prefix,fs_parent | fs_storage_path_prefix | 8       | cloud.m.storage_id |   65 | 162403.30 |   100.00 |       0.00 | Using where |
+------+-------------+-------+-------+-------------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------+--------------------+------+-----------+----------+------------+-------------+
2 rows in set (8 min 0,984 sec)
MariaDB [cloud]> ANALYZE SELECT `user_id` FROM `oc_filecache` `f` FORCE INDEX(`fs_storage_size`) INNER JOIN `oc_mounts` `m` ON `storage_id` = `storage` WHERE (`size` < 0) AND (`parent` > -1) LIMIT 1;
+------+-------------+-------+------+-----------------------------------------+--------------------+---------+-----------------+--------+-----------+----------+------------+-------------+
| id   | select_type | table | type | possible_keys                           | key                | key_len | ref             | rows   | r_rows    | filtered | r_filtered | Extra       |
+------+-------------+-------+------+-----------------------------------------+--------------------+---------+-----------------+--------+-----------+----------+------------+-------------+
|    1 | SIMPLE      | f     | ALL  | fs_storage_size                         | NULL               | NULL    | NULL            | 538819 | 604780.00 |   100.00 |      17.15 | Using where |
|    1 | SIMPLE      | m     | ref  | mounts_storage_index,mount_user_storage | mount_user_storage | 8       | cloud.f.storage |      2 |      0.00 |   100.00 |     100.00 | Using index |
+------+-------------+-------+------+-----------------------------------------+--------------------+---------+-----------------+--------+-----------+----------+------------+-------------+
2 rows in set (0,299 sec)
joshtrichards commented 1 year ago

Those on this thread might find the discussion in #24401 of interest as well.