nextcloud / server

☁️ Nextcloud server, a safe home for all your data
https://nextcloud.com
GNU Affero General Public License v3.0
26.67k stars 4k forks source link

[Bug]: Queries from `occ maintenance:repair-share-owner` stuck for 40+ hours #47184

Closed hamza221 closed 2 weeks ago

hamza221 commented 1 month ago

⚠️ This issue respects the following points: ⚠️

Bug description

Queries from occ maintenance:repair-share-owner in state=executing for 40+ hours for large instances SELECT 's'.'id', 'm'.'user_id', 's'.'uid_owner', 's'.'uid_initiator', 's'.'share_with', 's'.'file_target' FROM 'oc_share' 's' INNER JOIN 'oc_filecache' 'f' ON 's'.'item_source' = CAST('f'.'fileid' AS CHAR) INNER JOIN 'oc_mounts' 'm' ON 'f'.'storage' = 'm'.'storage_id' WHERE ('m'.'user_id' <> 's'.'uid_owner') AND (CONCAT('/', 'm'.'user_id', '/') = 'm'.'mount_point')

'Explain' output

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE m NULL ALL mounts_storage_index,mount_user_storage NULL NULL NULL 63647 10.00 Using where
1 SIMPLE s NULL ALL NULL NULL NULL NULL 48121 90.00 Using where; Using join buffer (hash join)
1 SIMPLE f NULL ref fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,fs_storage_path_prefix fs_storage_path_hash 8 boxup.m.storage_id 101 100.00 Using where; Using index

Steps to reproduce

• having cronjobs active (every 15 min) • running occ maintenance:repair-share-owner • causing some changes to the file cache every hour: o creating 1000 files directly in the data directory with 1 MB of random binary content o use occ files:scan on that directory o delete the files o run occ files:scan again on that directory • Waiting for 12+ hours ...

Expected behavior

Database doesn't get stuck

Installation method

None

Nextcloud Server version

28

Operating system

Debian/Ubuntu

PHP engine version

PHP 8.1

Web server

Apache (supported)

Database engine version

MySQL

Is this bug present after an update or on a fresh install?

Upgraded to a MAJOR version (ex. 28 to 29)

Are you using the Nextcloud Server Encryption module?

Encryption is Disabled

What user-backends are you using?

Configuration report

No response

List of activated Apps

- activity: 2.20.0
 - bruteforcesettings: 2.8.0
 - cloud_federation_api: 1.11.0
 - contactsinteraction: 1.9.0
 - dav: 1.29.2
 - federatedfilesharing: 1.18.0
 - federation: 1.18.0
 - files: 2.0.0
 - files_confidential: 3.0.2
 - files_external: 1.20.0
 - files_pdfviewer: 2.9.0
 - files_sharing: 1.20.0
 - files_trashbin: 1.18.0
 - files_versions: 1.21.0
 - impersonate: 1.15.0
 - logreader: 2.13.0
 - lookup_server_connector: 1.16.0
 - notifications: 2.16.0
 - oauth2: 1.16.3
 - onlyoffice: 9.3.0
 - password_policy: 1.18.0
 - photos: 2.4.0
 - privacy: 1.12.0
 - provisioning_api: 1.18.0
 - recommendations: 2.0.0
 - related_resources: 1.3.0
 - security_guard: 1.2.1
 - settings: 1.10.1
 - support: 1.11.1
 - text: 3.9.2
 - theming: 2.3.0
 - twofactor_backupcodes: 1.17.0
 - user_ldap: 1.19.0
 - viewer: 2.2.0
 - workflowengine: 2.10.0

Nextcloud Signing status

No response

Nextcloud Logs

No response

Additional info

mysql-processes

ChristophWurst commented 1 month ago

The query does not use indices and has to scan a lot of data, especially in oc_mounts.

It looks like the query comes from \OC\Core\Command\Maintenance\RepairShareOwnership::getWrongShareOwnership. The code was moved in https://github.com/nextcloud/server/commit/3d68a526e7ede5e2db13ffa69177a933a278b806. Before that there was no cast, which seems to prevent some index usage.

SELECT SQL_NO_CACHE `s`.`id`, `m`.`user_id`, `s`.`uid_owner`, `s`.`uid_initiator`, `s`.`share_with`, `s`.`file_target` FROM `oc_share` `s` INNER JOIN `oc_filecache` `f` ON `s`.`item_source` = CAST(`f`.`fileid` AS CHAR) INNER JOIN `oc_mounts` `m` ON `f`.`storage` = `m`.`storage_id` WHERE (`m`.`user_id` <> `s`.`uid_owner`) AND (CONCAT('/', `m`.`user_id`, '/') = `m`.`mount_point`) takes 13-15s on my small instance.

SELECT SQL_NO_CACHE `s`.`id`, `m`.`user_id`, `s`.`uid_owner`, `s`.`uid_initiator`, `s`.`share_with`, `s`.`file_target` FROM `oc_share` `s` INNER JOIN `oc_filecache` `f` ON `s`.`item_source` = `f`.`fileid` INNER JOIN `oc_mounts` `m` ON `f`.`storage` = `m`.`storage_id` WHERE (`m`.`user_id` <> `s`.`uid_owner`) AND (CONCAT('/', `m`.`user_id`, '/') = `m`.`mount_point`); Empty set (0,002 sec) takes 0.01s on my small instance.

sorbaugh commented 3 weeks ago

Discussing with @icewind1991 it seems the cast was put in place because postgresql isn't as forgiving as MariaDB when comparing strings to integers.

An easy approach would be to add a condition to the code so it only uses the cast for postgresql, otherwise use a bare query.

ChristophWurst commented 3 weeks ago

That will make the query perform poorly on postgres still. Are there any other ways to restructure it into more efficient execution plans?

artonge commented 2 weeks ago

Can't we cast the item_source column to int instead of casting the file_id column to string? @hamza221 you've been able to reproduce? Can you check whether this improves the situation?

ChristophWurst commented 2 weeks ago
MariaDB [nextcloud]> EXPLAIN
    -> SELECT `s`.`id`,
    ->        `m`.`user_id`,
    ->        `s`.`uid_owner`,
    ->        `s`.`uid_initiator`,
    ->        `s`.`share_with`,
    ->        `s`.`file_target`
    -> FROM `oc_share` `s`
    ->          INNER JOIN `oc_filecache` `f`
    ->                     ON `s`.`item_source` = CAST(`f`.`fileid` AS CHAR)
    ->          INNER JOIN `oc_mounts` `m` ON `f`.`storage` = `m`.`storage_id`
    -> WHERE (`m`.`user_id` <> `s`.`uid_owner`)
    ->   AND (CONCAT('/', `m`.`user_id`, '/') = `m`.`mount_point`);
+------+-------------+-------+------+-----------------------------------------------------------------------------------------------------+---------------------+---------+-----------------------+------+-------------------------------------------------+
| id   | select_type | table | type | possible_keys                                                                                       | key                 | key_len | ref                   | rows | Extra                                           |
+------+-------------+-------+------+-----------------------------------------------------------------------------------------------------+---------------------+---------+-----------------------+------+-------------------------------------------------+
|    1 | SIMPLE      | m     | ALL  | mounts_storage_index,mount_user_storage                                                             | NULL                | NULL    | NULL                  |   44 | Using where                                     |
|    1 | SIMPLE      | s     | ALL  | NULL                                                                                                | NULL                | NULL    | NULL                  |   78 | Using where; Using join buffer (flat, BNL join) |
|    1 | SIMPLE      | f     | ref  | fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,fs_storage_path_prefix | fs_storage_mimepart | 8       | nextcloud.m.storage_id | 3608 | Using where; Using index                        |
+------+-------------+-------+------+-----------------------------------------------------------------------------------------------------+---------------------+---------+-----------------------+------+-------------------------------------------------+
3 rows in set (0,001 sec)

MariaDB [nextcloud]> EXPLAIN
    -> SELECT `s`.`id`,
    ->        `m`.`user_id`,
    ->        `s`.`uid_owner`,
    ->        `s`.`uid_initiator`,
    ->        `s`.`share_with`,
    ->        `s`.`file_target`
    -> FROM `oc_share` `s`
    ->          INNER JOIN `oc_filecache` `f`
    ->                     ON `s`.`item_source` = `f`.`fileid`
    ->          INNER JOIN `oc_mounts` `m` ON `f`.`storage` = `m`.`storage_id`
    -> WHERE (`m`.`user_id` <> `s`.`uid_owner`)
    ->   AND (CONCAT('/', `m`.`user_id`, '/') = `m`.`mount_point`);
+------+-------------+-------+--------+--------------------------------------------------------------------------------------------------------------------------------+----------------------+---------+------------------------+------+-------------+
| id   | select_type | table | type   | possible_keys                                                                                                                  | key                  | key_len | ref                    | rows | Extra       |
+------+-------------+-------+--------+--------------------------------------------------------------------------------------------------------------------------------+----------------------+---------+------------------------+------+-------------+
|    1 | SIMPLE      | s     | ALL    | NULL                                                                                                                           | NULL                 | NULL    | NULL                   |   78 | Using where |
|    1 | SIMPLE      | f     | eq_ref | PRIMARY,fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,fs_id_storage_size,fs_storage_path_prefix | PRIMARY              | 8       | nextcloud.s.item_source |    1 | Using where |
|    1 | SIMPLE      | m     | ref    | mounts_storage_index,mount_user_storage                                                                                        | mounts_storage_index | 8       | nextcloud.f.storage     |    2 | Using where |
+------+-------------+-------+--------+--------------------------------------------------------------------------------------------------------------------------------+----------------------+---------+------------------------+------+-------------+
3 rows in set (0,001 sec)

MariaDB [nextcloud]> EXPLAIN
    -> SELECT `s`.`id`,
    ->        `m`.`user_id`,
    ->        `s`.`uid_owner`,
    ->        `s`.`uid_initiator`,
    ->        `s`.`share_with`,
    ->        `s`.`file_target`
    -> FROM `oc_share` `s`
    ->          INNER JOIN `oc_filecache` `f`
    ->                     ON CAST(`s`.`item_source` AS INT) = `f`.`fileid`
    ->          INNER JOIN `oc_mounts` `m` ON `f`.`storage` = `m`.`storage_id`
    -> WHERE (`m`.`user_id` <> `s`.`uid_owner`)
    ->   AND (CONCAT('/', `m`.`user_id`, '/') = `m`.`mount_point`);
+------+-------------+-------+--------+--------------------------------------------------------------------------------------------------------------------------------+----------------------+---------+--------------------+------+-------------+
| id   | select_type | table | type   | possible_keys                                                                                                                  | key                  | key_len | ref                | rows | Extra       |
+------+-------------+-------+--------+--------------------------------------------------------------------------------------------------------------------------------+----------------------+---------+--------------------+------+-------------+
|    1 | SIMPLE      | s     | ALL    | NULL                                                                                                                           | NULL                 | NULL    | NULL               |   78 |             |
|    1 | SIMPLE      | f     | eq_ref | PRIMARY,fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,fs_id_storage_size,fs_storage_path_prefix | PRIMARY              | 8       | func               |    1 | Using where |
|    1 | SIMPLE      | m     | ref    | mounts_storage_index,mount_user_storage                                                                                        | mounts_storage_index | 8       | nextcloud.f.storage |    2 | Using where |
+------+-------------+-------+--------+--------------------------------------------------------------------------------------------------------------------------------+----------------------+---------+--------------------+------+-------------+
3 rows in set (0,001 sec)

it works too and allows the db to use the PRIMARY index

ChristophWurst commented 2 weeks ago

If that works with Postgres it's a really nice solution. The original query takes >3s on my personal instance. Without cast and with the changed cast it finishes in <=0.003s

Uneducated :+1: