nextcloud / server

☁️ Nextcloud server, a safe home for all your data
https://nextcloud.com
GNU Affero General Public License v3.0
26.26k stars 3.96k forks source link

[Bug]: Restoring multiple big folders from trash bin fails #43759

Open mgallien opened 5 months ago

mgallien commented 5 months ago

⚠️ This issue respects the following points: ⚠️

Bug description

Restoring multiple big folders at once from trash bin fails with an error message: "Failed to restore some elements"

Steps to reproduce

  1. create 4 folders named bigFolder1, bigFolder2, bigFolder3 and bigFolder4
  2. create many empty files inside each folder for i in {1..10000}; do touch test${i}.txt; done
  3. get the files scan to finish sudo -u www-data php ./occ files:scan <uid>
  4. delete the 4 folders from web UI
  5. try to restore them with trash bin web UI

Expected behavior

all folders are restored without error

Installation method

Community Manual installation with Archive

Nextcloud Server version

master

Operating system

Debian/Ubuntu

PHP engine version

PHP 8.2

Web server

Apache (supported)

Database engine version

MariaDB

Is this bug present after an update or on a fresh install?

None

Are you using the Nextcloud Server Encryption module?

Encryption is Disabled

What user-backends are you using?

Configuration report

sudo -u www-data php occ config:list system 
{
    "system": {
        "debug": true,
        "profiler": true,
        "apps_paths": [
            {
                "path": "\/var\/www\/html\/apps",
                "url": "\/apps",
                "writable": false
            },
            {
                "path": "\/var\/www\/html\/apps-extra",
                "url": "\/apps-extra",
                "writable": false
            },
            {
                "path": "\/var\/www\/html\/apps-shared",
                "url": "\/apps-shared",
                "writable": false
            },
            {
                "path": "\/var\/www\/html\/apps-writable",
                "url": "\/apps-writable",
                "writable": true
            }
        ],
        "allow_local_remote_servers": true,
        "mail_from_address": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpmode": "smtp",
        "mail_sendmailmode": "smtp",
        "mail_domain": "***REMOVED SENSITIVE VALUE***",
        "mail_smtphost": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpport": "1025",
        "skeletondirectory": "\/skeleton",
        "setup_create_db_user": false,
        "loglevel": 0,
        "log_query": false,
        "query_log_file": "\/var\/www\/html\/data\/query.log",
        "query_log_file_requestid": "yes",
        "diagnostics.logging": true,
        "diagnostics.logging.threshold": 0,
        "log.condition": {
            "apps": [
                "diagnostics",
                "admin_audit"
            ]
        },
        "bulkupload.enabled": false,
        "passwordsalt": "***REMOVED SENSITIVE VALUE***",
        "secret": "***REMOVED SENSITIVE VALUE***",
        "trusted_domains": [
            "localhost",
            "nextcloud",
            "nextcloud.local",
            "192.168.21.6",
            "localhost"
        ],
        "datadirectory": "***REMOVED SENSITIVE VALUE***",
        "dbtype": "mysql",
        "version": "29.0.0.5",
        "overwrite.cli.url": "http:\/\/nextcloud.local",
        "dbname": "***REMOVED SENSITIVE VALUE***",
        "dbhost": "***REMOVED SENSITIVE VALUE***",
        "dbport": "",
        "dbtableprefix": "oc_",
        "mysql.utf8mb4": true,
        "dbuser": "***REMOVED SENSITIVE VALUE***",
        "dbpassword": "***REMOVED SENSITIVE VALUE***",
        "updater.release.channel": "git",
        "installed": true,
        "instanceid": "***REMOVED SENSITIVE VALUE***",
        "lookup_server": "",
        "redis": {
            "host": "***REMOVED SENSITIVE VALUE***",
            "port": 6379
        },
        "memcache.local": "\\OC\\Memcache\\Redis",
        "memcache.locking": "\\OC\\Memcache\\Redis",
        "trusted_proxies": "***REMOVED SENSITIVE VALUE***"
    }
}

List of activated Apps

sudo -u www-data php occ app:list
Enabled:
  - cloud_federation_api: 1.12.0
  - comments: 1.19.0
  - contactsinteraction: 1.10.0
  - dashboard: 7.9.0
  - dav: 1.30.0
  - federatedfilesharing: 1.19.0
  - federation: 1.19.0
  - files: 2.1.0
  - files_reminders: 1.2.0
  - files_sharing: 1.21.0
  - files_trashbin: 1.19.0
  - files_versions: 1.22.0
  - lookup_server_connector: 1.17.0
  - oauth2: 1.17.0
  - provisioning_api: 1.19.0
  - settings: 1.12.0
  - sharebymail: 1.19.0
  - systemtags: 1.19.0
  - theming: 2.4.0
  - twofactor_backupcodes: 1.18.0
  - updatenotification: 1.19.0
  - user_oidc: 1.3.6
  - user_status: 1.9.0
  - weather_status: 1.9.0
  - workflowengine: 2.11.0
Disabled:
  - admin_audit: 1.19.0
  - encryption: 2.17.0
  - files_external: 1.21.0
  - testing: 1.19.0
  - user_ldap: 1.20.0

Nextcloud Signing status

No response

Nextcloud Logs

No response

Additional info

No response

mgallien commented 5 months ago

@sorbaugh

sorbaugh commented 5 months ago

This reminds me a bit of a reported issue where an instance runs out of memory when doing operations on super large directories. This is very infrastructure/setup dependant, and It might be unrelated, but an idea then was to create some sort of admin setting to limit the number of allowed files in a directory, including the trashbin. While certainly not a satisfactory solution, we're bound to run into some kind of friction when dealing with absurdly large amounts of files/data.

Depending on what analysis of the root cause reveals, maybe this could be an alternative approach.

sorbaugh commented 5 months ago

Looping in @come-nc & @Altahrim

come-nc commented 4 months ago
["An exception occurred while executing a query: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction","/remote.php/dav/trashbin/bob/trash/bigFolder4.d1709033304"]
["An exception occurred while executing a query: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction","/remote.php/dav/trashbin/bob/trash/bigFolder3.d1709033304"]
["An exception occurred while executing a query: SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction","/remote.php/dav/trashbin/bob/trash/bigFolder1.d1709033304"]
["Transaction took 36.729679107666s","/remote.php/dav/trashbin/bob/trash/bigFolder2.d1709033304"]

Seems to be a concurrency problem.

come-nc commented 4 months ago

Info on the deadlock:

------------------------
LATEST DETECTED DEADLOCK
------------------------
2024-02-27 13:10:47 0x7f031a5a6700
*** (1) TRANSACTION:
TRANSACTION 61551, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 80 lock struct(s), heap size 24696, 2950 row lock(s), undo log entries 2917
MariaDB thread id 2630, OS thread handle 139651303761664, query id 223713 192.168.21.6 root Updating
UPDATE `oc_filecache` SET `storage` = 3, `path_hash` = MD5(CONCAT('files/bigFolder2', SUBSTR(`path`, 44))), `path` = CONCAT('files/bigFolder2', SUBSTR(`path`, 44)) WHERE (`storage` = 3) AND (`path` LIKE 'files\\_trashbin/files/bigFolder2.d1709039279/%')
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 10 page no 4456 n bits 304 index fs_storage_path_hash of table `nextcloud`.`oc_filecache` trx id 61551 lock mode S waiting
Record lock, heap no 221 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 8; hex 8000000000000003; asc         ;;
 1: len 30; hex 343962373630373532383339653161373739323330383531303737393462; asc 49b760752839e1a77923085107794b; (total 32 bytes);
 2: len 8; hex 800000000000b768; asc        h;;

*** CONFLICTING WITH:
RECORD LOCKS space id 10 page no 4456 n bits 304 index fs_storage_path_hash of table `nextcloud`.`oc_filecache` trx id 61550 lock_mode X locks rec but not gap
Record lock, heap no 221 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 8; hex 8000000000000003; asc         ;;
 1: len 30; hex 343962373630373532383339653161373739323330383531303737393462; asc 49b760752839e1a77923085107794b; (total 32 bytes);
 2: len 8; hex 800000000000b768; asc        h;;

*** (2) TRANSACTION:
TRANSACTION 61550, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 350 lock struct(s), heap size 57464, 2877 row lock(s), undo log entries 2825
MariaDB thread id 2629, OS thread handle 139651533489920, query id 223703 192.168.21.6 root Updating
UPDATE `oc_filecache` SET `storage` = 3, `path_hash` = MD5(CONCAT('files/bigFolder4', SUBSTR(`path`, 44))), `path` = CONCAT('files/bigFolder4', SUBSTR(`path`, 44)) WHERE (`storage` = 3) AND (`path` LIKE 'files\\_trashbin/files/bigFolder4.d1709039279/%')
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 10 page no 321 n bits 304 index fs_storage_path_hash of table `nextcloud`.`oc_filecache` trx id 61550 lock mode S waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 3; compact format; info bits 32
 0: len 8; hex 8000000000000003; asc         ;;
 1: len 30; hex 333238343130336335663666633766326566306530636333303438333334; asc 3284103c5f6fc7f2ef0e0cc3048334; (total 32 bytes);
 2: len 8; hex 8000000000002a4a; asc       *J;;

*** CONFLICTING WITH:
RECORD LOCKS space id 10 page no 321 n bits 304 index fs_storage_path_hash of table `nextcloud`.`oc_filecache` trx id 61551 lock_mode X locks rec but not gap
Record lock, heap no 3 PHYSICAL RECORD: n_fields 3; compact format; info bits 32
 0: len 8; hex 8000000000000003; asc         ;;
 1: len 30; hex 333238343130336335663666633766326566306530636333303438333334; asc 3284103c5f6fc7f2ef0e0cc3048334; (total 32 bytes);
 2: len 8; hex 8000000000002a4a; asc       *J;;

*** WE ROLL BACK TRANSACTION (1)
come-nc commented 4 months ago

It’s not clear to me why the DELETE does not deadlock the same. It does thing in reverse order and update cache first but otherwise it’s the same kind of logic.

come-nc commented 4 months ago

Looking at the query log for both delete and restore, it looks quite similar, there is only one more UPDATE query on restore side, related to storage_mtime.

come-nc commented 4 months ago

When deleting the 4 folders, it seems the concurrency does not slow down deletion at all, each request takes less than 2 sec on my setup. For restoring, it take close to 5s, and when restoring the 4, 3 requests fail with deadlock in about 1s, and the fourth one finishes close to 5s as when alone.

come-nc commented 4 months ago

I tried https://github.com/nextcloud/server/pull/43531 but it does not avoid the deadlock.

come-nc commented 4 months ago

Current @Altahrim theory: