nextcloud / server

☁️ Nextcloud server, a safe home for all your data
https://nextcloud.com
GNU Affero General Public License v3.0

[Bug]: Failed nextcloud upgrade resulting in table corruption #48906

Open joshqou opened 1 day ago

joshqou commented 1 day ago

Bug description

Upgraded from 28 to 30 (28 -> 29, 29 -> 30) and Nextcloud seems to have mutilated several of its tables.

The upgrade from 28 to 29 seems to have corrupted the tables as the logs show the errors after the upgrade.

{"reqId":"r9c551lpfUCXmcrKpc2x","level":1,"time":"2024-10-09T02:25:03+00:00","remoteAddr":"[removed]","user":"--","app":"updater","method":"GET","url":"/core/ajax/update.php?[removed]","message":"\\OC\\Updater::finishedCheckCodeIntegrity: Finished code integrity check","userAgent":"[removed]","version":"29.0.6.1","data":{"app":"updater"}}
{"reqId":"r9c551lpfUCXmcrKpc2x","level":1,"time":"2024-10-09T02:25:03+00:00","remoteAddr":"[removed]","user":"--","app":"updater","method":"GET","url":"/core/ajax/update.php?[removed]","message":"\\OC\\Updater::updateEnd: Update successful","userAgent":"[removed]","version":"29.0.7.1","data":{"app":"updater"}}
{"reqId":"r9c551lpfUCXmcrKpc2x","level":1,"time":"2024-10-09T02:25:03+00:00","remoteAddr":"[removed]","user":"--","app":"updater","method":"GET","url":"/core/ajax/update.php?[removed]","message":"\\OC\\Updater::maintenanceDisabled: Turned off maintenance mode","userAgent":"[removed]","version":"29.0.7.1","data":{"app":"updater"}}
{"reqId":"r9c551lpfUCXmcrKpc2x","level":1,"time":"2024-10-09T02:25:03+00:00","remoteAddr":"[removed]","user":"--","app":"updater","method":"GET","url":"/core/ajax/update.php?[removed]","message":"\\OC\\Updater::resetLogLevel: Reset log level to Warning(2)","userAgent":"[removed]","version":"29.0.7.1","data":{"app":"updater"}}
{"reqId":"wI8CKxakomwFu1ZeGpOB","level":3,"time":"2024-10-26T00:45:49+00:00","remoteAddr":"[removed]","user":"--","app":"index","method":"GET","url":"/apps/user_oidc/code?[removed]","message":"An exception occurred while executing a query: SQLSTATE[HY000]: General error: 1030 Got error 194 \"Tablespace is missing for a table\" from storage engine InnoDB","userAgent":"removed","version":"29.0.7.1","exception":{"Exception":"Doctrine\\DBAL\\Exception\\DriverException","Message":"An exception occurred while executing a query: SQLSTATE[HY000]: General error: 1030 Got error 194 \"Tablespace is missing for a table\" from storage engine InnoDB","Code":1030, ...
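For anyone trying to find entries like the last one in a large nextcloud.log, a filter along these lines may help. It is a minimal sketch, assuming the log holds one JSON object per line as in the excerpt above; the function name `find_db_errors` and the shortened sample lines are made up for illustration and are not part of Nextcloud.

```python
import json

def find_db_errors(lines, min_level=3, needle="Tablespace is missing"):
    """Return (time, app, message) for entries at or above min_level
    whose message contains needle. Lines that are not JSON objects are skipped."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or non-JSON line
        if not isinstance(entry, dict):
            continue
        message = str(entry.get("message", ""))
        if entry.get("level", 0) >= min_level and needle in message:
            hits.append((entry.get("time"), entry.get("app"), message))
    return hits

# Shortened, made-up sample in the same shape as the excerpt above
sample = [
    '{"level":1,"time":"2024-10-09T02:25:03+00:00","app":"updater","message":"Update successful"}',
    '{"level":3,"time":"2024-10-26T00:45:49+00:00","app":"index","message":"Got error 194 \\"Tablespace is missing for a table\\" from storage engine InnoDB"}',
    "not json",
]

for hit in find_db_errors(sample):
    print(hit)
```

Sorting the hits by time shows when the first database error appeared relative to the "Update successful" entry.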

The tables mariadb can't read are as follows:

The tables are completely unreadable, can't even see their schemas using SHOW CREATE TABLE. Other tables in nextcloud's database & other databases seem to be fine.

All tables return the following from mariadb-check

Warning  : Tablespace is missing for table 'nextcloud/oc_notifications'
Error    : Got error 194 "Tablespace is missing for a table" from storage engine InnoDB
error    : Corrupt
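To turn output like this into a list of affected tables, something like the following could work. It is a sketch that assumes the warning line always has the `'database/table'` form shown above; `missing_tablespaces` is a hypothetical helper name, and the sample text is copied from the output above.

```python
import re

# Matches: Tablespace is missing for table 'nextcloud/oc_notifications'
MISSING_RE = re.compile(r"Tablespace is missing for table '([^'/]+)/([^']+)'")

def missing_tablespaces(check_output):
    """Return sorted, de-duplicated (database, table) pairs flagged as missing."""
    return sorted(set(MISSING_RE.findall(check_output)))

sample_output = """\
Warning  : Tablespace is missing for table 'nextcloud/oc_notifications'
Error    : Got error 194 "Tablespace is missing for a table" from storage engine InnoDB
error    : Corrupt
"""

print(missing_tablespaces(sample_output))  # [('nextcloud', 'oc_notifications')]
```

The generic "Error" line does not name a table, so only the "Warning" line is matched.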

I am able to log in with a local admin account; however, all of my OIDC-connected accounts fail to log in.

I have tried InnoDB recovery, but that didn't work; the tables seem to be completely dead.

Tried using occ maintenance:repair --include-expensive and occ db:add-missing-indices.

I have also looked at #46130 and nextcloud/text/5950; however, the resolution there is specific to another table being corrupted and doesn't explain how the actions taken work.

Given my instance is currently unusable, and nextcloud's documentation on such an event is basically just "restore from backup lol", I've got a few resolution options I can think of however I don't know how to go ahead with any of them.

Steps to reproduce

  1. Upgrade to Nextcloud 28 using the web updater
  2. ???
  3. ?????
  4. Nextcloud gets hungry and eats your tables

Expected behavior

n/a

Nextcloud Server version

30

Operating system

Other

PHP engine version

PHP 8.3

Web server

Nginx

Database engine version

MariaDB

Is this bug present after an update or on a fresh install?

Upgraded to a MAJOR version (ex. 28 to 29)

Are you using the Nextcloud Server Encryption module?

Encryption is Disabled

What user-backends are you using?

Configuration report

$ ./occ config:list system
{
    "system": {
        "instanceid": "***REMOVED SENSITIVE VALUE***",
        "passwordsalt": "***REMOVED SENSITIVE VALUE***",
        "secret": "***REMOVED SENSITIVE VALUE***",
        "trusted_domains": [
            "[removed]"
        ],
        "datadirectory": "***REMOVED SENSITIVE VALUE***",
        "dbtype": "mysql",
        "version": "30.0.1.2",
        "overwrite.cli.url": "https:\/\/[removed]",
        "dbname": "***REMOVED SENSITIVE VALUE***",
        "dbhost": "***REMOVED SENSITIVE VALUE***",
        "dbport": "",
        "dbtableprefix": "oc_",
        "mysql.utf8mb4": true,
        "dbuser": "***REMOVED SENSITIVE VALUE***",
        "dbpassword": "***REMOVED SENSITIVE VALUE***",
        "installed": true,
        "defaultapp": "",
        "default_phone_region": "GB",
        "auth.webauthn.enabled": false,
        "lost_password_link": "disabled",
        "allow_local_remote_servers": true,
        "maintenance_window_start": 22,
        "mail_smtpmode": "smtp",
        "mail_sendmailmode": "smtp",
        "maintenance": false,
        "theme": "",
        "loglevel": 1,
        "app_install_overwrite": [
            "user_oidc"
        ]
    }
}

List of activated Apps

$ ./occ app:list
Enabled:
  - admin_audit: 1.20.0
  - announcementcenter: 7.0.1
  - app_api: 4.0.0
  - bookmarks: 15.0.2
  - bruteforcesettings: 3.0.0
  - calendar: 5.0.1
  - cloud_federation_api: 1.13.0
  - comments: 1.20.1
  - contacts: 6.1.0
  - contactsinteraction: 1.11.0
  - dav: 1.31.1
  - federatedfilesharing: 1.20.0
  - files: 2.2.0
  - files_downloadlimit: 3.0.0
  - files_pdfviewer: 3.0.0
  - files_reminders: 1.3.0
  - files_sharing: 1.22.0
  - files_trashbin: 1.20.1
  - files_versions: 1.23.0
  - logreader: 3.0.0
  - lookup_server_connector: 1.18.0
  - nextcloud_announcements: 2.0.0
  - notifications: 3.0.0
  - oauth2: 1.18.1
  - password_policy: 2.0.0
  - passwords: 2024.9.20
  - photos: 3.0.2
  - privacy: 2.0.0
  - provisioning_api: 1.20.0
  - recommendations: 3.0.0
  - related_resources: 1.5.0
  - serverinfo: 2.0.0
  - settings: 1.13.0
  - spreed: 20.0.1
  - systemtags: 1.20.0
  - tasks: 0.16.1
  - theming: 2.5.0
  - twofactor_backupcodes: 1.19.0
  - updatenotification: 1.20.0
  - user_oidc: 5.0.2
  - user_status: 1.10.0
  - viewer: 3.0.0
  - weather_status: 1.10.0
  - webhook_listeners: 1.1.0-dev
  - workflowengine: 2.12.0
Disabled:
  - activity: 3.0.0 (installed 3.0.0)
  - circles: 30.0.0-dev (installed 28.0.0)
  - dashboard: 7.10.0 (installed 7.10.0)
  - encryption: 2.18.0
  - federation: 1.20.0 (installed 1.18.0)
  - files_external: 1.22.0
  - firstrunwizard: 3.0.0 (installed 3.0.0)
  - sharebymail: 1.20.0 (installed 1.18.0)
  - support: 2.0.0 (installed 1.11.1)
  - survey_client: 2.0.0 (installed 1.16.0)
  - suspicious_login: 8.0.0
  - text: 4.1.0 (installed 4.1.0)
  - twofactor_nextcloud_notification: 4.0.0
  - twofactor_totp: 12.0.0-dev
  - user_ldap: 1.21.0 (installed 1.21.0)

Nextcloud Signing status

No errors have been found.

Nextcloud Logs

https://gist.github.com/joshqou/5095ccc3a1ea78c4c17c2ea7eba09542

Additional info

OS is Fedora 40 Server

joshtrichards commented 1 day ago

Tablespace is missing for table

This is an issue at the database level (i.e. with the underlying *.ibd files). This isn't something Nextcloud can cause nor, unfortunately, fix.
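One rough way to see which tables have lost their *.ibd file is to compare the files in the database's directory. The sketch below assumes MariaDB with file-per-table InnoDB tablespaces, where each table normally has a `.frm` definition and a matching `.ibd` file; tables using other storage engines, or living in the system tablespace, would show up as false positives. The helper name `tables_missing_ibd` is hypothetical, and the demo uses a throwaway directory rather than a real data directory.

```python
import tempfile
from pathlib import Path

def tables_missing_ibd(db_dir):
    """Return table names that have a .frm definition but no .ibd tablespace file."""
    db_dir = Path(db_dir)
    defined = {p.stem for p in db_dir.glob("*.frm")}
    with_tablespace = {p.stem for p in db_dir.glob("*.ibd")}
    return sorted(defined - with_tablespace)

# Demo against a temporary directory standing in for <datadir>/<dbname>
with tempfile.TemporaryDirectory() as tmp:
    for name in ("oc_jobs.frm", "oc_jobs.ibd", "oc_notifications.frm"):
        (Path(tmp) / name).touch()
    print(tables_missing_ibd(tmp))  # ['oc_notifications']
```

On a real server this would be pointed at the database's directory inside the MariaDB data directory and run as a user allowed to read it. If the `.ibd` files are present, the cause lies elsewhere (permissions, a mismatched tablespace ID, and so on).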

The resolution on that other issue wasn't really a "fix" per se. They dropped (deleted) the messed up table, re-added empty tables by re-running the db migrations, and crossed their fingers. Since it was Text session data, it was transient in nature so they presumably didn't care about the data itself (I'm surmising).

A restore though is the only way to get the data back.

In a production environment the appropriate solution is likely to be:

  1. Figure out why your database server is having problems like this
  2. Based on the results of the first item, decide upon a course of action (remedy root cause + restore, etc.)

Given my instance is currently unusable, and nextcloud's documentation on such an event is basically just "restore from backup lol", I've got a few resolution options I can think of however I don't know how to go ahead with any of them.

I'd suggest

If you're lucky it's a filesystem (OS level) permissions issue or something like that. There are various things that can cause it, such as botched restores or moves, a buggy MariaDB version, network storage problems, use with WSL, etc. You'll have to use your favorite search engine and poke around since we're not equipped to provide hands-on support through this channel (outside the scope of bugs in Nextcloud itself).

joshqou commented 1 day ago

This is an issue at the database level. This isn't something Nextcloud can cause nor, unfortunately, fix.

The resolution on that other issue wasn't really a "fix" per se. They dropped (deleted) the messed up table, re-added empty tables by re-running the db migrations, and crossed their fingers. Since it was Text session data, it was transient in nature so they presumably didn't care about the data itself (I'm surmising).

I'm aware, however I'd prefer to know what exactly the corrupted tables are used for before going ahead since Nextcloud is still partially functional. So completely wiping Nextcloud may not be necessary if the damaged tables aren't necessary or contain replaceable data.

If you're lucky it's a filesystem (OS level) permissions issue or something like that. There are various things that can cause it, such as botched restores or moves, a buggy MariaDB version, network storage problems, use with WSL, etc. You'll have to use your favorite search engine and poke around since we're not equipped to provide hands-on support through this channel (outside the scope of bugs in Nextcloud itself).

Sadly it doesn't seem to be. All the permissions (Unix & SELinux) seem to be fine and no other databases were affected. Nextcloud and MariaDB are both using local storage on the same machine, and Nextcloud isn't the only database-heavy service using it. If it was a db-originated/hardware issue, other services would've probably been affected too.

joshtrichards commented 1 day ago

I'd prefer to know what exactly the corrupted tables are used for before going ahead since Nextcloud is still partially functional [..] if the damaged tables aren't necessary or contain replaceable data

That's a gray area. However:

textprocessing_tasks is for queuing AI/Text processing tasks: https://github.com/search?q=repo%3Anextcloud%2Fserver%20textprocessing_tasks&type=code

accounts_data is used AFAIK for storing account related data in a more search friendly fashion. I'm not sure what will happen when this is out-of-sync with the accounts table: https://github.com/search?q=repo%3Anextcloud%2Fserver+accounts_data&type=code

notifications is pretty much what it sounds like: https://github.com/nextcloud/notifications/blob/master/lib/Migration/Version2004Date20190107135757.php

If it was a db-originated/hardware issue, other services would've probably been affected too

If only it were that simple. :)

joshqou commented 1 day ago

That's a gray area. However:

I'm aware. I've backed up the other readable tables in case trying to remake the broken ones breaks anything.
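A backup of the readable tables could be taken roughly like this. The sketch only builds a `mariadb-dump` command line and prints it; it assumes the database is called `nextcloud` (as in the mariadb-check output), that the three tables named in this thread with the `oc_` prefix are the unreadable ones, and that credentials come from an option file or socket authentication. The helper name `dump_command` and the output file name are made up.

```python
import shlex

def dump_command(database, skip_tables, out_file):
    """Build a mariadb-dump argument list that skips the unreadable tables."""
    cmd = ["mariadb-dump", "--single-transaction", f"--result-file={out_file}"]
    # --ignore-table takes one db.table pair and is repeated per table
    cmd += [f"--ignore-table={database}.{table}" for table in skip_tables]
    cmd.append(database)
    return cmd

# Tables named in this thread; the real list may differ
broken = ["oc_textprocessing_tasks", "oc_accounts_data", "oc_notifications"]
print(shlex.join(dump_command("nextcloud", broken, "nextcloud-readable.sql")))
```

Printing the command rather than running it leaves room to review it first; passing the list to `subprocess.run(cmd, check=True)` would execute it.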

textprocessing_tasks is for queuing AI/Text processing tasks: https://github.com/search?q=repo%3Anextcloud%2Fserver%20textprocessing_tasks&type=code

accounts_data is used AFAIK for storing account related data in a more search friendly fashion. I'm not sure what will happen when this is out-of-sync with the accounts table: https://github.com/search?q=repo%3Anextcloud%2Fserver+accounts_data&type=code

notifications is pretty much what it sounds like: https://github.com/nextcloud/notifications/blob/master/lib/Migration/Version2004Date20190107135757.php

So textprocessing_tasks and notifications should behave properly if dropped and "migrated"?

accounts_data does seem a bit more important but, as you mentioned, it seems to duplicate data from the main accounts table. I'd have presumed that Nextcloud would refresh the accounts_data table if it was out of sync, but I can't see any code which would do so. Nextcloud being functional despite the table being corrupt does give me some hope.

If only it were that simple. :)

🫠