nextcloud / server

☁️ Nextcloud server, a safe home for all your data
https://nextcloud.com
GNU Affero General Public License v3.0

Performance degradation on Nextcloud 20 server with MariaDB on Ubuntu since January 22 or shortly after #25386

Open dirkpape opened 3 years ago

dirkpape commented 3 years ago

Steps to reproduce

Run Nextcloud (see version below) with about 50 clients connecting via WebDAV and CalDAV. Look at slow_query.log.

The issue is new and possibly related to an upgrade to a new server version (it still persists in 20.0.6) or to an upgrade of an app.

The Nextcloud instance ran well and performed well for years. The performance degradation started about one week ago.

It seems the DB has become a bottleneck. The DB system is MariaDB on Ubuntu 18 (version details below). After restarting MariaDB, the symptoms disappear for some time.

We have already tried some tuning of the DB (which had run well for years), without success.

Possibly related: we see some slow queries (between 8 and 100 seconds) matching this pattern:

SELECT `a`.`name` FROM `oc_filecache` `a` LEFT JOIN `oc_filecache` `b` ON `a`.`name` = `b`.`fileid` WHERE (`b`.`fileid` IS NULL) AND (`a`.`path` LIKE 'appdata\\_oc2rk8c8h02y/preview/_/_/_/_/_/_/_/%') AND (`a`.`mimetype` = '2');
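
For completeness, these statements were surfaced via MariaDB's slow query log; it can be enabled at runtime roughly like this (the threshold is only an example, and the settings do not survive a restart unless persisted in my.cnf):

```sql
-- Enable the slow query log at runtime (example threshold; persist in my.cnf to survive restarts)
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 5;  -- log statements slower than 5 seconds
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow_query.log';
```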

Regards, Dirk

Expected behaviour

Normal performance while serving 50 clients.

Actual behaviour

After some time, performance becomes very bad. Restarting mysqld helps for some time (MariaDB, Ubuntu 18).

Server configuration detail

Operating system: Linux 4.15.0-135-generic #139-Ubuntu SMP Mon Jan 18 17:38:24 UTC 2021 x86_64

Webserver: Apache (fpm-fcgi)

Database: mysql 10.1.47

PHP version:

7.4.14 Modules loaded: Core, date, libxml, openssl, pcre, zlib, bz2, calendar, ctype, hash, filter, ftp, gettext, gmp, SPL, iconv, Reflection, session, standard, SimpleXML, sockets, mbstring, tokenizer, xml, cgi-fcgi, mysqlnd, bcmath, curl, dba, dom, enchant, fileinfo, gd, imagick, imap, intl, json, ldap, exif, mysqli, odbc, PDO, pdo_mysql, PDO_ODBC, pdo_pgsql, pdo_sqlite, pgsql, Phar, posix, pspell, redis, soap, sodium, sqlite3, sysvmsg, sysvsem, sysvshm, tidy, xmlreader, xmlrpc, xmlwriter, xsl, zip, ionCube Loader, Zend OPcache

Nextcloud version: 20.0.6 - 20.0.6.1

Updated from an older Nextcloud/ownCloud or fresh install:

Where did you install Nextcloud from: unknown

Signing status Array ( )
List of activated apps ``` Enabled: - accessibility: 1.6.0 - activity: 2.13.4 - analytics: 3.2.0 - announcementcenter: 4.0.0 - apporder: 0.11.0 - bookmarks: 4.0.8 - bruteforcesettings: 2.0.1 - calendar: 2.1.3 - cloud_federation_api: 1.3.0 - comments: 1.10.0 - contacts: 3.4.3 - contactsinteraction: 1.1.0 - dashboard: 7.0.0 - data_request: 1.7.0 - dav: 1.16.2 - drawio: 0.9.8 - dropit: 0.4.0 - emlviewer: 0.0.21 - federatedfilesharing: 1.10.2 - federation: 1.10.1 - files: 1.15.0 - files_accesscontrol: 1.10.2 - files_automatedtagging: 1.10.1 - files_downloadactivity: 1.9.0 - files_external: 1.11.1 - files_fulltextsearch: 20.0.0 - files_fulltextsearch_tesseract: 20.0.1 - files_pdfviewer: 2.0.1 - files_readmemd: 1.2.0 - files_rightclick: 0.17.0 - files_sharing: 1.12.2 - files_trackdownloads: 1.9.0 - files_trashbin: 1.10.1 - files_versions: 1.13.0 - files_videoplayer: 1.9.0 - firstrunwizard: 2.9.0 - fulltextsearch: 20.0.0 - fulltextsearch_elasticsearch: 20.0.0 - groupfolders: 8.2.0 - impersonate: 1.7.0 - issuetemplate: 0.7.0 - keeweb: 0.6.4 - logreader: 2.5.0 - lookup_server_connector: 1.8.0 - mail: 1.8.0 - news: 15.1.1 - nextcloud_announcements: 1.9.0 - notes: 4.0.2 - notifications: 2.8.0 - oauth2: 1.8.0 - onlyoffice: 6.2.0 - password_policy: 1.10.1 - photos: 1.2.3 - polls: 1.6.3 - privacy: 1.4.0 - provisioning_api: 1.10.0 - quota_warning: 1.9.1 - ransomware_detection: 0.10.0 - ransomware_protection: 1.8.0 - recommendations: 0.8.0 - serverinfo: 1.10.0 - settings: 1.2.0 - sharebymail: 1.10.0 - sharerenamer: 2.7.3 - support: 1.3.0 - systemtags: 1.10.0 - tasks: 0.13.6 - terms_of_service: 1.6.1 - text: 3.1.0 - theming: 1.11.0 - twofactor_backupcodes: 1.9.0 - updatenotification: 1.10.0 - user_status: 1.0.1 - viewer: 1.4.0 - weather_status: 1.0.0 - workflow_pdf_converter: 1.5.1 - workflow_script: 1.5.1 - workflowengine: 2.2.0 Disabled: - admin_audit - encryption - survey_client - user_ldap ```
Configuration (config/config.php) ``` { "instanceid": "***REMOVED SENSITIVE VALUE***", "passwordsalt": "***REMOVED SENSITIVE VALUE***", "trusted_domains": [ "oc.pape-und-co.de" ], "datadirectory": "***REMOVED SENSITIVE VALUE***", "version": "20.0.6.1", "dbtype": "mysql", "dbhost": "***REMOVED SENSITIVE VALUE***", "dbname": "***REMOVED SENSITIVE VALUE***", "dbuser": "***REMOVED SENSITIVE VALUE***", "dbpassword": "***REMOVED SENSITIVE VALUE***", "dbtableprefix": "oc_", "installed": true, "default_language": "de_DE", "defaultapp": "files", "mail_domain": "***REMOVED SENSITIVE VALUE***", "mail_from_address": "***REMOVED SENSITIVE VALUE***", "mail_smtpmode": "sendmail", "mail_smtphost": "***REMOVED SENSITIVE VALUE***", "mail_smtpport": "25", "mail_smtpauthtype": "LOGIN", "overwritehost": "oc.pape-und-co.de", "overwriteprotocol": "https", "overwrite.cli.url": "https:\/\/oc.pape-und-co.de", "htaccess.RewriteBase": "\/", "trashbin_retention_obligation": "auto", "updater.release.channel": "stable", "loglevel": 0, "log_rotate_size": 2097152, "enable_previews": true, "maintenance": false, "memcache.local": "\\OC\\Memcache\\Redis", "memcache.distributed": "\\OC\\Memcache\\Redis", "redis": { "host": "***REMOVED SENSITIVE VALUE***", "port": 6379, "timeout": 0, "dbindex": 0 }, "mysql.utf8mb4": true, "tempdirectory": "\/var\/www\/vhosts\/oc.pape-und-co.de\/oc.pape-und-co.de\/owncloud-temp", "secret": "***REMOVED SENSITIVE VALUE***", "filelocking.enabled": "true", "memcache.locking": "\\OC\\Memcache\\Redis", "debug": false, "appstore.experimental.enabled": true, "asset-pipeline.enabled": false, "app_install_overwrite": [ "sharerenamer", "dropit", "keeweb", "files_readmemd", "emlviewer", "drop_account" ], "enabledPreviewProviders": [ "OC\\Preview\\PNG", "OC\\Preview\\JPEG", "OC\\Preview\\GIF", "OC\\Preview\\HEIC", "OC\\Preview\\BMP", "OC\\Preview\\XBitmap", "OC\\Preview\\MP3", "OC\\Preview\\TXT", "OC\\Preview\\MarkDown", "OC\\Preview\\SVG" ], "lost_password_link": "disabled", "custom_csp_policy": "default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline'; frame-src *; img-src *; font-src 'self' data:; media-src *", "encryption.legacy_format_support": false, "encryption.key_storage_migrated": false, "updater.secret": "***REMOVED SENSITIVE VALUE***" } ```

Are you using external storage, if yes which one: local/smb/sftp/...

Are you using encryption:

Are you using an external user-backend, if yes which one: LDAP/ActiveDirectory/Webdav/...

Client configuration

Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.2 Safari/605.1.15

Operating system:

Logs

Web server error log ``` Insert your web server log here ```
Nextcloud log ``` Insert your Nextcloud log here ```
Browser log Insert your browser log here, this could for example include: a) The javascript console log b) The network log c) ...
brtptrs commented 3 years ago

We've also noticed a big impact upgrading from 19 to 20. Loading the dashboard and doing searches with the new integrated search are quite slow. Upgrading to 20.0.5 seems to have improved things slightly (not sure, because we have also added RAM and CPUs), but the load on the system (especially MariaDB) is still significantly higher than on v19.x.

putt1ck commented 3 years ago

Worth checking whether the performance issue is all search related. Our observations showed that performance was fine when no searches had been run, and that load climbed rapidly with a search. Better still, if a user starts a search, gets frustrated at getting no results and changes the search, the old search query is still running in the background, so now there are two searches causing load from one user. Add in 400 users and... load of 60 on the DB server, anyone? The search queries eventually time out (often without producing useful results, and pointlessly in the cases where the user has moved on), and if no other searches are initiated, load returns to normal (on the DB server this means below 2 on a 4-core machine).
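
One way to confirm whether searches are the culprit is to look at the statements that are still running while the load is high, for example (a sketch; the 30-second threshold is arbitrary):

```sql
-- Show statements that have been running for more than 30 seconds
SELECT id, user, db, time, state, LEFT(info, 120) AS query_start
FROM information_schema.processlist
WHERE command = 'Query' AND time > 30
ORDER BY time DESC;
```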

dossantg commented 3 years ago

We have the same issue on Nextcloud 19.0.3:

# Thread_id: 5849848  Schema: nextcloud  QC_hit: No
# Query_time: 125.023464  Lock_time: 0.000036  Rows_sent: 2  Rows_examined: 37165721
# Rows_affected: 0  Bytes_sent: 105
SET timestamp=1613480530;
SELECT `a`.`name` FROM `oc_filecache` `a` LEFT JOIN `oc_filecache` `b` ON `a`.`name` = `b`.`fileid` WHERE (`b`.`fileid` IS NULL) AND (`a`.`path` LIKE 'appdata\\_{nextcloud_id}/preview/_/_/_/_/_/_/_/%') AND (`a`.`mimetype` = '2');
szaimen commented 3 years ago

Is this Issue still valid in NC21.0.2? If not, please close this issue. Thanks! :)

Fabyao commented 3 years ago

@szaimen I have a fresh installation of NC 21.0.2 and find the loading time to be slow. According to Chrome dev tools, navigating between menus ("Files", "Dashboard", "Talk", etc.) takes about 8 to 15 seconds. The "Dashboard" seems to be more affected than the others; it shows a blank page for a few seconds. I am using PostgreSQL as the database and have a Redis cache instance. Our VPS server is a 6-core AMD with 16 GB RAM. What is considered an "acceptable" loading time in NC?

szaimen commented 3 years ago

@Fabyao Do you use external storages?

Fabyao commented 3 years ago

No, we don't use external S3 storage. The files are stored locally.

szaimen commented 3 years ago

This then looks like a different issue. See https://github.com/nextcloud/server/issues/24641

ghost commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity and seems to be missing some essential information. It will be closed if no further activity occurs. Thank you for your contributions.

ruedigerkupper commented 3 years ago

Have you set up a memory cache like redis?

J0WI commented 3 years ago

I can confirm that this slow query occurs with Redis and PostgreSQL too.

kesselb commented 1 year ago

Hi :wave:

SELECT `a`.`name` FROM `oc_filecache` `a` LEFT JOIN `oc_filecache` `b` ON `a`.`name` = `b`.`fileid` WHERE (`b`.`fileid` IS NULL) AND (`a`.`path` LIKE 'appdata\\_oc2rk8c8h02y/preview/_/_/_/_/_/_/_/%') AND (`a`.`mimetype` = '2');

The query above belongs to a background job that deletes previews. See https://github.com/nextcloud/server/pull/35844 for more details about the query.

https://github.com/nextcloud/server/pull/28610 optimized the query to use the storage_id index.
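
For comparison, the storage-filtered variant of the query (as it appears in later reports in this thread) looks roughly like this; the storage id and appdata path are placeholders:

```sql
-- Orphan-preview lookup restricted to a single storage (storage id and appdata path are placeholders)
SELECT `a`.`name`
FROM `oc_filecache` `a`
LEFT JOIN `oc_filecache` `b` ON `a`.`name` = `b`.`fileid`
WHERE (`a`.`storage` = '2279')
  AND (`b`.`fileid` IS NULL)
  AND (`a`.`path` LIKE 'appdata\\_oc2rk8c8h02y/preview/_/_/_/_/_/_/_/%')
  AND (`a`.`mimetype` = '2');
```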

How is the situation today?

J0WI commented 1 year ago

Still producing a lot of slow query log entries. The update query is even worse.

nextcloud-command commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity and seems to be missing some essential information. It will be closed if no further activity occurs. Thank you for your contributions.

szaimen commented 1 year ago

Hi, please update to 24.0.9 or, better, 25.0.3 and report back whether it fixes the issue. Thank you!

My goal is to add a label like 25-feedback to this ticket for an up-to-date major Nextcloud version on which the bug can still be reproduced. However, this is not going to work without your help. So thanks for all your effort!

If you don't manage to reproduce the issue in time and the issue gets closed but you can reproduce the issue afterwards, feel free to create a new bug report with up-to-date information by following this link: https://github.com/nextcloud/server/issues/new?assignees=&labels=bug%2C0.+Needs+triage&template=BUG_REPORT.yml&title=%5BBug%5D%3A+

J0WI commented 1 year ago

https://github.com/nextcloud/server/issues/25386#issuecomment-1364370550 should be recent enough

szaimen commented 1 year ago

okay, so 25. Thanks!

solracsf commented 1 year ago

Situation remains the same on v27.

# Time: 231020 17:28:38
# User@Host: nextcloud[nextcloud] @ localhost []
# Thread_id: 825958  Schema: nextcloud  QC_hit: No
# Query_time: 86.054387  Lock_time: 0.000064  Rows_sent: 13  Rows_examined: 500269
# Rows_affected: 0  Bytes_sent: 248
#
# explain: id   select_type     table   type    possible_keys   key     key_len ref     rows    r_rows  filtered        r_filtered   Extra
# explain: 1    SIMPLE  a       ref     fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,fs_storage_path_prefix  fs_storage_path_prefix  8       const   935297  500269.00       37.06   15.31   Using where
# explain: 1    SIMPLE  b       eq_ref  PRIMARY,fs_id_storage_size      PRIMARY 8       nextcloud.a.name        1       1.00    100.00  100.00  Using where; Using index; Not exists
#
SET timestamp=1697815718;
SELECT `a`.`name` FROM `oc_filecache` `a` LEFT JOIN `oc_filecache` `b` ON `a`.`name` = `b`.`fileid` WHERE (`a`.`storage` = '2279') AND (`b`.`fileid` IS NULL) AND (`a`.`path` LIKE 'appdata\\_oc6xz1h8l8n2/preview/_/_/_/_/_/_/_/%') AND (`a`.`mimetype` = '2');
MariaDB [nextcloud]> EXPLAIN SELECT `a`.`name` FROM `oc_filecache` `a` LEFT JOIN `oc_filecache` `b` ON `a`.`name` = `b`.`fileid` WHERE (`a`.`storage` = '2279') AND (`b`.`fileid` IS NULL) AND (`a`.`path` LIKE 'appdata\\_oc6xz1h8l8n2/preview/_/_/_/_/_/_/_/%') AND (`a`.`mimetype` = '2');
+------+-------------+-------+--------+-----------------------------------------------------------------------------------------------------+------------------------+---------+------------------+--------+--------------------------------------+
| id   | select_type | table | type   | possible_keys                                                                                       | key                    | key_len | ref              | rows   | Extra                                |
+------+-------------+-------+--------+-----------------------------------------------------------------------------------------------------+------------------------+---------+------------------+--------+--------------------------------------+
|    1 | SIMPLE      | a     | ref    | fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,fs_storage_path_prefix | fs_storage_path_prefix | 8       | const            | 935163 | Using where                          |
|    1 | SIMPLE      | b     | eq_ref | PRIMARY,fs_id_storage_size                                                                          | PRIMARY                | 8       | nextcloud.a.name | 1      | Using where; Using index; Not exists |
+------+-------------+-------+--------+-----------------------------------------------------------------------------------------------------+------------------------+---------+------------------+--------+--------------------------------------+
2 rows in set (0.011 sec)
jospoortvliet commented 1 year ago

This is the same as https://github.com/nextcloud/server/issues/28114 - but that one was closed with some fixes. @juliushaertl @solracsf perhaps you two should talk, if you have a clue what might cause this? Is it only relevant on very large installations? The initial report was on a system with 50 users, I assume this is resolved with the fixes Julius did... Are we sure it's still the same problem?

juliusknorr commented 1 year ago

The query is still not super performant, but on the larger instances I've seen it is reasonably fast for a background job, from my perspective. @solracsf How big is your instance in terms of active users, and how many rows does oc_filecache have?

At least with the current approach of finding previews that have no matching file id I don't see any further potential to optimize the query.

We cannot delete the previews right away when a file is deleted, as with larger folders this is very prone to timeout errors (ref https://github.com/nextcloud/server/commit/3e07c4f73a80bfa7e5de5a9e0074d2c7d35f18cc).

The only other approach I could imagine would be to schedule individual cleanup jobs per file id on deletion, but I'm also unsure about the potential performance overhead there if we tackle each file individually with a delete query.
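
Purely to illustrate that alternative: a per-file cleanup would look up the preview folder named after a single file id instead of scanning the whole preview tree, roughly along these lines (file id, storage id and appdata path are placeholders):

```sql
-- Hypothetical per-file lookup: the preview folder whose name equals one deleted file id
-- (all values are placeholders; not how Nextcloud currently does it)
SELECT `fileid`, `path`
FROM `oc_filecache`
WHERE `storage` = '2279'
  AND `path` LIKE 'appdata\\_oc2rk8c8h02y/preview/%'
  AND `name` = '1234567'
  AND `mimetype` = '2';
```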

solracsf commented 1 year ago

@juliushaertl 1100 users.

SELECT COUNT(*) FROM oc_filecache;
+----------+
| COUNT(*) |
+----------+
|  3560945 |
+----------+
juliusknorr commented 1 year ago

Hm, on a similarly sized oc_filecache table this query only takes ~3 seconds, even with 1.3 million rows in the first query part.

MariaDB [oc]> SELECT `a`.`name` FROM `oc_filecache` `a` LEFT JOIN `oc_filecache` `b` ON `a`.`name` = `b`.`fileid` WHERE (`a`.`storage` = '2') AND (`b`.`fileid` IS NULL) AND (`a`.`path` LIKE 'appdata\\_ocabcdefg/preview/_/_/_/_/_/_/_/%') AND (`a`.`mimetype` = '2');
Empty set (3.058 sec)

MariaDB [oc]> explain SELECT `a`.`name` FROM `oc_filecache` `a` LEFT JOIN `oc_filecache` `b` ON `a`.`name` = `b`.`fileid` WHERE (`a`.`storage` = '2') AND (`b`.`fileid` IS NULL) AND (`a`.`path` LIKE 'appdata\\_ocabcdefg/preview/_/_/_/_/_/_/_/%') AND (`a`.`mimetype` = '2');
+------+-------------+-------+--------+--------------------------------------------------------------------------------------------------------------------+------------------------+---------+-----------+---------+--------------------------------------+
| id   | select_type | table | type   | possible_keys                                                                                                      | key                    | key_len | ref       | rows    | Extra                                |
+------+-------------+-------+--------+--------------------------------------------------------------------------------------------------------------------+------------------------+---------+-----------+---------+--------------------------------------+
|    1 | SIMPLE      | a     | ref    | fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,store_mimepart,fs_storage_path_prefix | fs_storage_path_prefix | 8       | const     | 1325931 | Using where                          |
|    1 | SIMPLE      | b     | eq_ref | PRIMARY,fs_id_storage_size                                                                                         | PRIMARY                | 8       | oc.a.name | 1       | Using where; Using index; Not exists |
+------+-------------+-------+--------+--------------------------------------------------------------------------------------------------------------------+------------------------+---------+-----------+---------+--------------------------------------+
solracsf commented 1 year ago

Does your DB data set fit in the buffer pool? In my case it doesn't. Also, the query cache is OFF.

So this can be a server sizing/tuning problem, not a code issue.

juliusknorr commented 1 year ago

Any hint on how to check that? I only have limited DB admin knowledge 🙈

solracsf commented 1 year ago

The MySQLTuner script can print it (see the last line here): https://github.com/major/MySQLTuner-perl

-------- Recommendations ---------------------------------------------------------------------------
General recommendations:
    Run ALTER TABLE ... FORCE or OPTIMIZE TABLE to defragment tables for better performance
      ALTER TABLE `nextcloud`.`oc_activity` FORCE; -- can free 157.5 MiB
    Total freed space after defragmentation: 157.5 MiB
    We will suggest raising the 'join_buffer_size' until JOINs not using indexes are found.
             See https://dev.mysql.com/doc/internals/en/join-buffer-size.html
             (specially the conclusions at the bottom of the page).
Variables to adjust:
    join_buffer_size (> 256.0K, or always use indexes with JOINs)
    innodb_buffer_pool_size (>= 10.6G) if possible.

Otherwise, if it is not a prod environment, you can lower innodb_buffer_pool_size to a value like 64M and try the query again (MySQL needs a restart to change that setting).
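
If you'd rather estimate the buffer pool fit without MySQLTuner, a rough comparison can be made directly from information_schema (a sketch, assuming the schema is called nextcloud; real memory needs also include fragmentation and per-connection buffers):

```sql
-- Approximate size of the Nextcloud schema (data + indexes) in GiB
SELECT ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 2) AS schema_gib
FROM information_schema.tables
WHERE table_schema = 'nextcloud';

-- Configured InnoDB buffer pool size in GiB
SELECT ROUND(@@innodb_buffer_pool_size / 1024 / 1024 / 1024, 2) AS buffer_pool_gib;
```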

govbetrieb commented 5 months ago

SELECT COUNT(*) FROM oc_filecache;
+----------+
| COUNT(*) |
+----------+
|  3560945 |
+----------+

SELECT COUNT(*) FROM oc_filecache;
+----------+
| COUNT(*) |
+----------+
|  2072489 |
+----------+

I can confirm: after using MySQLTuner, the load is stable at ~1. Before, it was rising every day due to the cron job.

THANKS!