nextcloud / server

☁️ Nextcloud server, a safe home for all your data
https://nextcloud.com
GNU Affero General Public License v3.0
26.62k stars 3.99k forks source link

nextcloud very slow login via browser with high I/O load on the DB server #14632

Closed JHirte closed 4 years ago

JHirte commented 5 years ago

Steps to reproduce

  1. try login to the nextcloud instance via browser

Expected behaviour

Tell us what should happen

login should happen immediately

Actual behaviour

Tell us what happens instead

The page loads for long time (minutes to tens of minutes), until the login happens eventually. When login is entered, I/O load is going up on the db-server. I have 1-3 mysqld-processes, writing permanently to disk. This stops as soon as the login has finished. After the first login, logging out and re-login isn't as slow as before. It still takes several seconds, but not minutes like the first one.

Apps are not affected. Android and Linux-App are working fine. This only happen when login via browser.

Server configuration

Operating system: Gentoo Linux

Web server: apache-2.4.38-r1

Database: MariaDB-10.1.37 (separate machine)

PHP version: 7.2.14

Nextcloud version: (see Nextcloud admin page) 15.0.4

Updated from an older Nextcloud/ownCloud or fresh install: updated

kesselb commented 5 years ago

https://github.com/nextcloud/server/issues/12722 https://github.com/nextcloud/server/issues/9243

Anything in the logs? I noticed a few reports with Gentoo Linux. Not sure if this is related somehow but they are also about high cpu load.

JHirte commented 5 years ago

12722

9243

Anything in the logs? I noticed a few reports with Gentoo Linux. Not sure if this is related somehow but they are also about high cpu load.

No, there is nothing. The webserver idles during this time. It's the db-server that is writing permanently.

JHirte commented 5 years ago

Did some investigations: The permanent write activity comes from queries to oc_filecache. Most of them are updates and some deletes. Such updates look like:

UPDATE oc_filecache SET size = size + '94' WHERE (storage = 9) AND (path_hash IN ('d41d8cd98f00b204e9800998ecf8427e', '568f6f8128606539d139e496fef1331f', '6e41e5d6b5c692b409450fa455880bc4', '40fde4791249115bac42ae449336d4ab')) AND (size > '-1')

or

UPDATE oc_filecache SET mtime = '1552565281', size = '1125', etag = 'df2da6b03fe216d071b8fcf064f2d6e8', storage_mtime = '1552565281', checksum='' WHERE (mtime <> '1552565281' OR size <> '1125' OR etag <> 'df2da6b03fe216d071b8fcf064f2d6e8' OR storage_mtime <> '1552565281' OR checksum <> '' OR mtime IS NULL OR size IS NULL OR etag IS NULL OR storage_mtime IS NULL OR checksum IS NULL) AND fileid = '43754'

None of these are running long, but there are so much that this holds the login for several minutes.

kesselb commented 5 years ago

https://docs.nextcloud.com/server/stable/admin_manual/configuration_server/background_jobs_configuration.html#cron

Best guess: You are the only one using your nextcloud instance and login only a few times a week. The ajax based cron is triggered and there is a lot of jobs to do. I would try a regular cron.

JHirte commented 5 years ago

The cronjob is already set up and running every 15 minutes (confirmed by log). Additionally, even with ajax based cron, this shouldn't happen so soon again. After login succeeded eventually, I logout, wait ~1h and everything starts again. Login takes very long and db server shows the many queries to oc_filecache.

AInteriorB commented 5 years ago

Exact same behaviour here. Even if you are logged in and you access nextcloud after a while (maybe >1h), there's a huge load due to oc_filecache updates and commits on the database as described by @JHirte . Further requests are fast.

cron configured: /15 www-data php -f /var/www/nextcloud/cron.php select count() from oc_filecache: 218935 OS: Debian 10 Buster Nexcloud version: 15.0.7 Webserver: Apache/2.4.38 PHP: 7.3.3-1 (via php-fpm mod_proxy) MariaDB: 10.3.13

Queries: UPDATE oc_filecache SET mtime = GREATEST(mtime, 1555051760), etag = '5cb034f05a902' WHERE ... UPDATE oc_filecache SET size = size + '504' WHERE (storage = 41) AND (path_hash IN ('d41... INSERT INTO oc_filecache (mimepart, mimetype, mtime, size, etag, storage_mtime, ... UPDATE oc_filecache SET mtime = GREATEST(mtime, 1555051762), etag = '5cb034f21... ...

creopard commented 4 years ago

The official ducumentation suggests to run the cron 5 minutes instad of 15 minutes, see https://github.com/nextcloud/documentation/issues/1587

However this won't have any effect on this issue. The long loading times on initial load persists.

kesselb commented 4 years ago

@skjnldsv 1. to develop?

skjnldsv commented 4 years ago

@kesselb this is a confirmed bug, right?

kesselb commented 4 years ago

If you say so ;) I'm going to mark our conversation as off topic then.

skjnldsv commented 4 years ago

If you say so ;) I'm going to mark our conversation as off topic then.

That was a question :p I have no idea! ^^

kesselb commented 4 years ago

OK. On login we trigger many queries for oc_filecache. This looks like external storage with user credentials. But we can't say for sure? Because half of the issue template is missing. So this would be the first step? Adding config export, app list, are you using external storage and more logs.

AInteriorB commented 4 years ago

In my described case there is no external storage just a local nextclound instance with a huge amount of files.

JHirte commented 4 years ago

So what essential information is missing?

eonpatapon commented 4 years ago

I also have the same issue, no external storage, lots of files and a few users. Cron is setup properly.

skjnldsv commented 4 years ago

So what essential information is missing?

If anyone can reproduce the bug reliably :)

Please everyone, update to 17 and see if it still happens! Make sure you have everything green in your instance health checks (especially the db indices)

skjnldsv commented 4 years ago

As there is no feedback since a while I will close this ticket. If this is still happening please make sure to upgrade to the latest version. After that, feel free to reopen.

AInteriorB commented 3 years ago

@skjnldsv , I can confirm that this issue still persists with current NC Version 20.0.2. Lots of database operations are performed:

DELETE FROM oc_preferences WHERE userid = 'xxx' AND appid = 'login_token' AND configkey... ... UPDATE oc_filecache SET mtime = '1607250513', storage_mtime = '1607250513' WHERE (fileid = 5... ... INSERT INTO oc_filecache (mimepart, mimetype, mtime, size, etag, storage_mtime, permi... ... UPDATE oc_filecache SET mtime = '1607250514', etag = '4e33fe129511025f1d81ed17e0XXX', stor... ... INSERT INTO oc_filecache (mimepart, mimetype, mtime, size, etag, storage_mtime, perm...

We are experiencing this behaviour with auto login on browsers. After accessing the website after a longer time (several hours), the first request takes up to 15 seconds on a nextcloud instance with many files on a high end server. Most I/O happens due to many database requests/updates. Cron activated. Redis for locking. Mariadb Innodb tuned. APCu for local memcache. APCu also enabled for cli.

Can't the oc_filecache operations be handeled by the cronjob to reduce database access on actual requests?

creopard commented 3 years ago

Experienced the same behaviour as @AInteriorB on a similar setup. It won't help much for fixing this issue, but the "first request" response time got significantly better after switching to SSD hard drives for the file system and the database. Though it seems that switching to SSD for the filesystem in the first place has a greater impact.

AInteriorB commented 3 years ago

However, the "first request" response time got significantly better after switching to SSD hard drives for the file system and the database. Though it seems that switching to SSD for the filesystem in the first place has a greater impact.

Sure, SSDs improve database operations a lot. But it would be better to avoid them already at the "first request" and let the cron doing this job.

creopard commented 3 years ago

As an initial workaround you could also set up a wget cron that would request the NC frontpage periodically (within minutes) and see if that helps at least a bit. Sure, avoiding these requests in the first place would be the desired solution.

skjnldsv commented 3 years ago

Do you both have a proper memcache/redis config running?

AInteriorB commented 3 years ago

Do you both have a proper memcache/redis config running?

Yes, see 14632#issuecomment-739484501

I enabled the mariadb general_log this morning, to capture every request that is arriving at the database server. The summery just for the first request to the NC web backend:

Then lots of work on oc_authtoken:

Additionally 20 SELECT FROM oc_bruteforce_attempts. This table has lots of very old time stamps. I think old entries can be deleted.

oc_filecache has 272449 rows.

Maybe it's just the sum that leads to a waiting time of 4 to 10 seconds for the first request.

adrianbj commented 3 years ago

Similar issues here. cron, redis etc set up and initial page load is close to a minute. I wonder if this is also related to why the search box is also incredibly slow: https://help.nextcloud.com/t/file-search-is-slow/34846/6

javafrog commented 2 years ago

I do still have this issue with version 22, initial login takes 70 seconds, sometimes longer. After that, the system is responsive. During login, there is a high load on the hard disk.

SharmamkIII commented 2 years ago

Hi I am using version 23 and latest social login app to login via keycloak IDP. First login takes around 35 sec but all the subsequent login takes around 1-2 second. System is configured Properly as per nextcloud documentation. Does anyone else facing same issue?

politas commented 2 years ago

I'm not sure how to check the database activity, but every single login on my Nextcloud instance takes around a minute or longer, which has required me to increase the fastcgi_timeout on my nginx config in order to stop them timing out. This has happened consistently since I first installed Nextcloud a few years ago through every version as I've upgraded. My openldap account back-end responds instantaneously. I'm happy to do any troubleshooting steps requested to try to solve this issue. @skjnldsv , can you please re-open this issue?

charlie5610 commented 2 years ago

This is happening to me as well, for a few months now through multiple updates. Not sure what to do, seems I have tried all the suggestions.

evilaliv3 commented 1 year ago

I wonder if this could be related to the anti bruteforce mechanism.

Time to time i get this result not only at login time but also at the first time i start reusing my browser. The first request stuck ~25-50 seconds that is somehow the time declared in the antibruteforce documentation: https://github.com/nextcloud/all-in-one/discussions/1300