nextcloud / server

☁️ Nextcloud server, a safe home for all your data
https://nextcloud.com
GNU Affero General Public License v3.0
26.63k stars 3.99k forks source link

[Bug]: Extremely bad performance with WebDAV #32729

Closed jose1711 closed 1 year ago

jose1711 commented 2 years ago

⚠️ This issue respects the following points: ⚠️

Bug description

Mounting Nextcloud as a virtual WebDAV filesystem via fuse results in an extremely slow file access.

Steps to reproduce

  1. Run sudo mount -t davfs https://nextcloudhost/remote.php/dav/files/username mountpoint
  2. Authenticate
  3. Receive the following:
    /usr/bin/mount.davfs: warning: the server does not support locks
  4. Run time ls mountpoint
  5. Get the output
    ________________________________________________________
    Executed in   55.94 secs      fish           external
    usr time    1.89 millis  437.00 micros    1.45 millis
    sys time    0.07 millis   65.00 micros    0.00 millis

Expected behavior

Even with very modest hardware one should not really wait almost a full minute for a single directory listing (only a few items there).

Installation method

Manual installation

Operating system

Other

PHP engine version

PHP 8.0

Web server

Nginx

Database engine version

MariaDB

Is this bug present after an update or on a fresh install?

No response

Are you using the Nextcloud Server Encryption module?

Encryption is Disabled

What user-backends are you using?

Configuration report

{
    "system": {
        "datadirectory": "***REMOVED SENSITIVE VALUE***",
        "debug": false,
        "logfile": "\/var\/log\/nextcloud\/nextcloud.log",
        "log_type": "nextcloud",
        "apps_paths": [
            {
                "path": "\/usr\/share\/webapps\/nextcloud\/apps",
                "url": "\/apps",
                "writable": false
            },
            {
                "path": "\/var\/lib\/nextcloud\/apps",
                "url": "\/apps-appstore",
                "writable": true
            }
        ],
        "updatechecker": false,
        "check_for_working_htaccess": false,
        "installed": true,
        "instanceid": "***REMOVED SENSITIVE VALUE***",
        "passwordsalt": "***REMOVED SENSITIVE VALUE***",
        "secret": "***REMOVED SENSITIVE VALUE***",
        "trusted_domains": [
            "something.somedomain.com",
            "nextcloud",
            "192.168.0.119"
        ],
        "trusted_proxies": "***REMOVED SENSITIVE VALUE***",
        "overwrite.cli.url": "https:\/\/something.somedomain.com",
        "overwritewebroot": "\/",
        "overwritecondaddr": "^192\\.168\\.0\\.119$",
        "overwritehost": "something.somedomain.com",
        "overwriteprotocol": "https",
        "dbtype": "mysql",
        "version": "24.0.0.12",
        "dbname": "***REMOVED SENSITIVE VALUE***",
        "dbhost": "***REMOVED SENSITIVE VALUE***",
        "dbport": "",
        "dbtableprefix": "oc_",
        "dbuser": "***REMOVED SENSITIVE VALUE***",
        "dbpassword": "***REMOVED SENSITIVE VALUE***",
        "maintenance": false,
        "theme": "",
        "mail_smtpmode": "smtp",
        "mail_smtpauthtype": "LOGIN",
        "mail_sendmailmode": "smtp",
        "mail_from_address": "***REMOVED SENSITIVE VALUE***",
        "mail_domain": "***REMOVED SENSITIVE VALUE***",
        "mail_smtphost": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpport": "465",
        "mail_smtpsecure": "ssl",
        "mail_smtpauth": 1,
        "mail_smtpname": "***REMOVED SENSITIVE VALUE***",
        "mail_smtppassword": "***REMOVED SENSITIVE VALUE***",
        "app_install_overwrite": [
            "geoblocker",
            "calendar"
        ],
        "loglevel": 2,
        "profiler": false
    }
}

List of activated Apps

Enabled:
  - accessibility: 1.10.0
  - activity: 2.16.0
  - cloud_federation_api: 1.7.0
  - comments: 1.14.0
  - contactsinteraction: 1.5.0
  - dav: 1.22.0
  - federatedfilesharing: 1.14.0
  - federation: 1.14.0
  - files: 1.19.0
  - files_sharing: 1.16.2
  - files_trashbin: 1.14.0
  - files_versions: 1.17.0
  - files_videoplayer: 1.13.0
  - geoblocker: 0.5.5
  - logreader: 2.9.0
  - lookup_server_connector: 1.12.0
  - notifications: 2.12.0
  - oauth2: 1.12.0
  - photos: 1.6.0
  - provisioning_api: 1.14.0
  - settings: 1.6.0
  - sharebymail: 1.14.0
  - support: 1.7.0
  - systemtags: 1.14.0
  - tables: 0.1.2
  - text: 3.5.1
  - twofactor_backupcodes: 1.13.0
  - viewer: 1.8.0
  - workflowengine: 2.6.0

Nextcloud Signing status

Integrity checker has been disabled. Integrity cannot be verified.

Nextcloud Logs

Nothing appeared in the log during mount and `ls` invocation.

Additional info

Most of the time ls is active php-fpm process is taking 100 % CPU. Further invocations are noticeably faster (e. g. 3 seconds) thanks to caching.

Response in browser session is OK. The only noticeable performance issue is with WebDAV mounts.

OS nextcloud (nginx+PHP): Alpine Linux inside LXC container OS database (mariadb): Debian Buster inside LXC container (same host as nextcloud host)

PVince81 commented 2 years ago

please check access log and the timings of each request to see if davfs is doing too many requests or if it's Nextcloud answering too slowly

jose1711 commented 2 years ago

access.log from initial ls /mountpoint run just after mounting:

192.168.0.105 - jose [17/Jun/2022:21:16:15 +0000] "PROPFIND /remote.php/dav/files/jose/ HTTP/1.1" 207 13363 "-" "davfs2/1.6.1 neon/0.32.2" "-"
192.168.0.105 - jose [17/Jun/2022:21:16:18 +0000] "PROPFIND /remote.php/dav/files/jose/directory1/ HTTP/1.1" 207 1152 "-" "davfs2/1.6.1 neon/0.32.2" "-"
192.168.0.105 - jose [17/Jun/2022:21:16:21 +0000] "PROPFIND /remote.php/dav/files/jose/directory2/ HTTP/1.1" 207 3191 "-" "davfs2/1.6.1 neon/0.32.2" "-"
192.168.0.105 - jose [17/Jun/2022:21:16:24 +0000] "PROPFIND /remote.php/dav/files/jose/directory3/ HTTP/1.1" 207 1179 "-" "davfs2/1.6.1 neon/0.32.2" "-"
192.168.0.119 - jose [17/Jun/2022:21:16:25 +0000] "PROPFIND /remote.php/dav/files/jose/ HTTP/1.0" 207 249 "-" "Mozilla/5.0 (Linux) mirall/3.5.1git (Nextcloud, arch-5.18.3-arch1-1 ClientArchitecture: x86_64 OsArchitecture: x86_64)" "192.168.0.1"
192.168.0.105 - jose [17/Jun/2022:21:16:27 +0000] "PROPFIND /remote.php/dav/files/jose/directory4/ HTTP/1.1" 207 4203 "-" "davfs2/1.6.1 neon/0.32.2" "-"
192.168.0.105 - jose [17/Jun/2022:21:16:30 +0000] "PROPFIND /remote.php/dav/files/jose/directory5/ HTTP/1.1" 207 2650 "-" "davfs2/1.6.1 neon/0.32.2" "-"
192.168.0.105 - jose [17/Jun/2022:21:16:33 +0000] "PROPFIND /remote.php/dav/files/jose/directory6/ HTTP/1.1" 207 1154 "-" "davfs2/1.6.1 neon/0.32.2" "-"
192.168.0.105 - jose [17/Jun/2022:21:16:36 +0000] "PROPFIND /remote.php/dav/files/jose/directory7/ HTTP/1.1" 207 10048 "-" "davfs2/1.6.1 neon/0.32.2" "-"
192.168.0.105 - jose [17/Jun/2022:21:16:39 +0000] "PROPFIND /remote.php/dav/files/jose/directory8/ HTTP/1.1" 207 1170 "-" "davfs2/1.6.1 neon/0.32.2" "-"
192.168.0.105 - jose [17/Jun/2022:21:16:42 +0000] "PROPFIND /remote.php/dav/files/jose/directory9/ HTTP/1.1" 207 22976 "-" "davfs2/1.6.1 neon/0.32.2" "-"
192.168.0.105 - jose [17/Jun/2022:21:16:45 +0000] "PROPFIND /remote.php/dav/files/jose/directory10/ HTTP/1.1" 207 74821 "-" "davfs2/1.6.1 neon/0.32.2" "-"
192.168.0.105 - jose [17/Jun/2022:21:16:48 +0000] "PROPFIND /remote.php/dav/files/jose/directory11/ HTTP/1.1" 207 1660 "-" "davfs2/1.6.1 neon/0.32.2" "-"
192.168.0.105 - jose [17/Jun/2022:21:16:51 +0000] "PROPFIND /remote.php/dav/files/jose/directory12/ HTTP/1.1" 207 661 "-" "davfs2/1.6.1 neon/0.32.2" "-"
192.168.0.105 - jose [17/Jun/2022:21:16:54 +0000] "PROPFIND /remote.php/dav/files/jose/directory13/ HTTP/1.1" 207 5709 "-" "davfs2/1.6.1 neon/0.32.2" "-"
192.168.0.119 - jose [17/Jun/2022:21:16:55 +0000] "GET /ocs/v2.php/apps/notifications/api/v2/notifications?format=json HTTP/1.0" 304 0 "-" "Mozilla/5.0 (Linux) mirall/3.5.1git (Nextcloud, arch-5.18.3-arch1-1 ClientArchitecture: x86_64 OsArchitecture: x86_64)" "192.168.0.1"
192.168.0.119 - jose [17/Jun/2022:21:16:55 +0000] "PROPFIND /remote.php/dav/files/jose/ HTTP/1.0" 207 249 "-" "Mozilla/5.0 (Linux) mirall/3.5.1git (Nextcloud, arch-5.18.3-arch1-1 ClientArchitecture: x86_64 OsArchitecture: x86_64)" "192.168.0.1"
192.168.0.105 - jose [17/Jun/2022:21:16:57 +0000] "PROPFIND /remote.php/dav/files/jose/directory14/ HTTP/1.1" 207 11247 "-" "davfs2/1.6.1 neon/0.32.2" "-"
192.168.0.105 - jose [17/Jun/2022:21:17:00 +0000] "PROPFIND /remote.php/dav/files/jose/directory15/ HTTP/1.1" 207 3798 "-" "davfs2/1.6.1 neon/0.32.2" "-"
192.168.0.105 - jose [17/Jun/2022:21:17:03 +0000] "PROPFIND /remote.php/dav/files/jose/directory16/ HTTP/1.1" 207 22772 "-" "davfs2/1.6.1 neon/0.32.2" "-"
192.168.0.105 - jose [17/Jun/2022:21:17:07 +0000] "PROPFIND /remote.php/dav/files/jose/directory17/ HTTP/1.1" 207 72535 "-" "davfs2/1.6.1 neon/0.32.2" "-"
192.168.0.119 - jose [17/Jun/2022:21:17:25 +0000] "PROPFIND /remote.php/dav/files/jose/ HTTP/1.0" 207 249 "-" "Mozilla/5.0 (Linux) mirall/3.5.1git (Nextcloud, arch-5.18.3-arch1-1 ClientArchitecture: x86_64 OsArchitecture: x86_64)" "192.168.0.1"

Looks like each and every PROPFIND takes 3 seconds to run and this is run on all 17 directories in basedir. That explains almost a minute delay needed to show the initial contents. To answer the original question: it looks like NC's answers are slow.

da-h commented 2 years ago

I have the exact same problem with a similar setup.

If there is something you would like someone to test, I would be happy to help to narrow down the problem further.

PVince81 commented 2 years ago

does that user have a lot of incoming shares / mount points ? those can sometimes slow down PROPFINDs because it would mount each and every one

I see you're already on 24 where this was optimized already, please try with 24.0.4 because 24.0.0 had some remaining issues related to FS setup

da-h commented 2 years ago

I am not sure what exactly is meant. In my case:

Regarding the version I update daily, thus I am running version: 24.0.4.1, where the problem still persists.

PVince81 commented 2 years ago

@da-h then everything should be fine, it can't be related to the shares.

please disable all third party apps in case some are messing with the regular code flow. I see you have geoblocker enabled, maybe start with this one.

otherwise the problem might be related to your setup specifically, maybe your infrastructure, as you're using nginx you might want to double check your setup: https://docs.nextcloud.com/server/latest/admin_manual/installation/nginx.html

da-h commented 2 years ago

Hey there (and everyone who comes across this),

thanks a lot for your suggestions!

Indeed, one of the featured apps has had an considerable impact on the performance. If I am not mistaken it was the featured "Privacy"-App in my case. (Reenabling results in a message that says it makes the server unstable).

Yet, while davfs' access is considerably faster now, the PROPFIND calls are still the limiting factor. For instance, a simple ls still takes about 3 sec.
Is there anything one can do to reduce that time?

I am also happy to look into the code if you help me finding the right place to look.

My minimal app-list to test this is:

Enabled:
  - calendar: 3.4.2
  - cloud_federation_api: 1.7.0
  - contacts: 4.2.0
  - dav: 1.22.0
  - federatedfilesharing: 1.14.0
  - files: 1.19.0
  - files_sharing: 1.16.2
  - files_trashbin: 1.14.0
  - lookup_server_connector: 1.12.0
  - notes: 4.5.0
  - notifications: 2.12.0
  - oauth2: 1.12.0
  - photos: 1.6.0
  - provisioning_api: 1.14.0
  - serverinfo: 1.14.0
  - settings: 1.6.0
  - twofactor_backupcodes: 1.13.0
  - viewer: 1.8.0
  - workflowengine: 2.6.0

More info on my setup:

> $ ./occ status
  - installed: true
  - version: 24.0.4.1
  - versionstring: 24.0.4
  - edition:
  - maintenance: false
  - needsDbUpgrade: false
  - productname: Nextcloud
  - extendedSupport: false
da-h commented 1 year ago

I was able to fix the problem after finding the related issue #33453.

As mentioned there, the authtoken table was full and slowed down authentication. Clearing old and unused entries increased DAV performance significantly in my case.

Just a side note: While I also found the MR #33375 that should fix the problem, nextcloud 24.0.5 which includes that change did not solve that problem on its own, even after triggering cron manually.

jose1711 commented 1 year ago

In my case there were only 14 matching entries in oc_authtoken table, deleting them seems to have no effect whatsoever.

wallzero commented 1 year ago

I'm finding that the first directory listing is fast but any subsequent listing for some period of time is very slow to respond.

mristuccia commented 1 year ago

+1 on this

We've tried to use Nextcloud as a storage backend for an application we're developing for an Enterprise customer. WebDAV access is a necessary integration in such solution, and we faced this terrible slowness of the WebDAV implementation in Nextcloud. I can assure you that even if we frequently empty the oc_authtoken table, the general slowness is still there. Emptying the table helps a bit, but we still get many seconds durations for even storing a single file inside Nextcloud.

Unfortunately, in the end we needed to switch to the community edition of OwnCloud, which is tremendously faster as far as WebDAV is concerned.

I strongly encourage the Nextcloud team to work on this problem, which have been there for many major releases now, and it is still currently there on the 25.0.2 version.

PVince81 commented 1 year ago

@mristuccia we'd need more information on your setup first to find out what exactly the root cause is

as seen in this ticket the root cause can be very different, like third party apps interfering with webdav

in some case we sometimes also saw that having a Galera Cluster improperly configured could cause bottlenecks on tables like oc_authtoken and oc_filecache, depending on setups.

there are some huge setups with millions of files out there which don't suffer from Webdav slowness

mristuccia commented 1 year ago

@PVince81 it is a plain and simple installation as single docker container from the official image in Docker Hub. Last version tried is the 25.0.2, connected to a MySQL database v8.0.4, also as single container from the Docker Hub official repo. No customizations at all. No clusterized installations. WebDAV API used as per official Nextcloud documentation. Already slow after the startup from scratch, without any additional file. Tried also on Kubernetes, same plain installation, same terrible slowness. OwnCloud installed the same way, as single container, plain and simple, connected to the same DB. WebDAV is deadly fast.

PVince81 commented 1 year ago

@J0WI can you confirm whether the docker from https://github.com/nextcloud/docker with 25.0.2 has slower Webdav or if it could be a config issue of sorts ? I didn't find a related issue in that repo.

J0WI commented 1 year ago

slower compared to what?

kpeiruza commented 1 year ago

We're facing the same problem also without much oc_authtokens (300+ for 100+ users, worst times are for a user with just 6 authtokens, others with 26 are quicker).

Just a hint ... we saw that a folder shared with a group with 100+ users could take up to 10s to create a folder inside of it, which a folder shared with a group of just 6 users took 1s or so.

Could this be somehow related?

In our case, the slowest user by far (just sometimes, not always) is the admin user, used to share many folders with specific groups + everyone group.

We have all the APCu, opcache, redis locking and redis distributed memory settings, running on php 8.1, NC 25.0.3 and mariadb:10.5.15-debian-10-r30 single-noded in another box.

NC is running with apache, http2 + php-fpm on a virtual machine.

joshtrichards commented 1 year ago

WebDAV API used as per official Nextcloud documentation.

What WebDAV client are you using?

joshtrichards commented 1 year ago

Looks like each and every PROPFIND takes 3 seconds to run and this is run on all 17 directories in basedir. That explains almost a minute delay needed to show the initial contents. To answer the original question: it looks like NC's answers are slow.

@jose1711 - One thing to try to help isolate this and also an observation:

You can use cURL to perform webdav requests. Please run something like this to see how long PROPFIND's take when cURL executes them against your server (at least with the default properties -- some clients may also request additional properties from PROPFIND requiring even more data to be looked up and sent but let's stick with the basics for now):

time curl -u USERNAME:PASSWORD "http://nc-test.local:3680/remote.php/dav/files/USERNAME/SOME_FOLDER/" -X PROPFIND

This will help tell you how much of the 3s delay is on the server-side versus on the client side (e.g. a slow XML processing implementation on the client side or some other reason to delay). The number reported here will also be a tad slower than a well optimized client that has session state, but that's another story.

As for all those other PROPFIND's --- that looks like davfs2 is recursively requesting properties for all the sub-folders even when you're just trying to look at the parent folder. And, worse, it's doing those requests serially rather than in parallel. That will certainly slow down the user experience!

joshtrichards commented 1 year ago

Just a hint ... we saw that a folder shared with a group with 100+ users could take up to 10s to create a folder inside of it, which a folder shared with a group of just 6 users took 1s or so.

Could this be somehow related?

@kpeiruza: Is this with a davfs2 mount or something else? If so, which version?

theof commented 1 year ago

I spent quite some time investigating a similar issue, where a simple ls in my remote folder would take about 0.5 sec per http request by davfs2. I also got this log for each request:

Sabre\DAV\Exception\NotAuthenticated: No  public access to this resource., No 'Authorization: Basic' header  found. Either the client didn't send one, or the server is  misconfigured, No 'Authorization: Bearer' header found. Either the  client didn't send one, or the server is mis-configured, No  'Authorization: Basic' header found. Either the client didn't send one,  or the server is misconfigured

I found out I had set my password instead of an app password in ~/.davfs2/secrets. Now the log is gone and the time for my ls is very reasonable.

provokateurin commented 1 year ago

Performance with basic auth is always very bad since it needs to do hashing. App password can be directly compared and don't have this performance penalty.

theof commented 1 year ago

I do not know whether it is the same issue for others in this thread. I imagine it is a common misconfiguration for people skimming through the docs. So maybe we could emphasize on the importance of using an app password in the docs (making the blue note red) or if easily feasible notifying users using basic auth together with non web-browser clients ?

jose1711 commented 1 year ago

I found out I had set my password instead of an app password in ~/.davfs2/secrets. Now the log is gone and the time for my ls is very reasonable.

Thank you @theof for sharing this observation! Here's a comparison of execution duration (seconds) between user vs application password on my system. Command User password Application password
mount 9.3 3.28
ls (1st) 86.23 7.58
ls (2nd - cached) 29.4 1.94
ls (3rd - cached) 32.89 1.99
ls (4th - cached) 33.51 2

It's unbelievable what difference this tiny change makes.

I do not know whether it is the same issue for others in this thread. I imagine it is a common misconfiguration for people skimming through the docs.

This may be an issue only if the NC is running on a hardware with weak CPU as I have been using other NC instance w/o application password just fine.

provokateurin commented 1 year ago

I wonder if the performance is so bad because the hashing happens many times because of recursive listing? @jose1711 maybe you could also compare listing folders with different levels of depth?

jose1711 commented 1 year ago

I wonder if the performance is so bad because the hashing happens many times because of recursive listing? @jose1711 maybe you could also compare listing folders with different levels of depth?

Note that ls listing was done on the top level (mountpoint). I can imagine that a recursive listing would take a lot more time to process.