Closed Baltix closed 4 years ago
cc @rullzer @icewind1991
CREATE INDEX properties_path_index ON oc_properties USING btree (userid, propertypath); (this is PostgreSQL syntax)
Please add this index to the Nextcloud database creation file.
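For anyone who wants to sanity-check the effect of this index before touching a production database, here is a minimal sketch. It uses SQLite as a portable stand-in (the real setups in this thread are PostgreSQL/MySQL) with a made-up `oc_properties`-shaped table, and shows that the composite index turns the per-file lookup from a full table scan into an index search:

```python
# Sketch only: SQLite stand-in for the oc_properties lookup discussed above.
# Table/column names mirror the schema described in this issue; data is invented.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE oc_properties (
    userid TEXT, propertypath TEXT, propertyname TEXT, propertyvalue TEXT)""")
con.executemany(
    "INSERT INTO oc_properties VALUES (?, ?, ?, ?)",
    [("yossi", f"files/yossi/img_{i}.jpg",
      "{http://apache.org/dav/props/}executable", "F") for i in range(5000)])

query = ("SELECT * FROM oc_properties "
         "WHERE userid = ? AND propertypath = ?")
args = ("yossi", "files/yossi/img_42.jpg")

# Without the index: a full table scan.
plan_before = con.execute("EXPLAIN QUERY PLAN " + query, args).fetchall()

con.execute("CREATE INDEX properties_path_index "
            "ON oc_properties (userid, propertypath)")

# With the index: the plan reports a search using properties_path_index.
plan_after = con.execute("EXPLAIN QUERY PLAN " + query, args).fetchall()

print(plan_before[0][-1])  # e.g. a SCAN of oc_properties
print(plan_after[0][-1])   # e.g. SEARCH ... USING INDEX properties_path_index
```

With ~5000 per-file lookups per directory listing (as reported later in this thread), the difference between a scan and an index seek per lookup is exactly where the 100x improvements reported below come from.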
@nickvergessen Makes sense or not?
Sounds familiar
@rullzer Another one where an index could help a lot
We had a look at this, and adding the index is not that easy: it works fine in Postgres, but MySQL has a limitation on index length in default setups. :/
So we will move this to 15 as there is not enough time left to look into this properly.
I have a similar setup on my Ubuntu 16.04 server (I use MariaDB) and I also mounted WebDAV on my client. `ls` takes ~60-120 seconds. `cp` (10 files of ~300 kB each) takes more than 10 minutes.
Same issue here.
I actually intend to use the personal nextcloud server as a backup storage. Unfortunately, it's very slow. :-/
Is there some update on this one?
👍 for an update on this please. Transfers via WebDAV are incredibly slow: 5 minutes for 300k.
Extremely slow WebDav confirmed also from my side
Something I've noticed is that it runs the "getScanner" method on the OC\Files\View class for every file, every time you do anything with the local storage. Even when I'm 3 or 4 folders deep, if I reload the page, it scans every folder from the root down to where I am.
I've been trying to profile the folder loading (trying to figure out which function is the culprit), and it looks like if I'm at a location like this
/Local Storage Root/Brian Leishman/Pictures/Food
Then it scans every file/folder in "Local Storage Root", then "Brian Leishman", "Pictures", and finally "Food", so when "Local Storage Root" has 11k folders in it, they all get scanned again at this point, which means that accessing that last folder with 1 item in it takes about a minute to load (even on a very, very fast machine with all the performance tuning enabled).
I'm trying to fix this myself for our use case, but I'm getting stuck trying to figure out whether the culprit is the third-party Sabre/DAV library or Nextcloud itself.
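The ancestor-rescanning behaviour described above can be sketched abstractly. The Python below is hypothetical illustration, not Nextcloud code: it shows why listing a small deep folder costs as much as its largest ancestor, and how even a simple per-request memo would avoid the repeated scans:

```python
# Hypothetical model of the behaviour reported above: resolving a deep path
# re-scans every ancestor folder, so a listing costs O(sum of ancestor sizes).
from functools import lru_cache

SCAN_COUNT = {"n": 0}  # counts how many folder scans actually ran

def scan_folder(path):
    """Stand-in for the expensive getScanner() pass over one folder."""
    SCAN_COUNT["n"] += 1
    return f"scanned:{path}"

@lru_cache(maxsize=None)
def scan_folder_cached(path):
    """Same scan, but memoized per request."""
    return scan_folder(path)

def resolve(path, scan):
    """Scan every ancestor of `path`, root first (mimics the observed behaviour)."""
    parts = path.strip("/").split("/")
    for i in range(1, len(parts) + 1):
        scan("/" + "/".join(parts[:i]))

# Two requests for sibling folders under the same deep tree:
resolve("/Local Storage Root/Brian Leishman/Pictures/Food", scan_folder)
resolve("/Local Storage Root/Brian Leishman/Pictures/Travel", scan_folder)
uncached = SCAN_COUNT["n"]   # 8 scans: every ancestor scanned twice over

SCAN_COUNT["n"] = 0
resolve("/Local Storage Root/Brian Leishman/Pictures/Food", scan_folder_cached)
resolve("/Local Storage Root/Brian Leishman/Pictures/Travel", scan_folder_cached)
cached = SCAN_COUNT["n"]     # 5 scans: shared ancestors scanned only once
```

In the real case the root folder holds 11k entries, so the repeated ancestor scans dominate: scanning a 1-item leaf folder still pays for the 11k-entry root on every request.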
I have folders with 100k+ files, and even if I mount locally with davfs2 I get very poor performance. This is pretty much a killer for my installation. Any advice so far? I have Redis running and every possible tuning parameter I could find. I am running 16.0.1.
@ybaumy I'm afraid as long as enterprise customers do not complain, nothing will happen here. While this seems to be a core issue (one of the primary functions of Nextcloud), the priority seems disproportionately low. But every time I mention that priorities are off in this project, I get my head ripped off, so I'm just going to shut up and be quiet for another few years.
@tessus are there any useful alternatives to nextcloud + webdav?
None with that many plugins and apps around.. AFAIK
Same issue: a transfer of some 700 MB took around 30 minutes :( Does anyone have an alternative?
@mantri-govind I think there is not much that can be done if you are looking for a WebDAV solution, besides waiting for the Nextcloud guys to optimize their PHP code, at least for listing directory contents. Anyway, the poorly written PHP code plus WebDAV is a performance killer. Maybe someone who made a fork of Nextcloud has addressed this issue. I wonder how enterprise customers can accept this. I mean, leaving the community hanging somewhere is, well, not good but understandable. If you find a fork, please reply. I had to turn off encryption completely and I am now accessing the files with other protocols.
@ybaumy sad.. actually I found this issue has also been reported against the davfs2 package: https://bugs.launchpad.net/ubuntu/+source/davfs2/+bug/185986 As you mentioned, not sure why they would leave the issue open for so long! Coming to the encryption part: I don't have backend encryption set. Also, can you please let me know which other protocol we can use to mount a Nextcloud WebDAV share as a folder?
Right now I'm not planning to fork it; maybe I'll go with the slightly more expensive approach of an rsync plus `occ files:scan --all` kind of solution with cron :(
PS: If you copy archive files one by one, performance is quite manageable (a 700 MB zip took around 1 min or so).
Same problem. We need to mount the admin folder via davfs2 (fstab), and ls is extremely slow at 800,000 files/folders. Is there any solution? We have already tried so many tweaks.
I think it always searches recursively, which is pretty nonsensical when I run ls just to show the dirs in the pwd.
I created the indexes as per @Baltix's post and the speed improvement was just over 100x (1m48s to 1s), but this is over a network and involves caching. A locally mounted share now returns from `ls` after 400 ms. The database has ~700k files in oc_filecache.
Additionally, after the above, I edited apps/dav/lib/Connector/Sabre/CustomPropertiesBackend.php and nextcloud/apps/dav/lib/DAV/CustomPropertiesBackend.php by inserting `return [];` as the first line of the getProperties functions. This appears to be where the problematic query above originates.
I have also tried to tune the postgres instance (docker-compose format):
```
command: -c 'shared_preload_libraries="pg_stat_statements"'
  -c 'track_activity_query_size=2048' -c 'pg_stat_statements.track=all'
  -c 'shared_buffers=8GB' -c 'effective_cache_size=24GB'
  -c 'maintenance_work_mem=2GB' -c 'checkpoint_completion_target=0.7'
  -c 'wal_buffers=16MB' -c 'default_statistics_target=100'
  -c 'random_page_cost=1.1' -c 'effective_io_concurrency=200'
  -c 'work_mem=10485kB' -c 'min_wal_size=1GB'
  -c 'max_wal_size=2GB' -c 'max_worker_processes=8'
  -c 'max_parallel_workers_per_gather=4' -c 'max_parallel_workers=8'
```
Hopefully this helps someone :)
@aquarat unfortunately the database changes only work on Postgres. People with a MySQL setup cannot apply them. According to @MorrisJobke this was moved to 15, and now we are on 17 and the same problem still exists.
Changing the `getProperties` function will most likely break Nextcloud, so I doubt that this is a valid solution.
Following up on @mantri-govind: Chances are that there is a problem in the linux davfs2 implementation. In the bug report that @mantri-govind mentions, there are two posts (#31 and #33) mentioning that the files in the same system can be accessed significantly faster using the webdav:// or dav:// protocols in KDE's konqueror and Nautilus, respectively. I can confirm this for Konqueror. It is unfortunate that nothing has been done on the davfs2 side to look into this. I am not qualified to do so.
@schniepp I think the solution given by @aquarat might speed things up a bit. Also, it should work fine even on a remote machine with Konqueror/Nautilus; the Nextcloud client itself uses WebDAV. The problem starts the moment we mount it as a share.
@mantri-govind: Thanks for the feedback! My setup currently uses MySQL. From what I read above, the solution will only work for PostgreSQL, correct?
@schniepp Modifying the code will work for any database but creating indexes is database-specific. Modifying the code isn't great because it'll get overwritten during an update. Creating indexes is the best option.
Also this speed improvement works using both davfs2 on Linux (Ubuntu) and Mac OS X Finder (the primary target of my efforts) - so it isn't a davfs2 issue.
@tessus So far everything is working well with the early empty return in getProperties. There is no indication of failure or malfunction. The system in question has over 20 users using it at any time during the day.
Sorry for the off-topic question: how do you add entries to oc_properties? Is this only possible via davfs2? I tried to set the executable bit, but the Nextcloud client (for Linux) does not push anything? :see_no_evil:
@aquarat: so what about @tessus' comment that inserting the `return [];` statement will break Nextcloud? Any idea how dangerous it is to do this? I did a quick test and did not see huge improvements. Not sure if a restart is needed, or if the old version of the PHP file is cached somewhere. But mounting the same folder via sshfs instead of davfs was still at least an order of magnitude faster. (Same server, same network connection, same client.) Just some very rough ballparking here.
So I installed pghero on our Nextcloud system and it had a number of suggestions for indexes to add (and remove).
For example on the following query
SELECT "fileid", "storage", "path", "path_hash", "parent", "name", "mimetype", "mimepart", "size", "mtime", "storage_mtime", "encrypted", "etag", "permissions", "checksum" FROM "oc_filecache" WHERE "storage" = $1 AND "path_hash" = $2
create the following index
CREATE INDEX CONCURRENTLY ON oc_filecache (path_hash)
It also listed a large number of duplicate indexes:
On oc_appconfig
appconfig_appid_key (appid)
is covered by
oc_appconfig_pkey (appid, configkey)
On oc_collres_accesscache
collres_user_coll (user_id, collection_id)
is covered by
collres_unique_user (user_id, collection_id, resource_type, resource_id)
On oc_credentials
credentials_user (user)
is covered by
oc_credentials_pkey (user, identifier)
pghero also listed 50 unused indices that could be removed for faster writes to the database.
fs_storage_mimetype and fs_storage_mimepart are two examples.
I haven't removed any of these unused indices because I didn't want to break anything, but I have removed the duplicate indexes and created the suggested ones. So far I've seen some improvement in the overall speed of Nextcloud when dealing with directories containing hundreds of small files (<1 MB). But this was all subjective; I haven't done a proper speed test.
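The "covered by" relation pghero reports can be stated precisely: an index is redundant when its column list is a leading prefix of another index's column list. A small illustrative check (Python, with the index names from the list above; the detection logic itself is generic):

```python
# Sketch of pghero's duplicate-index detection: an index whose columns are a
# strict leading prefix of another index's columns is a candidate for removal.
def redundant_indexes(indexes):
    """indexes: dict of index name -> tuple of columns.
    Returns {duplicate_name: covering_index_name}."""
    out = {}
    for name, cols in indexes.items():
        for other, ocols in indexes.items():
            if other != name and len(cols) < len(ocols) and ocols[:len(cols)] == cols:
                out[name] = other
                break
    return out

# Names/columns taken from the pghero report quoted above.
dups = redundant_indexes({
    "appconfig_appid_key": ("appid",),
    "oc_appconfig_pkey": ("appid", "configkey"),
    "collres_user_coll": ("user_id", "collection_id"),
    "collres_unique_user": ("user_id", "collection_id",
                            "resource_type", "resource_id"),
})
print(dups)
# → {'appconfig_appid_key': 'oc_appconfig_pkey',
#    'collres_user_coll': 'collres_unique_user'}
```

Note this is only a candidate list: as pointed out in the reply below this comment, whether an engine can actually serve queries from a leading prefix of a compound index, and whether an index exists for a uniqueness constraint rather than for speed, varies per RDBMS, so dropping these blindly is risky.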
@bmorgenthaler indexes can be used for a number of reasons, e.g. as a constraint to ensure uniqueness. Without knowledge of the app and why those indexes exist, it is hard to determine anything. As for duplicate indexes, it depends on the engine: some RDBMSes can use a leading subset of a compound index, others cannot, and the same is true for column order. Maybe it's a side effect of having an abstraction layer that creates the same set of tables/indexes for every RDBMS supported by Nextcloud.
Exactly this problem still in 17.0.1 with MySQL. Dav client, Cyberduck on MacOS. The queries are slow and they build up with each action in the client.
Guys. I just wanted to give you an update on my solution. I switched to minio for file storage. Since I never used much of the other nextcloud features, this switch did not really hurt. Video and voice is now jitsi at the moment at least. minio has no gui client, but I don't need that for synchronization. The speed is amazing and I cannot complain. If you are already familiar with AWS S3 storage then the switch will be easy.
No solution for us.
@ybaumy How did you manage to mount minio on your machine?
@mantri-govind use s3fs. Create a file with your credentials, /etc/s3creds, with content user:password, then mount with:

```
s3fs <bucket> /s3 -o passwd_file=/etc/s3creds,use_path_request_style,url=https://miniohost.domain,allow_other,umask=0007,uid=1000
```
I have specified uid=1000 for the user that should have access to the files; more information can be found in the manpage.
I can 100% confirm @schniepp's observation that the same files are accessed significantly faster via the webdav:// protocol in Konqueror than through davfs2.
We have documented this issue (slow, repeated queries from Cyberduck/WebDAV clients on 17.0.1 with MySQL) in our wiki.
Could these indexes be included in Nextcloud by default?
A large directory 'ls' when using WebDAV (davfs2) on Linux results in a specific query for each file, like this:
```sql
SELECT * FROM oc_properties
WHERE userid = 'yossi'
  AND propertypath = 'files/yossi/Desktop-PC/Documents/Samsung-Backup/Camera/IMG_0025.JPG'
  AND propertyname IN ('{DAV:}creationdate', '{http://apache.org/dav/props/}executable')
```
Now I have about 5K of these queries, just for a file listing with 'ls'. There must be a better way of handling this data. I do not know what it is, but the current method is too slow.
Edit: The only (partial) workaround I have is to increase the PHP max execution time. It was 60 seconds; I increased it to 300, and most directory listings worked fine, but then there is another one that does not, and I have just tried a 600-second timeout. This solution doesn't scale.
Edit 2: That did not solve the problem. The web interface, BTW, works fast on the same directory: a 2-3 second wait, and that's it.
Edit 3: I have updated davfs2 on Ubuntu (compiled 1.5.5), to no avail. Increased the davfs2 timeout from 300 to 600. Doesn't look bright at all. It seems the DB query mentioned above is performed for each file in this directory (~5300 of them) at a rate of about 15 queries per second, so I calculated that a result would be provided within ~5.5 minutes. However, it seems the DB runs the same queries again for the same files!
Edit 4: It worked. It took 13.5 minutes to list 5332 files. Each DB query was executed twice. I think the DB should have an entry for each directory (keyed by a hash of its ${USER_ROOT}-relative path, or something), and this hash should be the key for all files inside that directory (and for subdirectories as well), each holding the WebDAV metadata. It should be a very fast query, but it has become a very slow one.
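The thousands of per-file SELECTs described above could, at least in principle, be collapsed into a single query per directory listing by fetching all paths at once with an IN (...) clause. A sketch of the idea, using SQLite and invented data (this is not what Nextcloud currently does; it is the batching the comment above is asking for):

```python
# Sketch: one IN (...) query per directory listing instead of one query per file.
# SQLite stand-in with an oc_properties-shaped table and made-up data.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE oc_properties (userid TEXT, propertypath TEXT, "
            "propertyname TEXT, propertyvalue TEXT)")
paths = [f"files/yossi/Camera/IMG_{i:04d}.JPG" for i in range(5000)]
con.executemany("INSERT INTO oc_properties VALUES ('yossi', ?, "
                "'{http://apache.org/dav/props/}executable', 'F')",
                [(p,) for p in paths])

def props_one_by_one(wanted):
    """One round-trip per file: the behaviour observed in this thread."""
    rows = []
    for p in wanted:
        rows += con.execute(
            "SELECT propertypath, propertyvalue FROM oc_properties "
            "WHERE userid = 'yossi' AND propertypath = ?", (p,)).fetchall()
    return rows

def props_batched(wanted):
    """One round-trip per directory listing."""
    marks = ",".join("?" * len(wanted))
    return con.execute(
        "SELECT propertypath, propertyvalue FROM oc_properties "
        f"WHERE userid = 'yossi' AND propertypath IN ({marks})",
        wanted).fetchall()

# Both approaches return the same rows for a 100-file listing.
same = sorted(props_one_by_one(paths[:100])) == sorted(props_batched(paths[:100]))
```

Even with the index in place, turning ~5300 round-trips into one removes the per-query overhead that dominates the 13.5-minute listing above (note that real databases cap the number of bound parameters per statement, so a production version would batch in chunks).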
I use both suggested workarounds:
@KekcuHa: what kind of improvements are you seeing? Someone above suggested that this only works for Postgres, with reservations about MySQL. This is way above my head, so I am hesitant to do this on my production system; I do not want to mess up my DB. If I add the index, can it be removed again without doing any damage?
@schniepp: I noticed a significant acceleration for directory browsing and file operations on remote resources mounted with davfs2. An index does not affect the data stored in the table, it just speeds up queries, so it can be safely removed.
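The claim that an index can be added and dropped without touching the data can be checked mechanically. A tiny demonstration (SQLite stand-in, invented table contents): the same query returns identical rows before the index exists, while it exists, and after it is dropped.

```python
# Sketch: an index changes only the access path, never the stored rows,
# so creating and later dropping it is safe for the data.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE oc_properties (userid TEXT, propertypath TEXT)")
con.executemany("INSERT INTO oc_properties VALUES (?, ?)",
                [("u", f"files/u/f{i}") for i in range(100)])

q = ("SELECT * FROM oc_properties "
     "WHERE userid = 'u' AND propertypath = 'files/u/f7'")

before = con.execute(q).fetchall()
con.execute("CREATE INDEX properties_path_index "
            "ON oc_properties (userid, propertypath)")
with_index = con.execute(q).fetchall()
con.execute("DROP INDEX properties_path_index")
after = con.execute(q).fetchall()

assert before == with_index == after  # identical rows in all three cases
```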
Wow! The indexes alone (MySQL) resulted in a huge performance and responsiveness change! During rsync operations (the preferred backup method, since the Linux agent kills files with weird names) the server's CPU dropped from 1-1.5 cores (100-150% CPU) to about 3%, Apache CPU went up (this is good: it means it is actually transferring data), and the speed change is amazing. Wow. I really think this change should be incorporated into Nextcloud. An amazing change, at zero cost.
I really think this change should be incorporated into Nextcloud. An amazing change, at zero cost.
One question remains: will this help on all databases, or at least not worsen the speed on other databases?
I just implemented the MySQL indices following @KekcuHa's recommendation, and I changed the Nextcloud code following @aquarat. However, the KDE implementation of webdavs:// is still light-years faster when accessing the very same directories (same server, same client). I didn't test this quantitatively, but I'd say the difference is at least one order of magnitude.
I am not sure the changes made a difference. I restarted the MySQL and apache2 services. Does anything else need to be restarted for these changes to take effect? Or is the old version of the PHP file cached somewhere? Perhaps the recommended changes only make a big difference for directories containing many files and folders? In my case there are perhaps only 10-20 files and folders in a typical directory, and I still see significant delays.
If it is as simple as adding the following SQL index for MySQL:

```sql
CREATE INDEX properties_path_index ON oc_properties(userid, propertypath)
```

as explained here: https://trac.cyberduck.io/wiki/help/en/howto/nextcloud#Slowlistingsforlargefolderstructures
How many years do we need to wait for this to be implemented? This issue has been open for 2 years, and WebDAV in this state is unusable: super slow on the client side, and it uses a lot of server resources for nothing.
Fixed via https://github.com/nextcloud/server/pull/20716.
Sorry for the delay people!
Accessing (copy, ls, etc.) files in folders with thousands of files through WebDAV is very slow (10-20 or more seconds for a simple `ls` command):

```
time ls /home/sketis/docs.mini-maxi.lt/2_Nuotrauku_katalogas/imported/RB00097504_1_Joico-Color-Endure-Conditioner-1000-ml.jpg
/home/sketis/docs.mini-maxi.lt/2_Nuotrauku_katalogas/imported/RB00097504_1_Joico-Color-Endure-Conditioner-1000-ml.jpg
real    0m18.519s
user    0m0.000s
sys     0m0.000s
```

(/home/sketis/docs.mini-maxi.lt/ is mounted through davfs (davfs2 package); the folder /home/sketis/docs.mini-maxi.lt/2_Nuotrauku_katalogas/imported/ contains ~9000 files of 100-600 kB each)
I've tried lots of optimizations, including Redis, database buffers, etc., and almost nothing helps, but after 3 days of hard work I've found the main issue:
Every operation on any file in a folder with thousands of files through the WebDAV protocol generates thousands of queries (one or two queries for every file in that folder):

```sql
SELECT * FROM "oc_properties" WHERE "userid" = $1 AND "propertypath" = $2 AND "propertyname" in ($3)
```

I noticed this with the "pg_activity" tool. This query takes about 2-3 milliseconds, but for every operation Nextcloud generates thousands of them :(
This is a bug, because I access (for example, copy) only one file, not all files in the folder. The operation takes 2x longer when I try to access a file which doesn't exist:

```
time ls /home/sketis/docs.mini-maxi.lt/2_Nuotrauku_katalogas/imported/B00002
ls: cannot access '/home/sketis/docs.mini-maxi.lt/2_Nuotrauku_katalogas/imported/B00002': No such file or directory
real    0m45.274s
user    0m0.000s
sys     0m0.000s
```

I've activated the "pg_stat_statements" module in PostgreSQL and then found the main issue:
```sql
SELECT query, total_time, max_time, min_time, calls FROM pg_stat_statements ORDER BY calls DESC FETCH FIRST 2 ROWS ONLY;
```

```
query                                                                                          | total_time | max_time | min_time | calls
-----------------------------------------------------------------------------------------------+------------+----------+----------+------
SELECT * FROM oc_properties WHERE "userid"=$1 AND "propertypath"=$2 AND "propertyname" in ($3) | 107636.46  | 3.54     | 1.48     | 67532
```

67532 queries after just ~2 minutes! The oc_properties table contains ~12000 rows. I've found that this table only contains info about the executable bit of a file, so it's not useful for the majority of users!
So, there is one simple workaround: I've increased query speed 10 times by creating an index on the oc_properties table over both columns, userid and propertypath:
CREATE INDEX properties_path_index ON oc_properties USING btree (userid, propertypath); (this is PostgreSQL syntax)
Please add this index to the Nextcloud database creation file.
But index creation only partially solves the performance issues with lots of files in one folder. It would be wise to add a setting like "Ignore executable bit in WebDAV" to Nextcloud's config.php and not query the oc_properties table at all when this setting is true.
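The suggested switch could look roughly like the sketch below. Everything here is invented for illustration (the flag name, the function shape); it is not actual Nextcloud code, just the shape of a getProperties-style function that skips the per-file DB query when the hypothetical flag is set:

```python
# Hypothetical sketch of the suggested "Ignore executable bit in WebDAV" switch.
# Flag name and function signatures are invented for illustration only.
def get_properties(path, requested, config, query_db):
    """Return WebDAV properties for `path`, or nothing if the flag is set."""
    if config.get("webdav.ignore_executable_bit"):
        return []                       # skip the per-file oc_properties query
    return query_db(path, requested)    # normal lookup

calls = []
def fake_query(path, requested):
    """Stand-in for the DB lookup; records that it was called."""
    calls.append(path)
    return [("{DAV:}creationdate", "2018-01-01")]

# Flag on: no DB query is issued at all.
assert get_properties("files/u/a.jpg", ["{DAV:}creationdate"],
                      {"webdav.ignore_executable_bit": True}, fake_query) == []
assert calls == []

# Flag off: the normal lookup runs.
assert get_properties("files/u/a.jpg", ["{DAV:}creationdate"],
                      {}, fake_query) != []
assert calls == ["files/u/a.jpg"]
```

With the flag on, a 5000-file listing would issue zero oc_properties queries instead of 5000, at the cost of losing the executable-bit property for clients that rely on it.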
Steps to reproduce
Expected behaviour
ls, cp or rm one or several files from folder should take 1 or 2 seconds
Actual behaviour
ls, cp or rm one or several files from folder takes 10-40 seconds
Server configuration
Operating system: Ubuntu 16.04.4 64-bit
Web server: apache (also tested with nginx - almost identical performance)
Database: PostgreSQL
PHP version: 7.0.28
Nextcloud version: 13.0.1 (also tested with all 12.x versions)
Updated from an older Nextcloud/ownCloud or fresh install: Updated from 12.x
Where did you install Nextcloud from: from nextcloud.org downloads
Nextcloud configuration:
Config report
'ocy77dpvulpn', 'passwordsalt' => 'good_password', 'secret' => 'no_secrets', 'trusted_domains' => array ( 0 => '192.168.199.53', ), 'datadirectory' => '/srv/nextcloud-duomenys', 'overwrite.cli.url' => 'https://192.168.199.53/', 'htaccess.RewriteBase' => '/', 'dbtype' => 'pgsql', 'version' => '13.0.1.1', 'dbname' => 'nextcloud', 'dbhost' => 'localhost', 'dbport' => '', 'dbtableprefix' => 'oc_', 'dbuser' => 'nextuser', 'dbpassword' => 'db-password', 'installed' => true, 'mail_smtpmode' => 'smtp', 'mail_smtpauthtype' => 'LOGIN', 'mail_smtpsecure' => 'ssl', 'mail_from_address' => 'docs', 'mail_domain' => 'my.mail.lt', 'mail_smtpauth' => 1, 'mail_smtphost' => 'my.mail..lt', 'mail_smtpport' => '465', 'mail_smtpname' => 'docs@my.mail.lt', 'mail_smtppassword' => 'mail_password', 'mail_smtpdebug' => true, 'auth.bruteforce.protection.enabled' => false, 'maintenance' => false, 'theme' => '', 'loglevel' => 2, 'memcache.local' => '\\OC\\Memcache\\APCu', 'redis' => array ( 'host' => 'localhost', 'port' => 6379, ), );

Are you using external storage, if yes which one: local
Are you using encryption: no
Are you using an external user-backend, if yes which one: no
Client configuration
Operating system: Ubuntu 16.04 64-bit