nextcloud / desktop

💻 Desktop sync client for Nextcloud
https://nextcloud.com/install/#install-clients
GNU General Public License v2.0

Improve speed for initial sync with virtual files #4424

Open wonx opened 2 years ago

wonx commented 2 years ago

Feature description

When using virtual files, the first log in after a new installation will start a syncing process that can take a very long time depending on the number of files to synchronize.

In my case, I'm syncing around 700,000 files. My computer has been up for 29 hours without a restart, and the sync process has only now reached the 50% mark. I can see that the virtual files are created one by one, and it can be as slow as 2 per second. Two or more days until Nextcloud is usable is too long, in my opinion.
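To put those numbers in perspective, here is a rough back-of-the-envelope calculation. It only uses the figures quoted above (about 700,000 files, 50% done after 29 hours, a worst case of 2 files per second); the function names are made up for illustration.

```python
def observed_rate(files_done: int, elapsed_hours: float) -> float:
    """Average throughput in files per second over the elapsed time."""
    return files_done / (elapsed_hours * 3600)


def remaining_hours(files_left: int, files_per_second: float) -> float:
    """Hours still needed if the given throughput holds."""
    return files_left / files_per_second / 3600


total_files = 700_000
files_done = total_files // 2           # the 50% mark
files_left = total_files - files_done

average = observed_rate(files_done, elapsed_hours=29)
print(f"average so far:            {average:.2f} files/s")
print(f"remaining at that average: {remaining_hours(files_left, average):.1f} h")
print(f"remaining at 2 files/s:    {remaining_hours(files_left, 2):.1f} h")
```

At the average rate the second half takes another 29 hours; if the client stays at the slow end of 2 files per second, it is closer to two more days.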

It would be cool if there was any way to speed up the initial sync.

PS: This is related to https://github.com/nextcloud/desktop/issues/4421

marcotrevisan commented 2 years ago

I'm experiencing a similar problem on macOS (around 200k files). In my humble opinion, synchronizing the full hierarchy is the key problem here. The typical end user doesn't need to have the full folder hierarchy saved and synchronized. A lazier approach (i.e. trigger on open and/or scan only the opened subfolders, not the whole depth but perhaps 1 or 2 levels below) would scale better and decrease the load on the NC server.
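A minimal sketch of what "lazier" could mean, just to illustrate the idea. This is not how the Nextcloud client works; the folder tree is a hypothetical in-memory dict standing in for the server, and `folders_to_list` is an invented name. A real client would issue one WebDAV listing request per folder instead of a dict lookup.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical stand-in for the server: folder path -> child folder paths.
Tree = Dict[str, List[str]]


def folders_to_list(tree: Tree, opened: str, max_depth: Optional[int]) -> List[str]:
    """Return the folders that get listed when `opened` is accessed.

    max_depth=None walks the whole hierarchy (the eager behaviour);
    max_depth=1 lists the opened folder plus its direct subfolders.
    """
    listed = []
    queue = deque([(opened, 0)])
    while queue:
        path, depth = queue.popleft()
        listed.append(path)
        if max_depth is not None and depth >= max_depth:
            continue  # do not descend any further
        for child in tree.get(path, []):
            queue.append((child, depth + 1))
    return listed


tree = {
    "/": ["/projects", "/archive"],
    "/projects": ["/projects/a", "/projects/b"],
    "/archive": ["/archive/2020", "/archive/2021"],
    "/archive/2020": ["/archive/2020/raw"],
}

eager = folders_to_list(tree, "/", max_depth=None)
lazy = folders_to_list(tree, "/", max_depth=1)
print(len(eager), "folders listed eagerly,", len(lazy), "lazily")
```

With a deep tree, the eager count grows with the total number of folders, while the lazy count only depends on what the user actually opens.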

johannes-luebke commented 2 years ago

I'd like to add that restarting the sync, the client, or the PC results in a complete restart of the process. Also, the sync doesn't seem to start immediately: it first counts all the files it will sync and only then starts syncing. The counting alone takes two days for me, and the sync isn't done after more than 10 days. At the very least, the sync should pick up where it left off.
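To illustrate what "pick up where it left off" could look like, here is a small sketch of a checkpointed loop. It is purely an assumption for discussion, not the client's actual design: the checkpoint is a JSON file of already-processed items, and all names (`sync_with_resume`, `save_every`, the checkpoint path) are hypothetical.

```python
import json
import os


def load_done(checkpoint_path):
    """Read the set of already-processed items, or an empty set on first run."""
    if not os.path.exists(checkpoint_path):
        return set()
    with open(checkpoint_path, encoding="utf-8") as fh:
        return set(json.load(fh))


def save_done(checkpoint_path, done):
    """Write to a temp file and rename, so an interrupted write
    never leaves a half-written checkpoint behind."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(sorted(done), fh)
    os.replace(tmp_path, checkpoint_path)


def sync_with_resume(items, process, checkpoint_path, save_every=1000):
    """Process each item once; items recorded in the checkpoint are skipped."""
    done = load_done(checkpoint_path)
    processed_now = 0
    try:
        for item in items:
            if item in done:
                continue
            process(item)
            done.add(item)
            processed_now += 1
            if processed_now % save_every == 0:
                save_done(checkpoint_path, done)
    finally:
        # Persist progress even if process() raised or the loop was interrupted.
        save_done(checkpoint_path, done)
    return processed_now
```

After a crash or restart, calling `sync_with_resume` again with the same checkpoint path only processes the items that were not finished before.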

CWempe commented 2 years ago

I have the same issue.

The most annoying part is that I do not need the folders with all the little files available on my desktop.

So it would be enough if I could say "do not sync this folder unless it is accessed by the user".

I think the suggestion from @marcotrevisan (see https://github.com/nextcloud/desktop/issues/4464) also sounds promising.

PhilippSchlesinger commented 2 years ago

See https://github.com/nextcloud/desktop/issues/4918#issuecomment-1246386007 for a description of a problem with the tray window that is related to the speed issues during initial sync.

CWempe commented 1 year ago

I can confirm this issue.

I started syncing virtual files (~1 million) on a new notebook. I knew it would take a while. The next day I checked and saw about 30 % finished. The day after that only 10 % more (= 40%). In the application window I could see that roughly one file was processed per second.

Then I read about restarting the client software here. And the syncing (files per second) increased dramatically.

Now I took some data to verify this behavior:

image

image

So the best workaround would be a script that restarts the Nextcloud client every 30 minutes or so. 😜
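For anyone who wants to try that workaround, here is a rough sketch. It is an untested assumption rather than a recommended fix: the executable name `nextcloud.exe` and the install path are guesses for a default Windows install and need to be adjusted, and `restart_client` / `restart_periodically` are made-up names. `taskkill /IM` is the standard Windows command for asking a process to close by image name.

```python
import subprocess
import time

# Assumptions: adjust both values to match your installation.
CLIENT_IMAGE = "nextcloud.exe"
CLIENT_PATH = r"C:\Program Files\Nextcloud\nextcloud.exe"


def restart_client(run=subprocess.run, launch=subprocess.Popen,
                   settle_seconds=5, sleep=time.sleep):
    """Ask the running client to close, wait briefly, then start it again."""
    # Without /F, taskkill requests a normal close instead of a forced kill.
    run(["taskkill", "/IM", CLIENT_IMAGE], check=False)
    sleep(settle_seconds)
    launch([CLIENT_PATH])


def restart_periodically(interval_minutes=30, restarts=None,
                         sleep=time.sleep, **kwargs):
    """Restart the client every `interval_minutes`; loop forever if restarts is None."""
    count = 0
    while restarts is None or count < restarts:
        sleep(interval_minutes * 60)
        restart_client(sleep=sleep, **kwargs)
        count += 1
```

Calling `restart_periodically()` from a small script left running in the background would restart the client every 30 minutes. Whether a restart in the middle of a sync is always safe is not something this sketch can promise.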

But it would be great if this could be fixed.

Server: 24.0.7 (docker)
Client: 3.6.2 (Windows)

PhilippSchlesinger commented 1 year ago

With the latest Nextcloud client, 3.7.3, an initial sync of ~150k files took less than 1 hour, where in the past it took a whole night and produced endless errors. Maybe you could also check again and see whether it has improved with the latest version.

@CWempe Since you described the issue in detail and with numbers previously, could you maybe check again with 3.7.3 or later and report if anything changed?

tomdereub commented 1 year ago

As I said here: https://github.com/nextcloud/desktop/issues/3120#issuecomment-1584621317, I'm still having the problem with Nextcloud 25 and desktop client 3.8.2. After 24 hours it had not yet finished counting the files to synchronize; then it lost the connection and restarted from scratch... About 2 000 000 files.

limatus commented 1 year ago

I can also confirm that this issue persists with 3.9.0 and [Cloud] 26.0.2. For approximately 500k files, the estimated time jumps between 6 days and “A few seconds”. It “syncs” (virtual files) roughly 100 files per second. Just for testing purposes, I tried to sync the same set of files with the ownCloud client [v4.1.0-rc.2](https://github.com/owncloud/client/tree/v4.1.0-rc.2). That client does the job much faster, approx. 500–700 files per second, against the same server. It could be my laptop, but I experience the same issue with the NC client on the other 30–50 laptops.

hodyroff commented 1 year ago

@limatus Try with ownCloud Infinite Scale; 3.0 just got released, and I would expect 4x the performance compared with oC10.

limatus commented 1 year ago

@hodyroff thanks for the hint, but I do not intend to switch servers; the server was and is from NC!

tobiasKaminsky commented 1 year ago

@claucambra is this a duplicate of [#5692](https://github.com/nextcloud/desktop/issues/5692) or vice versa?

claucambra commented 1 year ago

They are different, this is related to the Windows VFS (normal sync engine) while #5692 is related to the macOS-specific sync engine in the file provider module

allexzander commented 1 year ago

@limatus @CWempe Just to get a bit more context on Virtual Files vs normal sync, do you have a much slower syncing when using Virtual Files when compared to how it syncs via normal sync if you also select to sync everything?

tomdereub commented 1 year ago

@allexzander : The problem is only with the initial sync. I'm using VFS on my personal server with success, it's working well. The problem appears with lots of data, with 500 000 files it takes about a few days to get the initial sync complete. After that syncing seems as quick as with normal sync. Is there any chance to see any progress on this issue ? It has been agreed for 2 years now (https://github.com/nextcloud/desktop/issues/3120#issuecomment-907067592) without visible progress...

limatus commented 1 year ago

@allexzander if I sync the files via normal sync, the bottleneck seems to be the connection speed, which is understandable. Sadly, we mostly use virtual files, as there are simply too many files. It's similar to what @tomdereub mentioned: the initial sync takes days; thereafter, it's fine.

PhilippSchlesinger commented 1 year ago

@allexzander For the sake of completeness I'd like to add that what @tomdereub and others are describing also happens when a significant amount of files are added to the nextcloud account after the initial sync. So when the nextcloud client needs to sync this newly added amount of files, the client shows the same problem as on the initial sync.

As described by @CWempe in https://github.com/nextcloud/desktop/issues/4424#issuecomment-1341235591, the sync speed decreases dramatically over time. Is this perhaps due to the real-time listing of activities in the tray window for each individual file being synced? If this could be identified as a cause of the slowdown, then perhaps lazyloading activities or even summary listing for large numbers of files would be an option.
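To make the "summary listing" idea concrete, here is a tiny sketch. It is only an illustration of the suggestion and has nothing to do with the actual tray window code; `summarize_activities` and the `max_individual` threshold are invented for this example.

```python
import posixpath
from collections import Counter


def summarize_activities(paths, max_individual=5):
    """Return display lines for a batch of synced files.

    Small batches keep one line per file; larger batches collapse
    into one line per folder.
    """
    paths = list(paths)
    if len(paths) <= max_individual:
        return [f"Synced {p}" for p in paths]
    per_folder = Counter(posixpath.dirname(p) or "/" for p in paths)
    return [
        f"Synced {count} file{'s' if count != 1 else ''} in {folder}"
        for folder, count in sorted(per_folder.items())
    ]


batch = ["docs/a.txt", "docs/b.txt", "img/c.png"]
print(summarize_activities(batch))                    # one line per file
print(summarize_activities(batch, max_individual=2))  # one line per folder
```

With hundreds of thousands of files, the UI would then have to render a handful of summary entries instead of one entry per file.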

tomdereub commented 1 year ago

As @PhilippSchlesinger said, after some time using VFS on that folder with about 500 000 files, I find it a pity to keep syncing the whole folder tree. Every time somebody modifies a large number of files, it starts a long sync. It seems impossible to me to deploy this for 30 people: it would put a heavy load on the server and on each computer. From my point of view, the right way to make it scalable is to sync only folders that have been accessed at least once. I mean:

Is this technically possible ? And if yes, what do you (nextcloud devs) think about it ? It seems to me that it's the actual behaviour of the android desktop client.

marcotrevisan commented 1 year ago

I'd like to add that under Mac OS things are changing towards a FileProvider based implementation, which will solve the issue by delegating a good part of the sync logic to MacOS.

IMHO, if there's no API like FileProvider under Windows, then the client should evolve towards a lazier approach... a "full sync" approach works against scalability, and in the long run it's a major limiting factor for broader adoption of Nextcloud. In the case of 500k files and 30 users who are actively working, push notifications tend to generate very frequent peaks of PROPFIND requests coming from all the clients. Such peaks cause slowdowns not only for the clients themselves but also for the other apps (Talk, Mail, Calendar, Deck...), and the end result is a busy server instance that is not actually doing anything except triggering PROPFINDs and responding to PROPFINDs, for files/folders that are often far away from where the actual users are working. That's why, in my humble opinion, this is a critical and high-priority issue.

marcotrevisan commented 1 year ago

@tomdereub I'm in a very similar situation to yours and as a mitigation solution I ended up as follows:

In this way, server load is under control (push notifications won't wake up all clients every time) and the clients are snappy enough to work. The advantage is that, for heavily used folders, the NC client has all the files downloaded and ready; the disadvantage is that not all the users are comfortable with such setup.

Hope it helps

tomdereub commented 1 year ago

@marcotrevisan I'm currently trying Mountain Duck, and it seems to do everything I want with the "smart synchronization" mode. There is an option to index files or not. Without this option checked, it does not index all files; it just keeps an index of visited folders. And there is an option to keep a folder offline on the local disk. So it does what Nextcloud VFS does, but with 2 advantages (from my point of view):

marcotrevisan commented 1 year ago

Yes, but don't get drunk too fast, it has its own bugs (in Mac OS at least) :-D Avoid unzipping archives in the share for example. Sometimes it'll screw things up, and I don't know why. The safest mode in my experience is the Online mode. If you're in Windows it may behave differently.

roberix commented 1 year ago

Hi. I can confirm this. We have tested extensively the "Duck" on Windows and while the client does very well in terms of performance there are many other issues around file locking, online detection, working with MS office and so forth.

Is there any progress to be expected on improving the initial VFS sync speed? We are currently migrating a lot of files to NC, and I am already afraid of starting the sync on our clients.

At the moment the initial sync with about 100K files takes about 60 minutes.

Regards

Rob

PhilippSchlesinger commented 1 year ago

Just a small addition regarding the initial scan: according to the client, synchronizing placeholder files for an additional 100k files is expected to take 0 seconds (after a previous operation already took over 90 minutes for 60k files):

Screenshot 2023-10-17 101406

tomdereub commented 11 months ago

> It has been agreed for 2 years now (#3120 (comment)) without visible progress...

@allexzander @mgallien could you please just give us some idea of the priority of this issue and the ways to solve it ? Like "it's not the priority at the moment, so we don't know when it will be worked on", or "it's very complicated to solve, we have to re-write entirely the sync engine, so it will take some time before we can work on it", or "you're just a few users concerned, so it's not a priority, most of our users don't have so much data"...

As users, we need to know whether there is a chance of VFS becoming scalable in the short or medium term, or whether we have to find other solutions. I don't want to see my company give up on Nextcloud and the other open-source software we're using and fall back on full Microsoft solutions. I have been trying Mountain Duck as an alternative for some time, but as @marcotrevisan and @roberix have said, in some cases it does not work as well as the Nextcloud desktop client. So I need to know a bit more about the future development of the Nextcloud desktop client before deploying it for all users.

tomdereub commented 9 months ago

@joshtrichards : you added a label on this issue, what does that mean ? Will somebody start working on it ?

OpsecPGR commented 7 months ago

From what I can see, looks like they began working on this about a week ago.

tomdereub commented 6 months ago

This https://github.com/nextcloud/desktop/pull/6461 is exactly what is needed for Windows too.

PhilippSchlesinger commented 6 months ago

Dear Nextcloud developers, @allexzander It would be great if you could shed some light on what is actually being worked on. Many are following this bug and many of us contributed to this issue.

See https://github.com/nextcloud/desktop/issues/4918 for a description of a performance problem (PR intended to solve the problem in https://github.com/nextcloud/desktop/pull/5941) with the tray window. Solving this heavy issue could also pay off in improving the speed problems with initial sync.

psxvoid commented 1 month ago

For me, the initial sync has been in progress for several days, and it seems that laptop restarts and network connection issues restart the process from scratch each time. In the screenshot, the total number of files is constantly increasing (~1-5 items per second), and note that the synced file count is always 0:

image

and there are no files in the sync folder except these (and the size of sync.db is NOT changing either):

image

It seems completely unusable at this point.

P.S.:

Client: Nextcloud-3.14.1-x64 for Windows
Server: Nextcloud 29 on Docker (the server is quite slow, running on a Raspberry Pi 4)
Files total: > 300 000

tomdereub commented 1 month ago

This first step of the initial sync is very hard on the server. Have a look at your server's CPU usage; I think that is the bottleneck. In my case I have an Intel i5-10210U with 6 cores dedicated to my server, and it uses almost 100% of all cores during this first scan of all files. I have about 700 000 files, and the scan takes between 30 minutes and 1 hour. So I'm not surprised that it takes so long on an RPi. Once the server-side scan is finished, it takes up to 48 hours non-stop on the client to create the whole file tree. In my case, once the first sync is done, it works well (20 people using it), and the load on the server is OK. Looking forward to some improvement on this issue...

ne0YT commented 1 week ago

@Rello hey there, do you have an estimate of when this will be done?

We are planning to move away from our weird software solution built on top of the Windows built-in WebDAV, which has a lot of other issues and has officially already been discontinued (still available, but they say it is no longer getting updates), so a switch will be needed as fast as possible.