rclone / rclone

"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, Azure Files, Yandex Files
https://rclone.org
MIT License

Reduce startup time with vfs cache (writes/full) #4595

Open hekmon opened 4 years ago

hekmon commented 4 years ago

What is your current rclone version (output from rclone version)?

rclone v1.53.1

What problem are you trying to solve?

When the new VFS cache has a lot of files, startup can be extremely slow because rclone checks each file (presumably in order to retrieve those which need to be uploaded?)

For example:

Cached files  Cache Size  Startup   Backend Size  Warm time  Total
325109        137.421G    01:20:19  1.12T         00:15:46   01:36:05
701784        276.549G    03:19:04  1.24T         00:22:56   03:42:00

How do you think rclone should be changed to solve that?

I am guessing that the initial scan is sequential, given the very low IO. Would it be possible to have a worker pool (size to be determined) to walk down the cache directories and analyse the files concurrently? It would be a big speed-up.
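
A bounded worker pool along those lines might look like the sketch below. This is illustrative Go only, not rclone's actual VFS cache code: checkCacheItem and the cache path are made-up placeholders for whatever per-item metadata/data check rclone really performs at startup.

package main

import (
    "fmt"
    "io/fs"
    "path/filepath"
    "runtime"
    "sync"
)

// checkCacheItem is a placeholder for the per-file work rclone does at
// startup (read the metadata file, stat the data file, decide expiry).
func checkCacheItem(path string) error {
    return nil
}

// scanCache walks the cache directory and checks items with a bounded
// pool of workers instead of one file at a time.
func scanCache(root string, workers int) error {
    if workers <= 0 {
        workers = runtime.NumCPU()
    }
    paths := make(chan string, workers)
    var wg sync.WaitGroup

    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for p := range paths {
                if err := checkCacheItem(p); err != nil {
                    fmt.Printf("cache item %s: %v\n", p, err)
                }
            }
        }()
    }

    walkErr := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
        if err != nil {
            return err
        }
        if !d.IsDir() {
            paths <- p // hand the file off to an idle worker
        }
        return nil
    })

    close(paths)
    wg.Wait()
    return walkErr
}

func main() {
    // Hypothetical cache location; rclone's real default differs per OS.
    if err := scanCache("/var/cache/rclone/vfs", runtime.NumCPU()); err != nil {
        fmt.Println("cache scan failed:", err)
    }
}

The idea is that the startup check becomes bounded by aggregate disk throughput rather than by one-file-at-a-time latency; the right pool size would need benchmarking.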

ncw commented 4 years ago

At startup rclone reads each metadata file and checks it against the data file to see if it is ok or if it needs expiring.

Running this in parallel would be possible or perhaps running it in the background might be more sensible.

I'm not sure why we do this at the start.

Is rclone blocked for this startup period?
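
For what it's worth, the "run it in the background" variant would mean the mount reports ready immediately and the check completes behind it. A self-contained toy sketch of that shape (reconcileCache is a made-up stand-in, not rclone's code):

package main

import (
    "fmt"
    "time"
)

// reconcileCache stands in for the startup metadata/data consistency
// check rclone currently performs before the mount becomes usable.
func reconcileCache(root string) error {
    time.Sleep(2 * time.Second) // pretend this is the slow scan
    return nil
}

func main() {
    done := make(chan struct{})

    // Kick the scan off in the background instead of blocking startup.
    go func() {
        defer close(done)
        if err := reconcileCache("/var/cache/rclone/vfs"); err != nil {
            fmt.Println("background cache scan:", err)
        }
    }()

    fmt.Println("mount ready, cache scan continuing in background")

    // Anything that genuinely needs the reconciled cache can wait on it.
    <-done
    fmt.Println("cache scan finished")
}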

hekmon commented 4 years ago

Yes, it is blocked (which is actually fortunate).

My software is bound to the rclone mount unit using the BindsTo= and After= systemd primitives. Thanks to the systemd notify support in rclone, its launch is correctly delayed until rclone is fully started. But what a delay!

Startup corresponds to the rclone start itself with ExecStart=, while warmup is an rc vfs/refresh command launched in ExecStartPost=. systemd handles both of them very nicely: the unit only turns ready once both are done, which delays the launch of my software until the mount is 100% ready (this is a good thing).
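
For context, the layout described above looks roughly like the following unit; the unit name, remote, mount point and rc port are hypothetical, not the poster's actual configuration:

[Unit]
Description=rclone mount (example)
After=network-online.target

[Service]
# rclone notifies systemd when the mount is up, so with Type=notify the
# unit only turns active once the startup cache scan has finished.
Type=notify
ExecStart=/usr/bin/rclone mount remote: /mnt/remote \
    --vfs-cache-mode full --rc --rc-addr 127.0.0.1:5572
# Warm the directory cache; dependants are not started until this returns.
ExecStartPost=/usr/bin/rclone rc vfs/refresh recursive=true --rc-addr 127.0.0.1:5572

[Install]
WantedBy=multi-user.target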

For now I have limited the cache size to 128G in order to control the startup delay, but I was initially planning on having a 2T cache :O

I guess this is a particular use case where the cache has a LOT of small files.

ncw commented 4 years ago

I'm sure we can fix this!

I noticed recently that the number of files seems to be wrong if the startup deletes files. Do you see this too?

hekmon commented 4 years ago

I can not say that I have 🤔 It does not mean it is not there, just that my software immediately starts IO and I only analyse the logs to find out all the timings afterwards (so the number of files has already changed a lot).

Nodens- commented 3 years ago

Today I had to reboot a Fedora server (a scheduled reboot to pick up a new kernel) that was in the process of uploading 100GB of backups via rclone mount (average size per file 50-100MB), with a bw limit of 1MB/s, to an encrypted gdrive backend. The result was that the server boot was very slow, as I found out from the "booting" message while trying to ssh into it. Checking the boot log, the systemd unit for rclone was constantly timing out and restarting. I checked the rclone log and saw this behaviour: rclone was queuing files from the vfs cache very slowly and systemd was timing out and trying to restart it. I set TimeoutStartSec=infinity in the unit file and systemd stopped restarting the unit, BUT the booting process took a bit more than half an hour to finish because of this. The filesystem is btrfs on a hardware RAID5 SSD array on an LSI MegaRAID controller. This is very slow indeed. I am not sure why this is not handled in the background after the process starts?
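
For reference, the timeout workaround mentioned here can live in a drop-in rather than the main unit file; a sketch, assuming the unit is named rclone.service:

# /etc/systemd/system/rclone.service.d/override.conf
# (or create it with: systemctl edit rclone.service)
[Service]
# Do not let systemd kill and restart rclone while it scans the VFS cache.
TimeoutStartSec=infinity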

Nodens- commented 3 years ago

@hekmon Regarding warmup time, you can remove the time it takes for the refresh to finish from your rclone unit's startup by not using ExecStartPost=. Instead, use a Type=simple precache unit that is wanted by your rclone unit. That allows your unit to start and the refresh to happen in the background. E.g.:

[Unit]
Description=rclone precache
Requires=rclone.service
After=rclone.service

[Service]
Type=simple
# Ask the rclone remote control API for a recursive listing, warming the
# VFS directory cache in the background.
ExecStart=/usr/bin/rclone rc vfs/refresh recursive=true --rc-addr 127.0.0.1:5572
# Keep the unit marked active after the refresh command has returned.
RemainAfterExit=yes
User=user
Group=group

[Install]
WantedBy=rclone.service

hekmon commented 3 years ago

Thank you @Nodens-. Unfortunately, in that particular use case, I need the waiting software to not stall during its own startup, and as it scans the directory structure while starting up, warming up between rclone and itself is the only option :)

If you are interested in my systemd approach (warmup script, but not only that) you can find everything here: https://github.com/hekmon/rclonemount

Nodens- commented 3 years ago

I see. I use a similar approach to your repo, but I use rclone@.service to load configurations directly from users' home directories instead of a centralized directory under /etc, so security is handled per user/by user and mounts run under their own credentials. Any particular reason you use umount -f as root for unmounting instead of fusermount -zu, which detaches the mount immediately so all that remains for cleanup is for the rclone process to die, without any waiting or elevation involved?

hekmon commented 3 years ago

I had an experience where the unmount went badly and, while everything seemed fine from the fuse side, the mount was still on/busy/stale on the system side. The only thing that worked was the forced umount as root.

Since then, I have kept it as an extra layer of precaution to be 100% sure the mount point will be freed once the unit is stopped (in case of an auto restart, for example). 99.9% of the time the command does nothing, as the unmount has already been done cleanly (thus the leading - on the command, so its failure is ignored).

It could be an issue in your case, as your users can edit the configuration and use the destination variable to abuse the umount command. Here, only the admin edits it, and rootless operation was mainly wanted for execution.

Perhaps fusermount -zu would have been enough, I will try it next time I have a bad unmount :)

Nodens- commented 3 years ago

MNT_FORCE is actually dangerous:

MNT_FORCE (since Linux 2.1.116)
    Ask the filesystem to abort pending requests before attempting the
    unmount. This may allow the unmount to complete without waiting for
    an inaccessible server, but could cause data loss. If, after
    aborting requests, some processes still have active references to
    the filesystem, the unmount will still fail.

While MNT_DETACH:

    Perform a lazy unmount: make the mount unavailable for new accesses,
    immediately disconnect the filesystem and all filesystems mounted
    below it from each other and from the mount table, and actually
    perform the unmount when the mount ceases to be busy.

So basically it detaches the filesystem immediately for new accesses, and as soon as the rclone process dies (hence any open descriptors are released), the unmount completes cleanly. I have never seen it fail in any of my use cases (e.g. gocryptfs encrypted home directories etc.) and it is safer for data.
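
In a systemd unit that typically translates to something like the following stop command (the mount point is hypothetical):

[Service]
# ...ExecStart=/usr/bin/rclone mount ... as usual...
# Lazy unmount (MNT_DETACH): detach immediately, complete once no longer busy.
# The leading "-" tells systemd to ignore failure, e.g. if already unmounted.
ExecStop=-/bin/fusermount -uz /mnt/remote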

hekmon commented 3 years ago

Let's continue the talk here https://github.com/hekmon/rclonemount/issues/1 and keep this issue about the slow startup time caused by a high number of files in the cache ;)

douglasparker commented 2 years ago

I also had to set TimeoutStartSec=infinity due to startup taking a really long time. I initially tried a 30 minute timeout and it was not enough...

Nodens- commented 2 years ago

Yeah, this is still an issue. We're currently scheduling reboots only when the cache is empty or almost empty. On a hardware RAID6 SSD array we had situations where it took over 2 hours for the systemd unit to finish starting.

douglasparker commented 2 years ago

> Yeah, this is still an issue. We're currently scheduling reboots only when the cache is empty or almost empty. On a hardware RAID6 SSD array we had situations where it took over 2 hours for the systemd unit to finish starting.

It ended up being well over an hour for me. I ended up paying for Dropbox Advanced just to avoid the daily upload limit with Google Drive. I'm not interested in a bandwidth limiter as a workaround either.

Nodens- commented 2 years ago

A bandwidth limiter doesn't help, quite the contrary. The problem is a large number of files in the vfs cache getting enumerated synchronously on startup. Limiting bandwidth == slower upload == bigger vfs cache than with unlimited bandwidth == longer startup than with unlimited bandwidth. The real fix is rclone starting up and doing the cache processing asynchronously in the background. It would also benefit from concurrency there.

douglasparker commented 2 years ago

@Nodens- Google Drive has a daily upload limit of 750 GB per day.

My problem is that my 2 Gbps connection downloads and then uploads everything so quickly that the daily quota is reached, and then I have to wait 24 hours with the VFS cache full before it can upload the remaining files.

If I set a bandwidth limit, I can ensure this doesn't happen. However, that comes at the expense of slow downloads on days that I don't plan to exceed the daily upload limit.

Edit: Ultimately, this problem needs to be addressed head-on, but using a bw limit in my situation would make it so the VFS cache never gets too full.

Nodens- commented 2 years ago

Ah, I see. In your use case you can control the amount of data you have to upload; I didn't think of that. In mine I can't: it's all automated backups and datasets, so the cache is always full of thousands of files no matter what. I am actually limiting the upload to 20mbps in order not to saturate the line, because it's a production server that gets a lot of traffic, so this makes the issue worse. In order to reboot the server and pick up kernel updates I have to stop all uploading automation 2 days earlier to get the cache down to an acceptable size for the reboot. I am aware of the daily drive limit, but I'm not hitting it because I'm already limiting bandwidth; even if I did, I'd still throttle to stay under it rather than migrate off Drive, because there is no other option for the storage requirement at an acceptable rate for my small one-man (plus external associates) software engineering/game dev company (15TB currently on Drive and growing daily).

douglasparker commented 2 years ago

> I am aware of the daily drive limit, but I'm not hitting it because I'm already limiting bandwidth; even if I did, I'd still throttle to stay under it rather than migrate off Drive, because there is no other option for the storage requirement at an acceptable rate for my small one-man (plus external associates) software engineering/game dev company (15TB currently on Drive and growing daily).

Yeah, Google Workspace offers incredible value. So much value that I'm always nervous that unlimited will end up going away. 😅

ncw commented 2 years ago

There are two things that could do with being fixed here:

  1. Do the scan of the metadata files in the background
  2. Use a more efficient format for the metadata database (e.g. a key/value db, which we already have in rclone; a sketch follows below)

Doing them both would give the most bang for the buck: it would make startup instant, and it would get the metadata out of memory so rclone would use much less memory.
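
Purely as an illustration of point 2, here is a minimal sketch of keeping per-item cache metadata in a single key/value database, using bbolt as a stand-in rather than rclone's actual internal kv package; the itemMeta layout, bucket name and file name are invented:

package main

import (
    "encoding/json"
    "fmt"
    "log"
    "time"

    bolt "go.etcd.io/bbolt"
)

// itemMeta is a made-up stand-in for the per-file metadata rclone keeps
// for each VFS cache entry.
type itemMeta struct {
    Size  int64     `json:"size"`
    ATime time.Time `json:"atime"`
    Dirty bool      `json:"dirty"` // needs uploading
}

func main() {
    // One small db file instead of one metadata file per cached object.
    db, err := bolt.Open("vfsmeta.db", 0o600, nil)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Write (or update) one entry.
    err = db.Update(func(tx *bolt.Tx) error {
        b, err := tx.CreateBucketIfNotExists([]byte("items"))
        if err != nil {
            return err
        }
        meta, _ := json.Marshal(itemMeta{Size: 1 << 20, ATime: time.Now(), Dirty: true})
        return b.Put([]byte("path/to/file"), meta)
    })
    if err != nil {
        log.Fatal(err)
    }

    // Startup could then iterate the db instead of statting every file.
    err = db.View(func(tx *bolt.Tx) error {
        b := tx.Bucket([]byte("items"))
        return b.ForEach(func(k, v []byte) error {
            var m itemMeta
            if err := json.Unmarshal(v, &m); err != nil {
                return err
            }
            fmt.Printf("%s dirty=%v size=%d\n", k, m.Dirty, m.Size)
            return nil
        })
    })
    if err != nil {
        log.Fatal(err)
    }
}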

Would anyone (or their company) affected by this issue like to sponsor me to do this work? If so drop me an email to nick@craig-wood.com and we can discuss.

hekmon commented 2 years ago

Thinking about a way of making my company affected by the issue

Just kidding ;)

Only personal projects here. I can not offer much but let's talk about it (check your inbox).

Nodens- commented 2 years ago

Currently operating at a loss, as we're working on a game title that's still at least a year from completion, so unfortunately there's no way I can budget anything meaningful; otherwise I would have posted a bounty for this already. Running a pretty tight ship here (hence relying on Google Drive and rclone for very important backups... getting creative to cut down on costs, hoping not to go bankrupt before the game is done up to the crowdfunding stage, heh).

maxfield-allison commented 1 year ago

Has there been any movement on this? It seems there are a few workarounds, but even so a baked-in solution would be excellent.

greenbrettmichael commented 1 year ago

@ncw are you still looking for a bounty? Do you still think your approach will fix the issue, since you last looked at this ~a year ago?