Open vmsh0 opened 2 years ago
You can already do that with ionice commands to really throttle it down.
Feels like an awful solution though, compared to just getting the disk space you need to perform a backup.
I think @leoluan1 had a prototype for this?
As was said in the thread, it would be really easy to deadlock the filesystem doing this, so it is going to need careful surgery!
I'm also interested in this myself as a Windows user.
I recently started researching cloud backup solutions for my home PC. That research has led me to using Rclone as a way to allow my existing backup software (Macrium Reflect) to utilize cloud storage. I want to be able to write disk images directly into the cloud storage.
The difference between VFS writes and upload speeds is the single big stumbling block I've run into.
Searching earlier today did lead me to that same discussion thread where @leoluan1 mentioned adding exponential backoff code to item.WriteAt. I went to look at the code to see for myself what it was doing, but I was disappointed to find that this was never actually introduced?
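For reference, here is a minimal sketch of the kind of backoff I imagined while reading that thread. Both function arguments are hypothetical stand-ins, not rclone's actual internals:

```go
package main

import (
	"fmt"
	"time"
)

// writeWithBackoff blocks a write while the cache is over quota, sleeping
// for an exponentially increasing interval between checks, then performs
// the write. cacheOverQuota and doWrite are hypothetical placeholders.
func writeWithBackoff(cacheOverQuota func() bool, doWrite func() error) error {
	delay := 10 * time.Millisecond
	const maxDelay = 5 * time.Second
	for cacheOverQuota() {
		time.Sleep(delay)
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
	return doWrite()
}

func main() {
	checks := 0
	over := func() bool { checks++; return checks < 3 } // cache frees up after 2 checks
	err := writeWithBackoff(over, func() error {
		fmt.Println("write succeeded")
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```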
@Inverness - that's why it has the help wanted tag as no one has helped to pick it up :)
I'm also facing the same situation and considering alternatives for the problem. If I can get the proper guidance, I'm willing to make the changes in the code myself and support this feature going forward. I really appreciate all the work put into rclone; it's an amazing piece of software. I would like to start a discussion on possible solutions. Since I'm not familiar with rclone's internal architecture and code base, I have probably got some things wrong, so please correct me where needed. Please give me your thoughts on the comments below:
As a user, I would like the cache to work as a dedicated space for caching locally (similar to a cache in a RAID), not exceeding its size and, in the worst case, falling back to relying on the remote for writes. The system should use free cache when available to guarantee better performance, but it should neither fail (an I/O error in user space) when the cache is completely full nor proceed to fill up the whole disk.
For the first part, I can get close by tuning parameters like --vfs-cache-max-age to a really big number so the cache stays full, but as I describe later this creates some inconsistencies with the idea and with how files are invalidated.
The second part is harder, given that massive writes fill the filesystem beyond the cache limit. An alternative is to copy the files directly to the remote, but I see that as a workaround rather than a fix for the actual issue and expectations.
One idea would be to add a parameter or cache mode which strictly enforces the cache size limit and throttles operations, something along the lines of --vfs-cache-mode full-strict. Another idea is to add a pair of parameters, --vfs-cache-throttle-threshold and --vfs-cache-throttle-bwlimit, where the first sets the trigger at which the limit is applied and the second sets the throttled speed. This would ensure that rclone is not considered hung or frozen on macOS and Windows, since writes are still being performed, just at a much reduced speed that is enough to satisfy the operating system. The throttled speed should be set to something much lower than the actual transfer rate to the remote, otherwise the disk will fill up anyway.
As one example of the above:
rclone mount s3:/test /mnt \
--vfs-cache-throttle-threshold '90%' \
--vfs-cache-throttle-bwlimit 100kbps \
--vfs-cache-max-size 10G \
--vfs-cache-mode full-strict \
--cache-dir /var/cache/rclone
Explaining the intention of the command above: once cache usage reaches 9 GB (90% of the 10 GB limit), the write speed is reduced to 100kbps.
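To make the arithmetic behind that example concrete, here is a small illustration (the flag names above are my proposal, and these helper functions are made up for the example, not rclone code):

```go
package main

import "fmt"

// throttleTrigger returns the cache usage in bytes at which throttling
// would start, given a max size and a fractional threshold.
func throttleTrigger(maxSize int64, threshold float64) int64 {
	return int64(float64(maxSize) * threshold)
}

// writeDelaySeconds returns how long a throttled write of writeSize bytes
// should take at bwLimit bytes per second.
func writeDelaySeconds(writeSize, bwLimit int64) float64 {
	return float64(writeSize) / float64(bwLimit)
}

func main() {
	// --vfs-cache-max-size 10G with --vfs-cache-throttle-threshold 90%
	trigger := throttleTrigger(10<<30, 0.90)
	fmt.Printf("throttle starts at %.1f GiB of cache use\n", float64(trigger)/(1<<30))

	// At a 100 KiB/s bwlimit, a 4 MiB write is stretched to ~41 seconds.
	fmt.Printf("4 MiB write delayed %.1f s\n", writeDelaySeconds(4<<20, 100<<10))
}
```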
Some caveats:
I'm not familiar with the rclone codebase, but digging around I found these two methods. Adding a third method which would check whether a threshold is configured (haveQuotaThreshold) and a method for checking whether usage is within range (thresholdOK) could be used as the base for the required validations.
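As pure illustration, the two hypothetical checks might look like this (the cache type and both method names are made up here; they don't exist in rclone):

```go
package main

import "fmt"

// cache is a minimal stand-in for the VFS cache state.
type cache struct {
	used      int64   // bytes currently stored in the cache
	maxSize   int64   // the --vfs-cache-max-size limit
	threshold float64 // e.g. 0.90 for a 90% throttle threshold
}

// haveQuotaThreshold reports whether a throttle threshold is configured.
func (c *cache) haveQuotaThreshold() bool {
	return c.maxSize > 0 && c.threshold > 0
}

// thresholdOK reports whether current usage is still below the trigger.
func (c *cache) thresholdOK() bool {
	return float64(c.used) < float64(c.maxSize)*c.threshold
}

func main() {
	c := &cache{used: 8 << 30, maxSize: 10 << 30, threshold: 0.90}
	fmt.Println(c.haveQuotaThreshold(), c.thresholdOK()) // below the 9 GiB trigger
	c.used = 9 << 30
	fmt.Println(c.thresholdOK()) // at the trigger: time to throttle
}
```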
I could trace the method below as the one performing the writes (maybe I missed something?). As pseudocode, before executing the write:
if haveQuotaThreshold() && thresholdOK() {
    proceed...
} else {
    tryToFreeSomeCache()      // Check problem [1]
    if thresholdOK() {        // If some cache could be freed, proceed normally
        proceed...            // See problem [2]
    } else {
        rateLimit.Lock()      // Some shared object to guarantee consistency between threads
        writeSize := ...
        delay := writeSize / thresholdLimit // How many seconds should this write take?
        ... proceed with normal write ...
        sleep(delay)
        rateLimit.Unlock()
    }
}
Problems:
--vfs-cache-max-age is bypassed as an invalidation mechanism for freeing objects to make space for writes, or it shouldn't be used in this mode. Maybe a better idea is for this mode to assume the cache should be full at all times, with objects invalidated based on some other rule. I found the cleaner, clean, purgeOverQuota and RemoveNotInUse methods, but it seems they respect the flags above.
The associated forum post from https://forum.rclone.org:
https://forum.rclone.org/t/rate-limiting-the-vfs-cache-speed-to-prevent-the-local-disk-from-filling-up/23319
What is your current rclone version (output from rclone version)?
rclone v1.57.0
What problem are you trying to solve?
I'm using a cloud drive mounted with rclone as the target for a backup software (namely Borg). While doing the first full backup, Borg generates an amount of data which is much greater than the disk space I have. Thus, the backup cannot complete, because I run out of space.
How do you think rclone should be changed to solve that?
I think rclone should implement the feature that was described on the forum last year, i.e. have a VFS cache mode flag that allows throttling requests to the FUSE filesystem when the VFS cache is full.