rclone / rclone

"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, Azure Files, Yandex Files
https://rclone.org
MIT License

VFS cache for B2 is invalidated too easily #7364

Open maxim opened 9 months ago

maxim commented 9 months ago

The associated forum post URL from https://forum.rclone.org

https://forum.rclone.org/t/multiple-cache-dir-subdirs-for-the-same-remote/42044/14

What is the problem you are having with rclone?

I run a B2 mount with --vfs-cache-mode full. All mount settings are passed as command line arguments, because I prefer to keep the rclone config file limited to just credentials.

Then I needed to tune some settings on my mount, namely these:

--transfers
--buffer-size
--b2-upload-concurrency
--multi-thread-streams

When I restarted the mount, rclone created a new cache directory because it generated a new B2 canonical name. This unfortunately invalidated the existing cache.

Silently invalidating the cache leads to 3 problems:

  1. For large caches (in my case 1.4TB) this could lead to a lot of new network usage.
  2. Files that were written to the cache and are pending upload are now abandoned in the old cache. They will never upload.
  3. Since the old cache doesn't go away, your disk will start filling up beyond the allotted cache size, and that's probably how you will notice that something's wrong.
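
One way to spot the third problem early is to list every top-level directory under the cache's vfs folder together with its size. More than one entry for the same remote (for example B2 and B2{BBBBB}) suggests a stale cache. This is a minimal sketch, assuming the vfs/<remote name>/ layout visible in the debug log of this report; list_vfs_cache_roots is a hypothetical helper name, not part of rclone.

```shell
# Hypothetical helper: print the size of each per-remote VFS cache root.
# Usage: list_vfs_cache_roots CACHE_DIR
list_vfs_cache_roots() {
  cache_dir=$1
  for d in "$cache_dir"/vfs/*/; do
    # Skip the unexpanded pattern when the vfs folder is empty or missing.
    [ -d "$d" ] || continue
    du -sh -- "$d"
  done
}
```

For the mount in this report the call would be list_vfs_cache_roots /home/htpc/RCloneCache.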

What to do?

According to ncw's comment, this wouldn't happen if those settings were declared in the config file.

Based on this, I propose 2 actions:

  1. Remove the distinction between settings provided in a config file, on the command line, or via ENV. Whether the cache is invalidated should depend only on the setting itself (its name and value), not on where it was set.
  2. Document which settings invalidate cache.

What is your rclone version (output from rclone version)

rclone v1.65.0-beta.7413.87aa029f5.fix-7350-multithread-memory
- os/version: ubuntu 22.04 (64 bit)
- os/kernel: 5.15.0-86-generic (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.21.1
- go/linking: static
- go/tags: none

Which OS you are using and how many bits (e.g. Windows 7, 64 bit)

Ubuntu 22.04.3 LTS, 64 bit

Which cloud storage system are you using? (e.g. Google Drive)

B2

The command you were trying to run (e.g. rclone copy /tmp remote:tmp)

/usr/bin/rclone mount B2:Bucket/path /path/to/mount \
  --allow-other \
  --b2-chunk-size 50M \
  --b2-upload-concurrency 20 \
  --multi-thread-streams 20 \
  --b2-hard-delete \
  --buffer-size 50M \
  --bwlimit "07:00,1M:off 23:45,off" \
  --cache-dir /home/htpc/RCloneCache \
  --config /home/htpc/.config/rclone/rclone.conf \
  --dir-cache-time 87600h \
  --disable-http2 \
  --fast-list \
  --log-level DEBUG \
  --poll-interval 0 \
  --transfers 2 \
  --use-mmap \
  --vfs-cache-max-age 8760h \
  --vfs-cache-max-size 1400G \
  --vfs-cache-mode full \
  --vfs-write-back 15m \
  --vfs-read-ahead 200M \
  --vfs-read-chunk-size-limit 500M \
  --rc \
  --rc-no-auth

A log from the command with the -vv flag (e.g. output from rclone -vv copy /tmp remote:tmp)

Sep 25 23:35:58 htpc rclone[164069]: INFO  : vfs cache: cleaned: objects 2011 (was 2011) in use 94, to upload 74, uploading 20, total size 63.731Gi (was 63.731Gi)
Sep 25 23:36:38 htpc systemd[1]: htpc-mount.service: Failed with result 'timeout'.
Sep 25 23:36:38 htpc systemd[1]: Stopped B2 HTPC Mount.
Sep 25 23:36:38 htpc systemd[1]: htpc-mount.service: Consumed 46min 3.574s CPU time.
Sep 25 23:37:16 htpc systemd[1]: Starting B2 HTPC Mount...
Sep 25 23:37:16 htpc rclone[252704]: INFO  : Starting bandwidth limiter at 1Mi:off Byte/s
Sep 25 23:37:16 htpc rclone[252704]: DEBUG : rclone: systemd logging support activated
Sep 25 23:37:16 htpc rclone[252704]: NOTICE: Serving remote control on http://127.0.0.1:5572/
Sep 25 23:37:16 htpc rclone[252704]: NOTICE: --fast-list does nothing on a mount
Sep 25 23:37:16 htpc rclone[252704]: DEBUG : Creating backend with remote "B2:Bucket/path"
Sep 25 23:37:16 htpc rclone[252704]: DEBUG : Using config file from "/home/htpc/.config/rclone/rclone.conf"
Sep 25 23:37:16 htpc rclone[252704]: DEBUG : B2: detected overridden config - adding "{BBBBB}" suffix to name
Sep 25 23:37:17 htpc rclone[252704]: DEBUG : Couldn't decode error response: EOF
Sep 25 23:37:17 htpc rclone[252704]: DEBUG : fs cache: renaming cache item "B2:Bucket/path" to be canonical "B2{BBBBB}:Bucket/path"
Sep 25 23:37:17 htpc rclone[252704]: DEBUG : vfs cache: root is "/home/htpc/RCloneCache"
Sep 25 23:37:17 htpc rclone[252704]: DEBUG : vfs cache: data root is "/home/htpc/RCloneCache/vfs/B2{BBBBB}/Bucket/path"
Sep 25 23:37:17 htpc rclone[252704]: DEBUG : vfs cache: metadata root is "/home/htpc/RCloneCache/vfsMeta/B2{BBBBB}/Bucket/path"
Sep 25 23:37:17 htpc rclone[252704]: DEBUG : Creating backend with remote "/home/htpc/RCloneCache/vfs/B2{BBBBB}/Bucket/path"
Sep 25 23:37:17 htpc rclone[252704]: DEBUG : Creating backend with remote "/home/htpc/RCloneCache/vfsMeta/B2{BBBBB}/Bucket/path"

{BBBBB} = 5 random characters.

ncw commented 9 months ago

This is a hard problem to fix perfectly.

It's a bad problem if rclone gets it wrong: for instance, if you pass --b2-account then rclone will be looking at a completely separate set of files, so rclone plays it safe.

However, rclone assumes that if you are editing your config file, you really mean to point to the same set of files.
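
As a rough illustration of the config-file route, the B2 backend flags from the mount command in this issue could be stored on the remote itself with rclone config update, so they no longer count as command line overrides. This is a sketch under assumptions: persist_b2_options is a hypothetical helper name, the values are copied from the report, and it covers only backend options. The config keys chunk_size, upload_concurrency and hard_delete correspond to the --b2-* flags, while global flags such as --transfers or --buffer-size are not remote options and stay on the command line.

```shell
# Hypothetical helper: store B2 backend tuning in the config file
# instead of passing --b2-* flags to "rclone mount".
# Usage: persist_b2_options REMOTE_NAME CONFIG_FILE
persist_b2_options() {
  remote=$1
  config=$2
  # "rclone config update NAME key value ..." rewrites keys on an existing remote.
  rclone --config "$config" config update "$remote" \
    chunk_size 50M \
    upload_concurrency 20 \
    hard_delete true
}
```

After something like persist_b2_options B2 /home/htpc/.config/rclone/rclone.conf, the matching --b2-chunk-size, --b2-upload-concurrency and --b2-hard-delete flags would be dropped from the mount command.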

When this problem first came up, I tried to think of a mechanism for labeling each option as being file-preserving or not, but in the end I decided it was too hard.

Note that rclone doesn't invalidate the cache - it is still there on the disk.

You can find out the location of the cache like this:

$ rclone config paths
Config file: /home/ncw/.rclone.conf
Cache dir:   /home/ncw/.cache/rclone
Temp dir:    /tmp

And there is nothing stopping you renaming directories in there (with rclone stopped). You'll find a vfs directory and a vfsMeta directory. You'll need to rename directories in both.
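
The rename described above might be scripted roughly as follows. This is a sketch, not an rclone feature: rename_vfs_cache is a hypothetical helper, it assumes rclone is stopped, and it assumes the per-remote directories sit directly under vfs and vfsMeta in the cache dir. It checks both trees before touching either, so the data and metadata directories are never left with different names.

```shell
# Hypothetical helper: rename one remote's VFS cache directories.
# Usage: rename_vfs_cache CACHE_DIR OLD_NAME NEW_NAME
rename_vfs_cache() {
  cache_dir=$1
  old=$2
  new=$3
  # Validate both trees first so we never rename only one of them.
  for sub in vfs vfsMeta; do
    if [ ! -d "$cache_dir/$sub/$old" ]; then
      echo "missing: $cache_dir/$sub/$old" >&2
      return 1
    fi
    if [ -e "$cache_dir/$sub/$new" ]; then
      echo "already exists: $cache_dir/$sub/$new" >&2
      return 1
    fi
  done
  for sub in vfs vfsMeta; do
    mv -- "$cache_dir/$sub/$old" "$cache_dir/$sub/$new" || return 1
  done
}
```

A call would look like rename_vfs_cache "$HOME/.cache/rclone" 'OLD_NAME' 'NEW_NAME', with the two names taken from the directories actually present under vfs. Quote the names, since they can contain braces. If a restarted mount has already created a directory under the new name, the helper refuses to overwrite it, so that directory has to be moved aside first.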

maxim commented 9 months ago

Gotcha! Yeah, sounds like too much.

Have you considered letting someone specify --vfs-cache-name as an override for the canonical name? If specified, rclone would use that at the user's own risk.