road2react opened 2 years ago
So yes, we could add an option, at least as a way to experiment.
I would like to set a default that is good for the widest range of situations rather than making people work it out.
The block size also determines the granularity at which identical content can be found across different files, or across different versions of the same file. So increasing it is likely to reduce block reuse to some extent.
One goal for Conserve is to very aggressively issue lots of parallel IO, making use of Rust's fearless concurrency. It already does this to some extent, and there is room to do much more. Many factors in contemporary systems align with this approach: many cores, deep SSD device command queues, and networks with a high bandwidth-delay product. If we have, say, 10-100 requests in flight, then per-request latency still matters, but not as much. So I think this is the main thing to lean on, but it is more complicated to implement than just increasing the target block size.
Another thing to consider there is that Conserve tries to write blocks of a certain size but for various reasons some objects can be smaller; they could be made larger but that would have other tradeoffs e.g. around recovering from an interrupted backup. So again, parallelism for throughput.
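To sketch the shape of what I mean (toy code only, not Conserve's actual implementation; the worker count, block count, and `fake_upload` stand-in are invented for the example):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::Duration;

/// Stand-in for a latency-bound block write to remote storage (hypothetical).
fn fake_upload(block_id: usize) {
    thread::sleep(Duration::from_millis(300)); // pretend network round trip
    println!("wrote block {block_id}");
}

fn main() {
    const IN_FLIGHT: usize = 32; // somewhere in the "10-100 requests in flight" range
    const N_BLOCKS: usize = 200;
    let next = AtomicUsize::new(0);

    thread::scope(|s| {
        for _ in 0..IN_FLIGHT {
            s.spawn(|| loop {
                // Each worker claims the next block; ~32 writes overlap, so the
                // 300 ms latency is paid roughly once per 32 blocks of wall time.
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= N_BLOCKS {
                    break;
                }
                fake_upload(i);
            });
        }
    });
}
```

With a few dozen writes overlapping, total time is dominated by bandwidth rather than by the per-object round trip.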
Finally: are you really seeing seconds per 1MB object to cloud storage? Which service, over what network? I wouldn't be surprised by 300-1000ms but seconds seems high?
If someone is interested in doing this, here is a guide:
There are actually a few variables to consider, including:

- the target block size, which could become an option on `BackupOptions`
- the number of index entries per hunk, which could also go through `BackupOptions` instead of the fixed `entries_per_hunk=10000`

(See the sketch after this list.)
I think that's it.
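As a purely hypothetical sketch of where those knobs could live (the field names and defaults are assumptions for illustration, not the current definitions):

```rust
// Hypothetical sketch only: not Conserve's real BackupOptions definition.
pub struct BackupOptions {
    /// Target uncompressed size of a data block (assumed field name).
    pub max_block_size: usize,
    /// Number of index entries written per index hunk (assumed field name).
    pub entries_per_hunk: usize,
}

impl Default for BackupOptions {
    fn default() -> Self {
        BackupOptions {
            max_block_size: 1 << 20, // 1 MiB, matching today's behaviour
            entries_per_hunk: 10_000,
        }
    }
}
```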
Hey, just a quick thought:
What happens if we mix backups with different block sizes in the same archive?
> Finally: are you really seeing seconds per 1MB object to cloud storage? Which service, over what network? I wouldn't be surprised by 300-1000ms but seconds seems high?
I'm using Box mounted using rclone. Using the terminal, cd-ing into the mount takes ~1s for a normal directory, but several seconds if there are many files in the directory.
> Hey, just a quick thought: What happens if we mix backups with different block sizes in the same archive?
Nothing too bad: nothing should be making strong assumptions that the blocks are of any particular size.
Unchanged files (same mtime) will continue to reference the blocks they used last time.
Files not recognized as unchanged but which in fact have content in common will no longer match that content if the block size changes, so all their content will be written again to new-sized blocks. That would include cases like: the file was touched (mtime updated with no content change); the file was renamed or copied; more data was appended to the file.
We should still test it of course. And if this is relied upon it should be (more?) explicit in the docs.
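To illustrate the reuse point with toy code (using std's `DefaultHasher` rather than Conserve's real block hashing): splitting the same content at a different block size produces entirely different block IDs, so none of the previously written blocks can be matched.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy content addressing: hash each fixed-size chunk of a file.
fn block_ids(content: &[u8], block_size: usize) -> Vec<u64> {
    content
        .chunks(block_size)
        .map(|chunk| {
            let mut h = DefaultHasher::new();
            chunk.hash(&mut h);
            h.finish()
        })
        .collect()
}

fn main() {
    let content = vec![7u8; 4 * 1024 * 1024]; // 4 MiB of repeated data
    let old = block_ids(&content, 1 << 20); // 1 MiB blocks
    let new = block_ids(&content, 2 << 20); // 2 MiB blocks
    // No old block ID reappears (with overwhelming probability), so nothing is reused.
    assert!(old.iter().all(|id| !new.contains(id)));
    println!("no block IDs shared between block sizes");
}
```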
If we want larger files, the index hunks would probably be the place to start.
There is also an assumption that a number of blocks can be fairly freely held in memory. So we shouldn't make them 2GB or anything extreme like that, where holding 20 simultaneously could cause problems.
> Finally: are you really seeing seconds per 1MB object to cloud storage? Which service, over what network? I wouldn't be surprised by 300-1000ms but seconds seems high?
>
> I'm using Box mounted using rclone. Using the terminal, cd-ing into the mount takes ~1s for a normal directory, but several seconds if there are many files in the directory.
Interesting... I wonder how many API calls are generated from a single file read or write.
Running conserve with `-D` may give you an idea which file IOs are slow.
There might be a big win from a transport that talks to the Box API directly, which would be some more work, but perhaps not an enormous amount.
I ran with `-D` and only received the following output:
```
2022-08-12T17:38:55.829222Z TRACE conserve: tracing enabled
2022-08-12T17:38:55.829283Z DEBUG globset: built glob set; 1 literals, 2 basenames, 0 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
```
Nothing else has been printed and the progress indicator does not show.
Running without `-D` allows the progress bar to show and indicates that the backup is working.
Without `-D`, `10 new entries, 0 changed, 0 deleted, 0 unchanged` appears to show a new entry 2-3 times per second. (I assume the several seconds caused by `cd` are due to the traversal of every file in the directory.)
> There might be a big win from a transport that talks to the Box API directly, which would be some more work, but perhaps not an enormous amount.
I'm using rclone's encryption function, which may not work if talking to the Box API directly.
Oh the logging might be on my SFTP branch.
I don't think Conserve requests to read any archive directory during a backup. (It does during validate, delete, and gc).
If rclone reads the remote directory repeatedly even when the app does not request it, that may be a performance drag regardless of block size.
Perhaps you can get a request log out of rclone?
And let's split Box/rclone performance out into a separate bug.
rclone reports a lot of repeated reads:
```
rclone[376785]: DEBUG : conserve/b0000/BANDHEAD: Open: flags=OpenReadOnly
rclone[376785]: DEBUG : conserve/b0000/BANDHEAD: Open: flags=O_RDONLY
rclone[376785]: DEBUG : conserve/b0000/BANDHEAD: >Open: fd=conserve/b0000/BANDHEAD (r), err=<nil>
rclone[376785]: DEBUG : conserve/b0000/BANDHEAD: >Open: fh=&{conserve/b0000/BANDHEAD (r)}, err=<nil>
rclone[376785]: DEBUG : conserve/b0000/BANDHEAD: Attr:
rclone[376785]: DEBUG : conserve/b0000/BANDHEAD: >Attr: a=valid=1m0s ino=0 size=56 mode=-rw-r--r--, err=<nil>
rclone[376785]: DEBUG : &{conserve/b0000/BANDHEAD (r)}: Read: len=4096, offset=0
rclone[376785]: DEBUG : conserve/b0000/BANDHEAD: ChunkedReader.openRange at 0 length 1048576
rclone[376785]: DEBUG : conserve/b0000/BANDHEAD: ChunkedReader.Read at 0 length 4096 chunkOffset 0 chunkSize 1048576
rclone[376785]: DEBUG : &{conserve/b0000/BANDHEAD (r)}: >Read: read=56, err=<nil>
rclone[376785]: DEBUG : &{conserve/b0000/BANDHEAD (r)}: Flush:
rclone[376785]: DEBUG : &{conserve/b0000/BANDHEAD (r)}: >Flush: err=<nil>
rclone[376785]: DEBUG : &{conserve/b0000/BANDHEAD (r)}: Release:
rclone[376785]: DEBUG : conserve/b0000/BANDHEAD: ReadFileHandle.Release closing
```
These reads repeat several times before reading `conserve/d/<id>`.
Each instance of the `BANDHEAD` read operation aligns with an increment of the `new entries` count in the progress display.
Stopping and restarting the backup operation continues reads on `b0000`, even after new bands are created.
Occasionally, rclone reports
```
DEBUG : conserve/d/1d6: Re-reading directory (25h16m9.701252182s old)
DEBUG : conserve/d/1d6/: >Lookup: node=conserve/d/1d6/1d61f19d98dbc87b0a6b3b1941b7133aaaebed5a19db44f85b3f42a1d739a36cab1d0e7179ba1297a0338d21873e253662152156fd4f6b551f2b36e8a71a9674, err=<nil>
```
in succession for different IDs. I believe this corresponds to writing a block file.
This might be connected to #175, just fixed by @WolverinDEV, which is one cause of repeatedly re-reading files.
However, there are some other cases where it reads a small file repeatedly in a way that is cheap on a local filesystem (where it will be in cache) but might be very slow remotely. It's definitely worth fixing, and I think I have fixed some in the sftp branch, but there are probably more.
With the latest changes:
It looks like there are no more repeated reads. Now, for each block, it first checks whether it is already written. If not, it creates a temp file (`tmp08SFF2`), writes to that file, and renames it to the block id. This appears to be 3 round trips per block, which aligns with the network pattern: a ~1 MB spike around every second, where each operation has a ~300 ms round trip.
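For reference, here is a minimal sketch of that pattern as I understand it (not Conserve's actual code; the names are placeholders):

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Sketch of the write pattern described above: one existence check, one write
/// to a temp file, one rename, so roughly three round trips per block on a
/// high-latency remote filesystem.
fn store_block(block_dir: &Path, block_id: &str, compressed: &[u8]) -> std::io::Result<()> {
    let final_path = block_dir.join(block_id);
    // Round trip 1: skip the write if the block already exists.
    if final_path.exists() {
        return Ok(());
    }
    // Round trip 2: write under a temporary name so a partial upload is never
    // visible under the final block id.
    let tmp_path = block_dir.join(format!("tmp-{block_id}"));
    let mut f = fs::File::create(&tmp_path)?;
    f.write_all(compressed)?;
    // Round trip 3: move the completed block into place.
    fs::rename(&tmp_path, &final_path)?;
    Ok(())
}
```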
Yep, that's what it does currently.
This is pretty reasonable (although perhaps not optimal) locally but not good if the filesystem is very high latency.
A few options:
> This might be connected to https://github.com/sourcefrog/conserve/pull/175 just fixed by @WolverinDEV which is one cause of repeatedly re-reading files.
What @road2react describes seems to be pretty much what I experienced as well.
The odds are high that this has been fixed by #175.
Right now, blocks are limited to 1 MB each. When backing up to cloud storage, the latency of reading and writing may be significant, reaching up to several seconds per operation.
A potential way to reduce overhead would be to increase the block size.
Would this be possible?
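As a rough back-of-the-envelope model (symbols and numbers are illustrative only): with block size $S$, per-object latency $L$, and link bandwidth $B$, writing blocks one at a time gives an effective throughput of roughly

$$\text{throughput} \approx \frac{S}{L + S/B},$$

so with $S = 1\,\mathrm{MB}$ and $L$ of a second or more, the backup is capped at about 1 MB/s regardless of bandwidth, while a larger $S$ (or more writes in flight) raises that ceiling.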