ThomasWaldmann opened this issue 9 years ago
so could these caches be turned into fixed-size LRU caches (sized relative to available RAM, for example)? in other words, are they really caches (which we can discard) or indexes (which we can't discard)?
So, the question now is: "what are the options to deal with larger amounts of data?".
Some ideas:
@anarcat they are caches in the sense that they cache information from the (possibly remote) repository. So you could kill them and they could be rebuilt from repo information (or from fs when creating the next archive).
LRU won't help: for the files cache, every entry is accessed only once per "attic create". For the chunks cache, there are sometimes multiple accesses, but not in a pattern where LRU would help.
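For illustration, here is a tiny standalone simulation (not attic code; the IDs, cache size and stored values are made up) of why an LRU cache gets a 0% hit rate when every key is looked up exactly once:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used key when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = self.misses = 0

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)     # mark as recently used
            self.hits += 1
            return self.data[key]
        self.misses += 1
        return None

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry

# Files-cache-like access pattern: every file id is looked up exactly once.
cache = LRUCache(capacity=1000)
for file_id in range(100_000):
    if cache.get(file_id) is None:
        cache.put(file_id, "stat + chunk info")

print(cache.hits, cache.misses)   # 0 hits, 100000 misses
```

Every lookup misses, so an LRU policy would only add bookkeeping overhead here; the chunks cache does see some repeated accesses, but as noted above, not in a pattern LRU can exploit.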
ah right, so even if the caches were reused, it wouldn't gain much, because it's only for "within a filesystem" deduplication...
okay, so another strategy is needed, and you already seem to have a few ideas for that.. I guess the next step is benchmarks, as there is some fairly low-hanging fruit there (chunk size, for one..)
My 2 cents: the chunk size, and whether or not the cache should be kept in RAM, will depend on the particular circumstances attic is being applied to, since there are many use cases, variables and trade-offs to consider.
Therefore, my present assessment is that it makes sense to:
Regarding point 2, modern Linux kernels support per-cgroup resource limiting. So one way to get a seamless fallback from RAM to disk would be to put attic in a cgroup with whatever resource limits and swappiness suit the particular use case. However, this may be considered a bit of a hack and, of course, will not help Mac or Windows users.
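A minimal sketch of that idea, assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup, root privileges, and the memory controller enabled via the parent's cgroup.subtree_control; the group name, the 2 GiB cap and the attic command line are placeholders, not recommendations:

```python
import os
import subprocess

CGROUP = "/sys/fs/cgroup/attic-backup"   # hypothetical cgroup name
MEMORY_MAX = str(2 * 1024**3)            # illustrative hard cap: 2 GiB

# Creating a directory inside the cgroup filesystem creates the cgroup.
os.makedirs(CGROUP, exist_ok=True)

# Cap the group's memory; above this, the kernel reclaims/swaps or invokes the OOM killer.
with open(os.path.join(CGROUP, "memory.max"), "w") as f:
    f.write(MEMORY_MAX)

# Move this process into the cgroup; the attic child started below inherits it.
with open(os.path.join(CGROUP, "cgroup.procs"), "w") as f:
    f.write(str(os.getpid()))

subprocess.run(["attic", "create", "/path/to/repo::archive", "/data"], check=True)
```

On a cgroup v1 system (which is what most distributions shipped at the time of this thread) the corresponding knobs are memory.limit_in_bytes and memory.swappiness under the memory controller's mount point.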
@ThomasWaldmann as requested on #300, here is a bit more data from my setup: my media weighs in at 2.8 TB and currently comprises 6109 files. Attic's memory usage was usually ~11%, but towards the end it was mostly ~50%. Right before Attic died, the usage went up to ~70%. Let me know if you need more details.
@mathbr ~70% of 8 GiB is ~5.6 GiB. The formula computes 6.5 GiB (5.3 GiB if the repo is remote) of RAM usage for your backup data. As the formula does not cover all of attic's memory needs, just the repo index and the files/chunks cache, that seems to fit. If you had some other stuff running besides attic and your swap space wasn't very large, that may well have been all the memory you had.
Well, there were indeed a few apps running in parallel, most of the memory being claimed by Chromium and Plex Media Server; everything else is rather lightweight (I'm running Xfce as desktop).
My swap is at 2 GB, which is not much, but with 8 GB of RAM I actually shouldn't need it at all. ;-)
Has anyone tried again with that latest change yet? I'd like to know in advance how this fares before giving it another try. ;-) Just noticed that this change was from July 2014, never mind.
To accelerate operations, attic keeps some information in RAM:
In this section (and also the paragraph above it), there are some [not completely clear] numbers about memory usage: https://github.com/attic/merge/blob/merge/docs/internals.rst#indexes-memory-usage
So, if I understand correctly, this would be an estimate for the RAM usage (for a local repo):
E.g. backing up a total count of 1 Mi files with a total size of 1 TiB:
So, this will need 3 GiB of RAM just for attic. If you run attic on a NAS device (or another device with limited RAM), this might already be beyond the RAM you have available and will lead to paging (assuming you have enough swap space) and a slowdown. If you don't have enough RAM+swap, attic will run into "malloc failed" or get killed by the OOM Killer.
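To make that arithmetic reproducible, here is a rough back-of-the-envelope estimator; the per-entry byte counts below are assumptions for illustration (the authoritative numbers are in the internals.rst section linked above), so treat the output as an order-of-magnitude figure only:

```python
GiB = 1024 ** 3

def estimate_ram_gib(total_size_bytes, file_count,
                     avg_chunk_size=64 * 1024,    # assumed average chunk size (~64 KiB)
                     repo_index_per_chunk=40,     # assumed bytes per chunk in the repo index
                     chunks_cache_per_chunk=44,   # assumed bytes per chunk in the chunks cache
                     files_cache_per_file=240,    # assumed bytes per file in the files cache
                     files_cache_per_chunk=80,    # assumed bytes per referenced chunk in the files cache
                     local_repo=True):
    """Very rough estimate of attic's index/cache RAM usage; all constants are assumptions."""
    chunk_count = total_size_bytes / avg_chunk_size
    usage = chunk_count * (chunks_cache_per_chunk + files_cache_per_chunk)
    usage += file_count * files_cache_per_file
    if local_repo:
        # For a local repo, the repository index is also held in RAM on the same machine.
        usage += chunk_count * repo_index_per_chunk
    return usage / GiB

# The example above: 1 Mi files, 1 TiB of data -> roughly 3 GiB.
print(round(estimate_ram_gib(1024 ** 4, 2 ** 20), 1))   # ~2.8
```

With the same assumed constants, the 2.8 TB / 6109-file setup mentioned earlier also lands near the 6.5 GiB local-repo figure quoted above, but again, the real formula and constants live in the linked docs.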
For bigger servers, the problem will just appear a bit later: