jborg / attic

Deduplicating backup program

Cache location #358

Open mrkumar9001 opened 8 years ago

mrkumar9001 commented 8 years ago

I was wondering why the cache location is in the home directory. If the cache were located inside the attic archive, attic could easily be used by different accounts and computers without each of them having to rebuild the cache (which takes a ton of time on a large archive) whenever a backup is made from a different source.

ThomasWaldmann commented 8 years ago

The repository can be remote (ssh: repo locations) and encrypted.

I worked on accelerating cache sync, but it is not easy...

Schroedingers-Cat commented 8 years ago

On my NAS, attic's cache folder is about 15 GB (backing up roughly 6 TB of data). Is there an option to switch off the cache, and if so, by what factor would performance drop? Is it safe to symlink the cache folder to another drive?

mrkumar9001 commented 8 years ago

@ThomasWaldmann I see how remote locations wouldn't work that well, but I don't see how encryption would prevent it. Encrypted hard drives, for example, don't keep a separate external cache; the data is stored in inodes on the drive itself (the situations are of course very different, it's just an example).

As for the time it takes to rebuild the cache, maybe there could be both a local cache and a cache in the archive, each stored with a timestamp. If the local cache is older than the archive cache, attic could rsync the archive cache to the local one to bring it up to date. I'm not sure how much overhead this would add, but I think it would be better than rebuilding the cache every time on a new computer or account. Then again, I'm clueless about the intricacies of how attic backs up and dedups data, so this might not help at all. Anyway, thanks for making such a great backup program.

ThomasWaldmann commented 8 years ago

@Schroedingers-Cat symlinking ~/.cache/attic to some directory on a filesystem with more space should work.
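A minimal sketch of that relocation in Python, assuming the cache lives at `~/.cache/attic`, that attic is not running while the move happens, and that the destination does not exist yet. The function name `relocate_cache` and the path `/mnt/bigdisk/attic-cache` are hypothetical and only for illustration; doing the same by hand with `mv` and `ln -s` works equally well.

```python
import shutil
from pathlib import Path


def relocate_cache(cache_dir: Path, target_dir: Path) -> Path:
    """Move cache_dir to target_dir and leave a symlink at the old location.

    Hypothetical helper. Assumes attic is not running and target_dir
    does not exist yet. Returns the directory that now holds the cache.
    """
    cache_dir = cache_dir.expanduser()
    target_dir = target_dir.expanduser()

    # Already a symlink: assume it was relocated before and report where it points.
    if cache_dir.is_symlink():
        return cache_dir.resolve()

    if target_dir.exists():
        raise FileExistsError(f"{target_dir} already exists")
    target_dir.parent.mkdir(parents=True, exist_ok=True)

    if cache_dir.exists():
        # shutil.move renames when possible and falls back to copy + delete
        # when the target is on a different filesystem (the usual case here).
        shutil.move(str(cache_dir), str(target_dir))
    else:
        target_dir.mkdir()

    # Point the old location at the new one so attic keeps finding its cache.
    cache_dir.parent.mkdir(parents=True, exist_ok=True)
    cache_dir.symlink_to(target_dir, target_is_directory=True)
    return target_dir


if __name__ == "__main__":
    # Hypothetical destination on a larger filesystem.
    relocate_cache(Path("~/.cache/attic"), Path("/mnt/bigdisk/attic-cache"))
```

Moving the directory rather than starting with an empty one keeps the existing cache contents, so attic does not have to rebuild it after the change.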

ThomasWaldmann commented 8 years ago

@m4nukum4r you have to thank @jborg - attic is his creation. ;)

My point with encryption is that a cache somewhere inside the repo directory would also have to be encrypted, so as not to disclose any information. To use it, you would not only have to transfer it from the remote, but also decrypt it again. I'm not sure that would be faster than a rebuild.

mrkumar9001 commented 8 years ago

Gotcha @ThomasWaldmann, and thanks @jborg for making attic!

If I'm interpreting you correctly, you're thinking of a cache that is encrypted and decrypted on the fly. I guess I wasn't very clear about what I had in mind, sorry about that. I was thinking of something more like a LUKS-encrypted drive, where the cache sits in a container that is encrypted and gets decrypted when accessed. On a slightly different note, is the cache currently encrypted for encrypted archives?

Also, I think rsyncing the cache from a remote archive to the local cache would save a ton of time compared to rebuilding big caches like Schroedingers-Cat's 15 GB one.

ThomasWaldmann commented 8 years ago

I understood what you said in your previous post.

The current (local) cache is not and does not need to be encrypted. Your local backup source data isn't encrypted either. :)