Add Garbage collection to `hashStore`

kowainik / hash-store

Hash as cache

https://kowainik.github.io/projects/hash-store

Mozilla Public License 2.0


Add Garbage collection to `hashStore` #4

Open chshersh opened 6 years ago

chshersh commented 6 years ago

The implementation of `hashStore` from issue #3 shouldn't contain any GC, but it's probably a good idea to have it. We still need to think about the proper way to do it, though :thinking:

chshersh commented 6 years ago

@int-index Could you please share your thoughts on this issue if you have time?

int-index commented 6 years ago

Garbage collection implies you can distinguish garbage from non-garbage. How do you intend to do that?

chshersh commented 6 years ago

@int-index If I already have a file with the same name but a different hash, then that file is probably an older version, so it is garbage and can be deleted.

int-index commented 6 years ago

I don't think this assumption is true. One of the reasons to use a hash store is to be able to work with different versions of the same file.

chshersh commented 6 years ago

@int-index For my use case there will be a lot of calls to the `hashStore` function for a single file, so I end up with 100 stored versions of one file quite quickly. And I actually need the old files to be removed... I think I can add an extra function like `hashStoreGC` and let the user decide which one to use.
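A minimal sketch of the "same name, different hash" rule that such a `hashStoreGC` could apply. It assumes stored files are named `<hash>-<name>` (as in the `hash1-foo` example used in this thread) and that the hash itself contains no `-`. The helper names `splitStored` and `staleVersions` are hypothetical and not part of the library.

```haskell
-- | Split a stored file name of the assumed form "<hash>-<name>".
-- Returns Nothing for names that do not follow that layout.
splitStored :: FilePath -> Maybe (String, String)
splitStored path = case break (== '-') path of
    (hash, '-' : name) | not (null hash), not (null name) -> Just (hash, name)
    _ -> Nothing

-- | Stored files that share the given name but carry a different hash.
-- Under the rule proposed above, these are the deletion candidates.
staleVersions
    :: String      -- ^ hash of the version that was just stored
    -> String      -- ^ file name (without the hash prefix)
    -> [FilePath]  -- ^ file names currently in the store directory
    -> [FilePath]
staleVersions currentHash name stored =
    [ path
    | path <- stored
    , Just (hash, storedName) <- [splitStored path]
    , storedName == name
    , hash /= currentHash
    ]
```

The function is pure on purpose: a caller could obtain the directory contents with `listDirectory` and pass each result to `removeFile` (both from `System.Directory`), while a plain `hashStore` would simply skip this step.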

int-index commented 6 years ago

I think if the goal is to minimize the amount of disk space used, then the most general solution would be some sort of LRU cache. That is: remove the files that haven't been used in a long time.

We shouldn't use the actual file timestamps for this, though: I wouldn't expect the behavior of hashStore to depend on the system clock. Instead, we must introduce a notion of logical time, where 1 clock tick = one call to hashStore.
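One possible shape for such a logical clock, sketched under assumptions: every call to `hashStore` would invoke `touch` for the file it stores or reuses, and a GC pass would call `evict` with the number of files to keep. The `Usage` type and both function names are hypothetical; this only illustrates the LRU idea with an in-memory `Data.Map` from `containers`.

```haskell
import Data.List (sortOn)
import qualified Data.Map.Strict as Map

-- | Logical-time bookkeeping: no system clock involved.
data Usage = Usage
    { clock    :: !Int                     -- ^ number of ticks (hashStore calls) so far
    , lastUsed :: !(Map.Map FilePath Int)  -- ^ tick at which each file was last used
    } deriving (Show, Eq)

emptyUsage :: Usage
emptyUsage = Usage 0 Map.empty

-- | Advance the clock by one tick and record the file as used at that tick.
touch :: FilePath -> Usage -> Usage
touch path (Usage t used) = Usage (t + 1) (Map.insert path (t + 1) used)

-- | Keep the @keep@ most recently used files.
-- Returns the evicted files (least recently used first) and the updated state.
evict :: Int -> Usage -> ([FilePath], Usage)
evict keep (Usage t used) =
    let byAge   = sortOn snd (Map.toList used)  -- oldest tick first
        victims = map fst (take (Map.size used - keep) byAge)
    in (victims, Usage t (foldr Map.delete used victims))
```

For example, after touching `hash1-foo`, `hash2-foo`, `hash3-foo` and then `hash1-foo` again, `evict 2` would select `hash2-foo`, because it has the smallest last-used tick.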

chshersh commented 6 years ago

@int-index I'm not sure how to persist this clock time. If I create `hash1-foo`, it has an age of 1 tick. If I then create `hash2-foo`, `hash1-foo` will have an age of 2 ticks and `hash2-foo` an age of 1 tick. I don't see a good solution without an extra file for storing ticks, but maybe there is some elegant stateless solution to this problem, like embedding the clock into the hash or something like that.

int-index commented 6 years ago

I was actually thinking of an extra file.
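A rough sketch of what the contents of such an extra file could look like. The format is an assumption made for illustration: the first line holds the current clock value, and each following line holds `<tick> <path>`. Paths containing newlines are not handled. The `Ticks` alias and the function names are hypothetical.

```haskell
import qualified Data.Map.Strict as Map
import Text.Read (readMaybe)

-- | Current clock value plus the last-used tick of every stored file.
type Ticks = (Int, Map.Map FilePath Int)

-- | Serialise the state: clock on the first line, then one "<tick> <path>" per line.
renderTicks :: Ticks -> String
renderTicks (clock, used) =
    unlines (show clock : [show tick ++ " " ++ path | (path, tick) <- Map.toList used])

-- | Parse the format produced by 'renderTicks'; Nothing on malformed input.
parseTicks :: String -> Maybe Ticks
parseTicks contents = case lines contents of
    [] -> Nothing
    (c : entries) -> do
        clock <- readMaybe c
        pairs <- traverse parseEntry entries
        Just (clock, Map.fromList pairs)
  where
    -- The tick never contains a space, so the first space separates it from the path.
    parseEntry line = case break (== ' ') line of
        (tick, ' ' : path) | not (null path) -> do
            t <- readMaybe tick
            Just (path, t)
        _ -> Nothing
```

The store could read this file with `readFile` before each `hashStore` call and rewrite it with `writeFile` afterwards; because `readFile` is lazy, the contents would need to be fully consumed before the same file is written again.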