Open chshersh opened 6 years ago
@int-index Could you please share your thoughts on this issue if you have time?
Garbage collection implies you can distinguish garbage from non-garbage. How do you intend to do that?
@int-index If I already have a file with the same name but a different hash, then that file is probably an older version, so it is garbage and can be deleted.
I don't think this assumption is true. One of the reasons to use a hash store is to be able to work with different versions of the same file.
@int-index For my use case there will be a lot of calls to the `hashStore` function for a single file, so I end up with 100 stored versions of a single file quite quickly, and I actually need the old files to be removed... I think I can add an extra function like `hashStoreGC` and let the user decide which one to use.
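To make the proposal concrete, here is a minimal sketch of what a name-based `hashStoreGC` could look like. It is written in Python purely for illustration and is not the library's API: the `<hash>-<name>` file layout is taken from the `hash1-foo` examples in this thread, while the function names `hash_store` and `hash_store_gc`, the `store_dir` parameter, and the use of a truncated SHA-256 digest are all assumptions.

```python
import hashlib
import os


def hash_store(store_dir, name, content):
    """Store content as "<hash>-<name>" and return its path (assumed layout)."""
    # Assumption: a truncated SHA-256 hex digest; hex never contains a dash.
    digest = hashlib.sha256(content).hexdigest()[:16]
    path = os.path.join(store_dir, f"{digest}-{name}")
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(content)
    return path


def hash_store_gc(store_dir, name, content):
    """Like hash_store, but then delete every other version stored under name."""
    path = hash_store(store_dir, name, content)
    keep = os.path.basename(path)
    for entry in os.listdir(store_dir):
        # Split on the first dash only, so names containing dashes still work.
        _, _, entry_name = entry.partition("-")
        if entry_name == name and entry != keep:
            os.remove(os.path.join(store_dir, entry))
    return path
```

Keeping the plain `hash_store` next to the collecting variant leaves the choice to the caller, so users who rely on several versions of the same file are not affected.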
I think if the goal is to minimize the amount of disk space used, then the most general solution would be some sort of LRU cache. That is: remove the files that haven't been used in a long time.
We shouldn't use the actual file timestamps for this, though: I wouldn't expect the behavior of `hashStore` to depend on the system clock. Instead, we must introduce a notion of logical time, where 1 clock tick = one call to `hashStore`.
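The logical-clock idea can be sketched in memory as follows. This is only an illustration in Python under stated assumptions: the class name `LogicalLruStore`, the `capacity` limit as the eviction policy, and the method name `hash_store` are hypothetical, and the sketch tracks entry names only rather than touching any files.

```python
class LogicalLruStore:
    """Track the last-use tick of each entry; one tick per hash_store call."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed policy: keep at most this many entries
        self.clock = 0            # logical time, independent of the system clock
        self.last_used = {}       # entry name -> tick at which it was last used

    def hash_store(self, entry):
        """Record a use of entry and return the entries evicted as a result."""
        self.clock += 1
        self.last_used[entry] = self.clock
        evicted = []
        while len(self.last_used) > self.capacity:
            # The least recently used entry has the smallest tick.
            oldest = min(self.last_used, key=self.last_used.get)
            del self.last_used[oldest]
            evicted.append(oldest)
        return evicted
```

Because the clock advances only on calls, the eviction order is fully determined by the sequence of calls and never by wall-clock time.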
@int-index I'm not sure how to persist this clock time. If I create `hash1-foo`, it has a lifetime of 1 tick. If I then create `hash2-foo`, `hash1-foo` will have a lifetime of 2 ticks and `hash2-foo` will have 1 tick. I don't see a good solution without an extra file for storing ticks, but maybe there is some elegant stateless solution to this problem, such as embedding the clock into the hash or something like that.
I was actually thinking of an extra file.
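One way the extra file could work is sketched below, in Python for illustration only. The file name `ticks.json`, the JSON format, and the helper names `load_state`, `save_state`, `touch`, and `age` are all assumptions rather than anything decided in this thread.

```python
import json
import os

TICKS_FILE = "ticks.json"  # hypothetical name for the extra file


def load_state(store_dir):
    """Read the persisted clock, or start from zero if no file exists yet."""
    path = os.path.join(store_dir, TICKS_FILE)
    if not os.path.exists(path):
        return {"clock": 0, "last_used": {}}
    with open(path) as f:
        return json.load(f)


def save_state(store_dir, state):
    """Write to a temporary file, then swap it in with a single rename."""
    path = os.path.join(store_dir, TICKS_FILE)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def touch(store_dir, entry):
    """Advance the logical clock by one tick and record the use of entry."""
    state = load_state(store_dir)
    state["clock"] += 1
    state["last_used"][entry] = state["clock"]
    save_state(store_dir, state)
    return state


def age(state, entry):
    """Ticks since entry was last used, counting the tick of that use."""
    return state["clock"] - state["last_used"][entry] + 1
```

With this scheme, touching `hash1-foo` and then `hash2-foo` gives `hash1-foo` an age of 2 ticks and `hash2-foo` an age of 1 tick, matching the example above. Only the tick of last use is stored per entry, so ages never need to be rewritten on each call.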
The implementation of `hashStore` from issue #3 shouldn't contain any GC, but it's probably a good idea to have it. We still need to think about the proper way to do it :thinking: