Closed arv closed 4 years ago
This brings up the next point that of course we don't want all commits to strongly reference their basis. Because we want to actually delete history. But the local pending ones do need to ... because we need it during sync.
@phritz what should the actual policy be?
Is it a true statement that for sync, we only need pending commits to strongly retain their basis?
That seems right to me but I would like to get that confirmed.
Is it a true statement that for sync, we only need pending commits to strongly retain their basis?
I think it's true to say that we need:
Said another way, we need to retain all commits from the main and sync heads to the base snapshot.
We have to decide what to do about two cases:
original
that no longer retained by the above rules. These original commits were previously on the main chain but no head is pointing to them anymore.I guess the question is how aggressively we want to prune old commits, specifically precious base snapshots we have synced on top of and old replay commits that we have replayed on a new snapshot. My sense is that we should not keep these old commits around.
On Mon, Sep 28, 2020 at 12:07 PM Phritz notifications@github.com wrote:
I guess the question is how aggressively we want to prune old commits, specifically precious base snapshots we have synced on top of and old replay commits that we have replayed on a new snapshot. My sense is that we should not keep these old commits around.
I get the sense it will be easier to reason about if we remove everything that isn't strictly necessary. If you agree, let's do that.
If we remove everything that's not strictly necessary then we have to rewrite the sync chain in order to finalize sync. We need to update the sync snapshot to remove its basis and then each replayed commit on top has to have its basis pointed to its new parent. (If we make original strong, we also have to reset original in each of the replayed commits as we go. If we don't make it strong, it can just dangle. But preference for making it strong and cleaning it up.)
The goal to remove everything that is not strictly necessary seems like the right thing to do. We do not want to use up more space than we need.
We talked about this on VC and I did some more researching of the existing code.
Goals:
original_hash
is currently read to compare the name
so we need to keep it alive as long as we have the local commit. When the local commit goes away (replaced by a snapshot commit) the original local commit will also go away.
We have "GC roots" for head
and the sync_head
.
The only thing keeping a snapshot commit alive is a local commit, the head
or the sync_head
.
Things that keep a local commit alive is head
, sync_head
or it being the basis
or original
of another local commit.
Once we are fully synced there will be a single snapshot commit, referenced by head
. Everything else can be collected.
This has no fixups. Things will go away as needed and once there is only a snapshot commit it will have a dangling reference to the basis
which has now been collected.
Not sure what "no fixups" means, we need do to address the two things that I pointed out:
Possibly one bit of missing info is that when we are done syncing head does not always point to the new base snapshot. Sometimes there are also local replayed commits on top of it. When the sync head becomes the new main head it can point to one of these local commits, which are still pending.
I think I am in favor of making snapshot basis pointers always weak for simplicity because it avoids “fixing up” the commit chain after sync.
However I do see it weakens some invariants. Sync has to rely on the fact that the main head is keeping sync snapshots basis alive. I think this is correct trade off but fritz is closest to this code so his preference should override mine.
I think the only thing I’m firm on is that we shouldn’t ever retain any data not needed for sync.
Plan LGTM. Let me know when to review this PR.
Maybe I'm missing something but the concept of weak vs strong is just an item in the refs entry. It does not change the chunk data. It is OK to have a string, with a hash that used to point to a chunk that is now removed.
We can therefore change original
from strong to weak as we need without changing the data or the hash and we do not to rewrite (fixup) any chunks
Not sure what "no fixups" means
I mean that we do not want to rewrite chunks unless we have to.
Possibly one bit of missing info is that when we are done syncing head does not always point to the new base snapshot. Sometimes there are also local replayed commits on top of it. When the sync head becomes the new main head it can point to one of these local commits, which are still pending.
That is fine; the local commits keep the snapshot commit alive.
PTAL
Maybe the enum Ref
is a bit of overkill? I learned a lot by trying that approach though.
Maybe I'm missing something but the concept of weak vs strong is just an item in the refs entry. It does not change the chunk data. It is OK to have a string, with a hash that used to point to a chunk that is now removed.
Right. IF we wanted to remove the ref for cleanliness to not have dangling pointers THEN it would change the hash. But if we are comfortable with them dangling then no worries.
Fixes #105