sahib / brig

File synchronization on top of ipfs with git like interface & web based UI
https://brig.readthedocs.io
GNU Affero General Public License v3.0

More robust metadata syncing #74

Open evgmik opened 3 years ago

evgmik commented 3 years ago

Below are quotes from a conversation with @sahib; the quoted text belongs to @evgmik.

Here is the scenario: Ali and Bob made changes in their repos and synced from time to time. Let's even assume that their metadata is fully synced.

Ali accidentally nukes his ~/.brig/metadata.tgz.locked, but he can still synchronize with Bob, and he does. So on his side the content is restored and he is still tracking Bob. The problem is on Bob's side: when he asks for a diff or a sync, there will be an error message:

diff: No commit with index `3` found

This happens because Ali now has only one diff (the one created after the first sync). If Ali makes enough commits, a patch with the requested number will exist again, but it will refer to different metadata that is wrongly assumed to be the same. So syncing is dangerous.

What I suggest is to attach a hash to every diff message. If the last known diff is missing, we go back in history until we find the common ancestor; in the worst case that would be the empty repo state (which should have the same hash in any repo for any user). This way we can recover from destroyed metadata with minimal losses.
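
A minimal sketch of that walk-back, in Python for brevity (brig itself is written in Go). The names `find_common_ancestor` and `EMPTY_REPO_HASH` are hypothetical, and the sketch assumes each side can list its commit hashes oldest first; it is not how brig stores history.

```python
import hashlib

# Assumption: every repository starts from the same well-known empty state,
# so this hash is identical for all users. SHA-256 of an empty byte string
# is only a stand-in for whatever hash brig would actually use.
EMPTY_REPO_HASH = hashlib.sha256(b"").hexdigest()


def find_common_ancestor(known_history, remote_history):
    """Return the newest hash in known_history that the remote still has.

    Both arguments are lists of commit hashes, oldest first.
    known_history is what we last saw of the remote; remote_history is
    what the remote reports now (possibly after losing its metadata).
    """
    remote_hashes = set(remote_history)
    # Walk back from the newest known commit towards the oldest one.
    for commit_hash in reversed(known_history):
        if commit_hash in remote_hashes:
            return commit_hash
    # Nothing matched: fall back to the shared empty repository state.
    return EMPTY_REPO_HASH
```

In the scenario above, Bob knows Ali's history up to the third commit, while Ali only has the empty state plus one new commit, so the search falls back to the empty repo state and syncing can restart from there instead of failing.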

I think you have a point here, although I'm not so much worried about the scenario above. But the patch number is an additional concept we might not need. It is also additional state that might get out of sync or be calculated wrongly because we introduced a bug. We already have hashes indicating "diffs": those are just the commit hashes.

So it would be nicer if we could change the patch API from this:

interface Sync {
    fetchPatch    @1 (fromIndex :Int64) -> (data :Data);
    fetchPatches  @5 (fromIndex :Int64) -> (data :Data);
}

to this:

interface Sync {
    # If "to" is empty, fetch complete diff until staging commit:
    fetchPatch    @1 (from :String, to :String) -> (data :Data);
    fetchPatches  @5 (from :String, to :String) -> (data :Data);
}
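
How such a hash range might be resolved on the serving side can be sketched as follows (Python, illustrative only). The function `commits_in_range` is hypothetical, and whether `from` itself is included in the result is an assumption here: the sketch treats it as exclusive, since the caller already has that commit.

```python
def commits_in_range(history, from_hash, to_hash=""):
    """Select the commits a patch between two hashes would cover.

    history is a list of commit hashes, oldest first; its last entry
    stands for the staging commit. from_hash is exclusive (assumption),
    to_hash is inclusive. An empty to_hash means "up to staging".
    """
    if from_hash not in history:
        raise KeyError(f"No commit with hash `{from_hash}` found")
    start = history.index(from_hash) + 1

    if to_hash == "":
        end = len(history)
    elif to_hash in history:
        end = history.index(to_hash) + 1
    else:
        raise KeyError(f"No commit with hash `{to_hash}` found")

    if end < start:
        raise ValueError("`to` is older than `from`")
    return history[start:end]
```

Unlike an index, an unknown hash fails loudly here instead of silently resolving to an unrelated commit that happens to sit at the same position.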

There is one downside to this approach (which I think is why I chose patch numbers instead): the commit hashes on the metadata copy of the remote will not be the same as on the remote, since the copy does not have to be complete (e.g. some folders might be missing). So we need to store and trust the commit hashes coming from the remote. Using indices was an easy way to work around that storage.
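
The extra storage could be as small as a per-remote table from the hash the remote reported to the commit it produced in the local copy. A rough sketch in Python; the class `RemoteCommitIndex` and its methods are hypothetical and not part of brig:

```python
class RemoteCommitIndex:
    """Remembers which remote commit hash each local copy commit came from.

    Assumption: the remote's hashes are trusted as reported; they cannot
    be recomputed locally because the local copy may be incomplete.
    """

    def __init__(self):
        self._remote_to_local = {}
        self._order = []  # remote hashes in the order they were applied

    def record(self, remote_hash, local_hash):
        """Store the mapping after a patch has been applied locally."""
        if remote_hash not in self._remote_to_local:
            self._order.append(remote_hash)
        self._remote_to_local[remote_hash] = local_hash

    def last_remote_hash(self):
        """Hash to send as `from` on the next fetch, or None if never synced."""
        return self._order[-1] if self._order else None

    def local_commit_for(self, remote_hash):
        """Local commit created from the given remote commit, if any."""
        return self._remote_to_local.get(remote_hash)
```

With this table, the next fetch would pass `last_remote_hash()` as `from` and leave `to` empty, and the local hashes never need to match the remote's.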