radicle-dev / radicle-surf

A code browsing library for VCS file systems.
Other
32 stars 11 forks source link

Diff Semantics #27

Open FintanH opened 4 years ago

FintanH commented 4 years ago

Once this is done we can make moves on #22.

After seeing the amount of questions and discussion from #24 we can see that: One Does Not Simply Implement Diff :ok_hand:

We should discuss what we expect from our diff'ing semantics and what techniques might be used to achieve this. The result of this research should be a document that lays out our semantics and serve as reference for the implementation and documentation for the repo.

Some initial thoughts:

kim commented 4 years ago

What we are missing currently is the possibility of conflicts. Those cannot arise when looking at only two states of a Directory: inevitably, one of them needs to be interpreted as newer, so the diff is effectively a path describing how to get from A to B.

If we take history into account, we can run into conflicts. For example:

A---B---D
 \
  C---E

Say B deleted a file, but C modified it. If we want to take a diff between E and D, there is no obvious way to tell which has precedence -- but this is only because we can interpret them as concurrent edits after a common ancestor A. I don't really understand yet how this works in pijul.

The question is, when would we need this? I'm not sure GH (f.ex.) shows you the conflicts with the merge target of a PR, it only tells you that there are conflicts afaik. Could be a very nice feature, though.

FintanH commented 4 years ago

So my thoughts on this that it might be derived by combining the Browser semantics alongside the diff. So something like the following:

browser.history("branch-d");
let d_directory = browser.get_directory();

browser.history("branch-e");
let e_directory = browser.get_directory();

let diff = diff(d_directory, e_directory);

BUT writing out that example now makes me understand what you mean. Because if we say E is target then the file looks like it was added and not modified. If we say D is the source of truth then the file just looks like it's deleted and we don't have the modification diff.

Thanks for that example!

FintanH commented 4 years ago

I suppose in this particular case we can try search the History to find a common ancestor A. We can then proceed to D1 = diff(A, D) and D2 = diff(A, E), followed by comparing the two diffs. For every file path that is unique in D1 and D2 they are "clean" merges. For every file path that has an intersection in D1 and D2, this is a conflict if we have two differing actions that oppose each other, in this case delete and modify.

kim commented 4 years ago

It’s probably worth noting that the types of the arguments are different:

diff :: Directory -> Directory -> Diff

is what we have now (cannot produce conflicts).

However, there’s also:

merge :: History a -> History a -> Either Conflicts Patch

where a Patch looks exactly like a Diff, except we can apply it to the Directory obtained from the first argument, and get back a new Directory. So, if you wish, it’s a diff where we don’t know the second argument yet.

FintanH commented 4 years ago

Mmm that's a good point. Something that's interesting though is that in our model you can only get a Directory from a History a anyway :thinking:

FintanH commented 4 years ago

Leaving this here as a reminder: https://docs.rs/git2/0.10.1/git2/struct.Repository.html#method.diff_tree_to_tree

cloudhead commented 4 years ago

Leaving this here as a reminder: https://docs.rs/git2/0.10.1/git2/struct.Repository.html#method.diff_tree_to_tree

Can we just start by wrapping this, under the git module? Not sure we need anything more than this for now.

FintanH commented 4 years ago

Can we just start by wrapping this, under the git module? Not sure we need anything more than this for now.

Certainly doable, but I'd also like to have something in mind for when we would wanna be more generic backend-wise. I'll leave this as a discussion and create a new ticket to just go through git for now :)