Open kayceesrk opened 3 years ago
This is odd -- Merge.alist
(used by Node.merge
) indeed prefers modify over delete (as it iterates on the existing filenames in the 3 values to merge). I don't understand why merging wip
into master
seems to prefer the delete over modify. I'll try to investigate this further.
Note: we could probably change this to something a bit more consistent (especially if it simplifies the semantics) - but I'd like to avoid looking at the filenames in the LCA if it is not necessary (as it's a costly operation).
Note: we could probably change this to something a bit more consistent (especially if it simplifies the semantics) - but I'd like to avoid looking at the filenames in the LCA if it is not necessary (as it's a costly operation).
The question is whether Irmin wants to mirror Git semantics for merging trees. We can choose a different semantics than Git, but we should be clear about what the new semantics is.
I think it's fine to not keep the same semantic as Git - as long as the new semantics is clear, consistent and easy to explain (which I don't think the Git semantics particularly is).
Also: Node.merge
and Tree.merge
do no currently behave the same; Tree.merge
seems to handle modify/delete conflicts better, but we need to make all of this much more consistent.
What is the Node.merge
and Tree.merge
mentioned here? If we choose to go for a semantics that diverges from Git, we should probably write down our motivations for doing it.
Given that we now have the formal operational semantics for merging trees that correspond to Git behaviour, I don't find it complicated :-) Perhaps a little bit of work is necessary to explain the semantics to non-PL-researchers. This can certainly be done with the help of exhaustive examples + intuition behind the choice.
I should mention that I do find Git's semantics intuitive if you start from the idea that Git doesn't want to enforce a particular merge policy when there is conflicting intent. For example, if concurrent branches modify and delete the same file (this issue), there is no automatic merge that doesn't enforce a policy -- whether modify survives or delete survives. Git tries to avoid making that decision, leaving it to the user code to fix the conflict.
Also, it is unclear that if we do go for a semantics that is not Git, it would be more explainable. In the case of concurrent modify and delete, you could prioritise modify. This choice is reminiscent of the add-wins CRDT set semantics [1]. Now, this choice certainly is automatic, but is this more explainable or intuitive? Why did we not choose delete-wins semantics? Given that we enforce a policy here, there is no way to encode the delete-wins semantics, unlike Git which leaves it to the user to decide what to do.
What are the performance metrics that we are optimising for? You mention that you want to avoid looking at filenames in the LCA. Are we certain that the costs are unacceptable for real programs? Are there benchmarks where we can evaluate the costs. What other constraints should we care about? These constraints will influence the design of the merge.
Happy to modify the formal semantics to capture the intended behaviour. CC @ssdubey.
[1] https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type#OR-Set_(Observed-Remove_Set)
On the heels of #1433, here is an execution whose equivalent behaviour using Git would trigger a modify/delete conflict. More seriously, Irmin merge here is not commutative.
The idea here is that there is a file in LCA which is deleted in
master
branch and modified inwip
branch. This represents concurrent modify and delete operation. Git reports a conflict as there cannot be an automated resolution procedure that favours one of the operations over the other.While merging
master
branch intowip
seems to prefer modify over delete (the read after merge returns the modified value), mergingwip
intomaster
seems to prefer the delete over modify (the key is not found). This makes the Irmin merge non-commutative.I guess the fundamental issue here is that Irmin tries to treat the tree nodes as a mergeable replicated data type (MRDT), while Git does not. This makes Irmin diverge from Git semantics. Independently, the merge on trees has bugs (merge not being commutative). One option is to actually report both #1433 and this as a merge conflict, and leave it to the user code to fix the conflicts (by ensuring that the problematic keys are updated).