sageserpent-open / kineticMerge

Merge a heavily refactored codebase and stay sane.
MIT License
12 stars 2 forks source link

Rethink insertion migration (and possibly edit migration too). #83

Open sageserpent-open opened 2 months ago

sageserpent-open commented 2 months ago

This follows on from #32.

Working on tests for that ticket exposes some conceptual shortcomings of the approach used for insertion migration.

Summarising, there are some fairly hokey heuristics for deciding exactly where to place a migrated insertion between bracketing anchors (or near to a single anchor) when the destination side has also added content in the same place.

Tests written for #32 pass, but one of them is really documenting the poor outcome of the heuristics in one case, and I'm not certain that the other tests fully capture all the intricate possibilities of real-world situations. This isn't about code correctness (eg. preserving content through the merge / putting it in roughly the right location), rather this is about the aesthetics of the resulting merge output.

sageserpent-open commented 2 months ago

What was mentioned on that ticket was the possibility of using anchors to drive an after-the-fact merge of content between the anchors. The very existence of anchors means we have all three sides to the merge, so why not apply the tried and trusted three-way merge machinery to them?

That would address several things:

  1. Sections that are too small to match can be matched on the fly by the three-way merge, there is already functionality for this. Right now such sections simply confuse the insertion migration machinery.
  2. Merging aligns what are obviously edits of code in its original location with replacement code at the destination location, and allows flexibility in terms of where the insertion can be placed.
  3. We get ticket #29 for free if this works!
sageserpent-open commented 1 month ago

Ugh. After quite a bit of reflection, I've come to the conclusion that this is tricky.

The current plan is:

  1. Stick with supporting further migration (via anchoring) to a destination on the other side of an incoming move landing between an anchor pair or near an isolated anchor. This is to keep the functionality introduced in #32, where an intra-file move can be further migrated via anchoring.
  2. Regard the migration of edits and deletions as being a fundamentally separate mechanism from insertion migration (and for that matter, edit migration when no matches are involved, which is what #29 is about). So migrated changes will not be included in the three-way merge.
  3. Anchors are defined as the components of all-sides matches (again, we don't regard migrated changes - which are based on pairwise matches across moves - as being part of this mechanism) across moves that are neither degenerate, nor divergent nor coincident.
  4. We get these anchors be filtering the move destinations associated with match keys; first the match keys that are not all-sides matches are dropped, then we throw away divergent or coincident moves (which together should include degenerate moves too as a subset). The match keys remaining provide the anchors across the three sides. We build a mapping from destination anchor back to the matches that involve it, anticipating that a given destination anchor may associate with several moves.
  5. For each anchor, we build an anchored run of sections that are either preceded by or succeeded by the anchor. An anchor may both precede and succeed distinct runs.
  6. An anchored run does not include the anchor itself. It is grown away from the anchor, terminating just before it encounters another anchor, or part of a preservation, or part of a non-migrated edit, or part of a migrated edit or part or a migrated deletion.
  7. A one-sided deletion (which is never a migrated deletion) does not cause an anchored run to terminate.
  8. For the anchor that is the base part of the match and for the anchor that is on the side opposite to the side of the move destination, we build the anchored run by scanning away from the anchor in the appropriate file.
  9. For the anchor that is the move destination, we build the anchored run by scanning through the initial merge of the file. If the initial merge is conflicted, we take the side of the merge that provided the move destination.
  10. Determine upfront the anchored runs for the base, left and right sides, associating them with preceding and / or succeeding anchors.
  11. Determine upfront the anchored runs for the initial merge, again associating them with preceding and / or succeeding anchors.

(Roughly, given a move destination anchor in the initial merge, we need to find an anchored run that it may either precede or succeed, and also find unique anchored runs for the base anchor and the side opposite to the move destination. This has to take ambiguous matches into account, too. Once we have this we can merge the three anchored runs and substitute into the initial merge, taking into account that we may need to concatenate dissimilar anchored runs from a preceding and succeeding anchor. Then we apply change migration as a further step. Only perhaps we delete the migrated changes prior to doing the anchoring?)

sageserpent-open commented 4 days ago

I'm not sure about doing the merging of anchored content without taking into account migrations - since #113, migrated edits are suppressed as they make their way into the initial merge. So any anchored content taken from the base or side opposite to the anchor's move destination should also be subject to migrated edit suppression too.

However, if the migrated edits are suppressed on a given side, how do we know where to terminate the anchored run, assuming we don't want to skip over the source of the migration's move?

Oh, wait - point no. 6 above stated that an anchored run should terminate before it comes to a migrated edit. So as long as we recognise the migated edit (via the migrated edit suppressions), we won't have this problem. We can then work with the base and opposite side content as-is. Phew.

Migrated deletions look like a problem - there is no marker! Except that any deletion (migrated or not) can't be preceded or followed by an insertion, only by an edit (migrated or not). Anchored runs must be insertions or deletions, so this can't happen in practice.

We do still have to recognise the source and destination of moves that migrate either an edit or a deletion and terminate an anchored run to exclude them.

sageserpent-open commented 3 days ago

Coincident insertions and edit that are move destinations are a pain as usual - issue #113 introduced a hokey, but workable approach of resolving all anchor moves upfront in MoveDestinationsReport.evaluateSpeculativeSourcesAndDestinations, inserting the resolved content at the move destinations in the initial merge.

Coincident insertions and edits are treated in a similar fashion, only with two isolated move destinations, each sharing the same resolved content. That is resolved one more time when in the merge algebra handling the coincident insertion or edit.

I'm unsure as to whether to record anchors using the resolved content to represent the destination anchor in the initial merge - and thus not try to merge between the anchors themselves, leaving them out of the anchored runs. If so, then we may as well stick with the existing approach of calculating resolutions for moves in MoveDestinationsReport.evaluateSpeculativeSourcesAndDestinations, keeping it entirely distinct from resolving preservations and coincident insertions / edits that are not coincident move destinations.

The thing is, what do we do with insertions / edits that are coincident move destinations? They need to go somewhere in the initial merge, ideally not being fudged as a coincident insertion / edit with pre-resolved content on both sides.

Suppose we make a fake conflict in the core merge algebra, recording both the left- and right-content? Substitution performed on the initial merge would then remove the conflict.

In fact, given that anchors are currently recognised prior to substitution, that would allow the two sides of the coincident move destination to exist simultaneously in the initial merge....

sageserpent-open commented 3 days ago

Another insight - coincident move destinations cannot be anchors. Yes, they do participate in an all-sides match, but because both the left- and right-sides agree on the destination, there is nothing to actually migrate from the standpoint of any inserted content preceding or succeeding the move destination on either side - that is where a migrated insertion would have to come from. The inserted content is already in the right place.

It suffices to just resolve the contributions to the move and let the initial merge decide what to do with those insertions.