sageserpent-open / kineticMerge

Merge a heavily refactored codebase and stay sane.
MIT License
12 stars 2 forks source link

Resolve divergent moves when one is intra-file and the other inter-file. #32

Closed sageserpent-open closed 2 months ago

sageserpent-open commented 7 months ago

This addresses the test failure seen in Git commit SHA: d5c60158f0c3f99c115b2aa22a1eec1c9063e617.

The problem there is a divergent move of a section that moves intra-file on one side and inter-file on the other. This is interpreted by the merge algebra for MergeResultDetectingMotion as being a coincident deletion of the source of a divergent move, so the coincident deletion is duly progagated to both move destinations. EDIT: this has been changed subsequently so that nothing is propagated, but the problem of having a divergent move still remains.

However, when one move destination is intra-file and the other is inter-file, then it can make sense (certainly for the failing test case) to transpose the intra-file code motion into the destination file for the inter-file move. This is the case when the destination file is also the recipient of lots of other section moves from the same source file that aren't divergent - it seems obvious that the entire source file is simply being renamed, and the intra-file code motion is just an edit to be propagated.

sageserpent-open commented 3 months ago

Need to revisit the merge example test discussed in #49 as part of this ticket.

sageserpent-open commented 3 months ago

The work done in #30 might provide a cheap-and-cheerful solution to this. Currently, any insertion that is not a move destination is checked to see if it part of a run of insertions that might be flanked by anchors; if so, it is eligible for migration.

Looking at the move destination that is intra-file, the aforementioned check could be generalised so that an intra-file move destination that is part of a divergent move whose other side is inter-file is allowed. The insertion migration machinery should then pick it up.

Both moves from the divergence are thrown away and whatever the insertion migration decides to do with it becomes the resolved move. Not sure whether to patch up the move destinations report when this happens?

Worth a spike...

sageserpent-open commented 3 months ago

Added a test in 7f3ee716f6a9b4d3355aee8d461fe0cffdfc42cd to explore this.

Obviously it is expected to fail, but there is some additional background to the failure that is worth thinking about...

BASE:


Hey diddle diddle,
The cat and the fiddle,
The cow jumped over the moon;
The little dog laughed
To see such sport,
And the dish ran away with the spoon.

LEFT:


Hey diddle diddle,
The cat and the fiddle,
The cow jump'd over the moon;
The little dog laugh'd
To see such craft,
And the fork ran away with the spoon.

RIGHT:


Hey diddle diddle,
The cat and the fiddle,
The cow jumped!
The little dog laughed
To see such sport, (in fact he was over the moon)
And the dish ran away with the spoon.

EXPECTED:


Hey diddle diddle,
The cat and the fiddle,
The cow jump'd!
The little dog laugh'd
To see such craft, (in fact he was over the moon)
And the fork ran away with the spoon.

ACTUAL RESULTS:

(in fact he was over the moon)
Hey diddle diddle,
The cat and the fiddle,
The cow jump'd over the moon!
The little dog laugh'd
To see such craft,
And the fork ran away with the spoon.
sageserpent-open commented 3 months ago

Section breakdown:

Hey diddle diddle,
The cat and the fiddle,
The cow 
jumped
jump'd
over the moon
;
!
The little dog 
laughed
laugh'd
To see such 
sport,
craft,
And the fork 
(in fact he was 
)
And the dish 
ran away with the spoon.
sageserpent-open commented 3 months ago

I hadn't anticipated needing to handle inserted context around the move destination of the intra-file part of the divergent move. That should move to surround the final resting place of the divergent move.

sageserpent-open commented 3 months ago

Forgetting about the divergent move that is the core part of this ticket's work, what stopped the surrounding inserted context from being migrated was that the potential anchors both belonged to base-and-right matches; thus there is no move destination on the left to make them serve as anchors.

For the inserted context to be migrated, the implementation is going to have to skip beyond these pairwise matches to find a proper anchor...

sageserpent-open commented 3 months ago

As of 710428f9046d8dd8b9537029096924f69650c291, now have three test cases:

  1. Dealing with insertions that are separated from their anchors by intervening sections that are deleted in the merge.
  2. A pure intra-file move that needs to be migrated across a file rename.
  3. A combination of the two previous situations.

Now to get all three test cases to pass...

sageserpent-open commented 3 months ago

Trying to get the first test case to pass reveals a nasty gotcha...

Say we have an insertion on the right with a nearby anchor (or bracketing pair of anchors).

If the destination code on the left already has an insertion following the anchor, then the migration mechanism will put the right-insertion directly after the preceding anchor (so before the left-insertion) and will again put a copy of the right insertion directly in front of the succeeding anchor (so after the right-insertion).

Strictly speaking there is no obvious clue as to where the migrated right-insertion should go - in fact it probably should be considered to be in conflict with the left-insertion, so the user can make the decision.

This looks even weirder when the right-insertion has what appear to be left-deleted sections before or after it, because an obvious interpretation is for the left-deletions to be considered as left-edits, only with the left-edits effectively in a move destination (which is why the per-file merges think they are left-deletions).

This all holds in a mirror image as well, naturally.

The first test case's expectation is indeed for the migrated right-insertion to land between the implied left-edits as a single insertion - so can this be achieved, or do we simply cop out and have a merge conflict with the left-edits versus the right insertion?

Remember, if there are no implied left-edits, we will have to settle for a conflict anyway...

sageserpent-open commented 3 months ago

Looking at this with some simpler related test cases makes it clear that there is no chance of getting the desired merge - because what to a human seems to be an obvious insertion point for (in fact he was over the moon) lies slap bang in the middle of a single section, namely:

craft,\nAnd the fork.

Yes, the original text was split into two sections:

sport,\n followed by And the dish, but Kinetic Merge doesn't understand natural language.

The best it can do is either put the insertion + migration before or after the single replacement gap-fill (which is plain wrong), or punt and treat it as a conflict to be resolved manually by someone who understands the content.

Even if there were several sections instead of the single gap-fill - say due to gap-fill fragmentation or because of an incoming move destination, how should Kinetic Merge decide where to splice in the insertion + migration - or even just an insertion for that matter?

So conflict it is!

sageserpent-open commented 3 months ago

No - the reasoning above isn't completely watertight. What the implementation currently does could be thought of as correct.

Suppose the anchors around an inserted section that we want to migrate go to completely different locations, either in the same file or across two files? This is a kind of divergent move of the migrated insertion and we expect to see duplicate copies of the migrated section, in much the same way as a divergent move of a single source section should be honoured in the merge results.

So when the anchor destination code inserts one or more sections between the moved anchors, this is just makes the anchors move divergently - they were adjacent at their source locations, now they have moved apart instead of in parallel (which would keep them adjacent). The implication is that the inserted section is glued to both of its anchors; thus any insertion between the anchors can't be considered as possibly interposing between an anchor and the glued section.

If we were to allow conflicts, this could only make sense if the anchors land in the same file and in the same order. They might even end up at the start and end of the destination file, with everything in between being in conflict with the migrated inserted section. Yuck!

It gets more interesting when the side that contributed the migrated insertion also contributes a conflicting insertion against either or both of the anchor destinations. If we have the anchors bracketing a conflict, then do we consider one or both of the anchors themselves in conflict too? How do we decide where to place the migrated insertions wrt to the opposite sides of the anchor conflict? What we really have is a tree of conflicts, where there is a big overall conflict between the anchors and their opposing insertion(s), but the anchor side of the conflict then contains a nested conflict between the stuff inserted between the destination anchors and the migrated insertions. Argh!

sageserpent-open commented 3 months ago

This all makes me wonder - are there even any tests for when the anchor destinations are in conflict with insertions from the side contributing the migrated insertions?

For that matter, suppose we have a migrated edit in conflict with insertions from the side contributing the migrated edit?

sageserpent-open commented 2 months ago

The story so far, as of e328aba7e5ddf946448b1cba8805ed46043902d1 ...

  1. Any insertion that is also a move destination and lands under the influence of one or two anchors is now treated as eligible for further migration via the anchor motion.

  2. Anchors can be found by skipping over deletions of elements on the opposite side of the insertions that we're looking to anchor - but not beyond edits.

  3. Because anchor destinations may conflict with other insertions made on the opposite side, we have to admit the possibility of conflicts involving the anchors.

  4. Because anchors may move divergently, possibly swapping around in the same file or moving to different files, we don't try to represent a migrated run of insertions as being in conflict with any insertions made between the anchor destinations - instead we treat the insertion run as being glued on to both anchors and migrate them as such to the anchor destinations. If this results in divergence, we allow the insertion run to be duplicated.

  5. Because divergence is allowed anyway when simply migrating an insertion run, if there is a divergent move that makes an insertion that is itself eligible for migration, we simply move that side of the original divergent move one more time, possibly making even more divergence.

  6. On the other hand, an insertion that is moved by anchoring is not considered eligible for further movement, because the anchor(s) that picked it up would block any further potential anchors.

The upshot is that as it stands, the original aim of this ticket hasn't been achieved - it's been replaced by something else that is more generic, but also a bit hokey. Unless things work out well, it's left to the user to manually patch up the migrations, although the heavy lifting of moving the code from one file to another is actually done - it just expects a bit of finessing.

sageserpent-open commented 2 months ago

Could we do better?

  1. Suppose we have a divergent inter-file versus intra-file move, and the intra-file move is migrated via anchoring so that it lands in the same file as the inter-file move (or one of its divergent migrations does, anyway). We could take a punt and suppress the original inter-file move, hoping that said move was strictly due to a file rename and did not itself embody some local code motion made in the renamed file. Using context around the divergent move's source would help to confirm that the original inter-file move is safe to suppress in favour of the migration. DONE, BUT IN A BRUTE-FORCE WAY as of d2f378c15bce37d9e1bf6a8e15192f2f4d7ed192

  2. The contract for merge.of already specifies that neighbouring edits on the same side should be treated as independent edits, especially when such edits have interspersed insertions on the opposite side from the edit. Now, in the situation where an insertion run is migrated via anchoring, but the destination of the anchors on the opposite side has insertions of its own, then we could try to claim those insertions on the opposite side as being edits of any deleted sections coming in between the insertion run and its anchor(s). If there are no such deleted sections, then too bad, but this might yield a decent result. DONE as of 2a7a35a66bc02151b40ee5a20faf2ee2d72249ff

sageserpent-open commented 2 months ago

Need to review what's there next - as of d2f378c15bce37d9e1bf6a8e15192f2f4d7ed192, all move destinations of a move are suppressed when one of them is picked up by insertion migration. This fixes the outstanding test failure, but is too ham-fisted.

sageserpent-open commented 2 months ago

Also need to perform some coverage testing, as I'm not sure that all code paths in the migration logic are being exercised by the test suite.

sageserpent-open commented 2 months ago

Another thing - the logic to handle a migrated insertion that is bracketed by a preceding and succeeding anchor needs reviewing - if two anchors with equivalent associated insertions moved from unrelated locations move together, then should that be treated as migrating a single bracketed insertion or two duplicates? I'm not even sure whether the test should check equivalence of the insertion content or just the sections...

sageserpent-open commented 2 months ago

Git commit d5b96937ddb6827c58442143ff0698665aa58291 introduces a test case that breaks - this is when a pair of bracketing anchors have two insertions between their move destinations, but the source has only one skipped section between the preceding anchor and the insertion that needs to be migrated.

This results in the skip after the preceding anchor claiming one of the destination insertions, then the migrated insertion is placed in because we're out of skips, then comes the next destination insertion, then a duplicate of the migrated insertion is placed in, then comes the succeeding anchor. It doesn't look good.

sageserpent-open commented 2 months ago

Actually, the Scaladoc for merge.of states that edits can greedily capture following insertions, except when followed by other edits on the same side. Git commit b0641ea95bbe9bf0927730d71398a958fb93f50a takes this approach when deciding whether to deem a pair of bracketing anchors as referring to a single migrated insertion, allowing the first skips associated with the preceding anchor to swallow up any destination content that isn't claimed by the succeeding anchor's skips.

The results look OK.

sageserpent-open commented 2 months ago

Git commit SHA: 77929a691e9774239ba06e7d07d2712e3e4b06f9 essentially reverts the workaround referred to in d5c60158f0c3f99c115b2aa22a1eec1c9063e617, as discussed at the starrt of this ticket. The relevant tests now pass.

sageserpent-open commented 2 months ago

As of 14fdbdb1c3f93fb11736bd841c2dfedc10d9e875, got reasonable coverage of the migration logic in CodeMotionAnalysisTest.

sageserpent-open commented 2 months ago

Git commit SHA 14fdbdb1c3f93fb11736bd841c2dfedc10d9e875 documents the jemmying of one of the tests to acknowledge that handling of destination context around migrated insertions is pretty hokey when a preceding anchor is followed by another preceding anchor in the destination file.

I'm honestly not sure if this is something that can be worked around with a special case for two preceding anchors or is indicative of a fundamental flaw in how destination context is handled. It boils down to the question of where a migrated insertion should sit wrt its anchor(s) and any context that the destination has added in between the anchors.

I have a nasty feeling that there is no general solution to this that would be acceptable to someone reading the resulting merge output in all circumstances, other than doing some kind of after-the-fact merging of content between the anchors.

Now that I've written that previous paragraph, it seems to make more sense - and would also deal with the problem of unmatched sections of content that can appear when what would normally be an anchor in its own right doesn't match by virtue of being too small.

If we were to take this approach, that might deliver ticket #29 into the bargain. That's a big piece of work and needs some consideration first though.

Also swept under the carpet is an elegant handling of how to suppress a move destination that should be superseded by an inter-file move that is itself migrated as an insertion into the same destination file.

Now there probbaly isn't any sure-fire way of doing this that will always do the right thing, but at least detecting the presence of a migrated intra-file move in the same file as a move destination should deal with most of the obvious situations. What's done right now however is to suppress all move destinations once a single intra-file move is migrated as an insertion.

It will do for now, but needs finessing.

sageserpent-open commented 2 months ago

As I'm losing the will to live working on this ticket, this is going to be closed and further work can go into other tickets.

Ship it!

sageserpent-open commented 2 months ago

It is noteworthy that Git actually supports the simplest inter- versus intra-file divergence resolution out of the box. What happens is that Git simply pairs a rename of a file on one side with the base file, then merges in the intra-file movement on the other side as simple line changes to the renamed file.

This won't work in Git for more complex situations - eg. when the base file is split into two files on one side, and an intra-file move is made on the other side that would hop from one half of the original file to another. Unless the destination of the intra-file move ends up in a half of the split that can be matched with the original base file, Git won't merge it through.