sageserpent-open / kineticMerge

Merge a heavily refactored codebase and stay sane.
MIT License
9 stars 1 forks source link

Fix poor merge due to trailing shared context being treated as an extended insertion. #42

Closed sageserpent-open closed 1 month ago

sageserpent-open commented 1 month ago

Git commit SHA: d5df1eb361b6c9a1e90fbe85d41142780e17274a bodges the test MainTest.anEditAndADeletionPropagatingThroughAFileCondensation so as to allow a closing brace in some Java code to be matched across the base and right sides of the merge.

If this isn't done, the closing brace, being a single character, is not matched - so on the right side that adds a method definition in front of it, the method definition and the closing brace become a single inserted section. The closing brace in the base is thrown away by the merge, and the inserted section is migrated to its new home - but with the extraneous closing brace.

That results in a merge that doesn't compile and looks weird.

Need to revert d5df1eb361b6c9a1e90fbe85d41142780e17274a and find a way of making the test pass as is.

sageserpent-open commented 1 month ago

A rather horrible idea which I think has already been tried is to fragment unmatched text into single token sections, so that the merge can have a crack at aligning individual tokens.

I have a feeling that if this was tried in the past, it blew the merge algorithm up due to all those tokens - in real-world examples, such unmatched code can have many tokens.

It seems as if the merge should be able to explode unmatched sections and perform a recursive match-and-merge on aligned sections to do this. If it could, that would sidestep the small ambiguous match problem nicely too.

An alternative is to do something hokey like looking for common prefixes and suffixes when unmatched sections are aligned.

One problem is what is meant by aligned? - If the sections don't match and don't have identical content, then the merge algorithm will think there is some kind of editing going on...

sageserpent-open commented 1 month ago

Remember PartitionedThreeWayTransform - perhaps that could be applied to all the unmatched sections in a last ditch attempt at finding small matches?

Either that, or reapply matching to just the unmatched sections - would need to generalise this to support non-contiguous sections…

sageserpent-open commented 1 month ago

Perhaps the merge algebra implementation should be tweaked to recognise this situation, then examining the insertion to see if it has a prefix or suffix that matches what would be thrown away from the base. That would allow the merge to rewrite sections on the fly.

That might require some fancy footwork to rebuild the breakdown of files into sections, though.

sageserpent-open commented 1 month ago

Let's examine this from the point of view of the merge algorithm, as the use of 'insertion' above is a bit sloppy - that is from the user's point of view (i.e. me casually inspecting the sources).

The log shows:

09:41:55.206 [io-compute-2] DEBUG c.s.kineticmerge.core.merge$ -- Conflict between left deletion of Section(
  label = "BASE: 812043a7",
  path = /private/var/folders/wh/xq3_z3j945q0j4___fxqb6xr0000gn/T/toyGitRepository4334175726671119423/pathPrefix1/CasesLimitStrategies.java,
  start = TextPosition(line = 115, characterOffset = 8),
  onePastEnd = TextPosition(line = 119, characterOffset = 0),
  startTokenIndex = 593,
  sizeInTokens = 3,
  content = """};
    }

"""
) and its edit on the right into Section(
  label = "THEIRS: master",
  path = /private/var/folders/wh/xq3_z3j945q0j4___fxqb6xr0000gn/T/toyGitRepository4334175726671119423/pathPrefix1/CasesLimitStrategies.java,
  start = TextPosition(line = 117, characterOffset = 12),
  onePastEnd = TextPosition(line = 123, characterOffset = 2),
  startTokenIndex = 595,
  sizeInTokens = 18,
  content summary = """@Override
            public boolean legacyMethod ... }
        };
    }

}
"""
).
09:41:55.207 [io-compute-2] DEBUG c.s.k.c.MatchesContext$MergeResultDetectingMotion$ -- Conflict at origin of move: resolved as a coincident deletion of Section(
  label = "BASE: 812043a7",
  path = /private/var/folders/wh/xq3_z3j945q0j4___fxqb6xr0000gn/T/toyGitRepository4334175726671119423/pathPrefix1/CasesLimitStrategies.java,
  start = TextPosition(line = 115, characterOffset = 8),
  onePastEnd = TextPosition(line = 119, characterOffset = 0),
  startTokenIndex = 593,
  sizeInTokens = 3,
  content = """};
    }

"""
); propagating right edit Section(
  label = "THEIRS: master",
  path = /private/var/folders/wh/xq3_z3j945q0j4___fxqb6xr0000gn/T/toyGitRepository4334175726671119423/pathPrefix1/CasesLimitStrategies.java,
  start = TextPosition(line = 117, characterOffset = 12),
  onePastEnd = TextPosition(line = 123, characterOffset = 2),
  startTokenIndex = 595,
  sizeInTokens = 18,
  content summary = """@Override
            public boolean legacyMethod ... }
        };
    }

}
"""
) to left move destination Section(
  label = "OURS: condensedFilesBranch",
  path = /private/var/folders/wh/xq3_z3j945q0j4___fxqb6xr0000gn/T/toyGitRepository4334175726671119423/pathPrefix1/pathPrefix2/CasesLimitStrategy.java,
  start = TextPosition(line = 126, characterOffset = 8),
  onePastEnd = TextPosition(line = 129, characterOffset = 4),
  startTokenIndex = 689,
  sizeInTokens = 3,
  content = """};
    }

    """
).
09:41:55.208 [io-compute-2] DEBUG c.s.kineticmerge.core.merge$ -- Coincident deletion of Section(
  label = "BASE: 812043a7",
  path = /private/var/folders/wh/xq3_z3j945q0j4___fxqb6xr0000gn/T/toyGitRepository4334175726671119423/pathPrefix1/CasesLimitStrategies.java,
  start = TextPosition(line = 119, characterOffset = 0),
  onePastEnd = TextPosition(line = 119, characterOffset = 2),
  startTokenIndex = 596,
  sizeInTokens = 1,
  content = """}
"""
).
sageserpent-open commented 1 month ago

The code motion is actually via an edit - so

"""};
    }

"""

is edited on the right into

"""@Override
            public boolean legacyMethod ... }
        };
    }

}
"""

Observe the additional trailing closing brace.

Following that, the merge sees a coincident deletion of

"""}
"""

That's our closing brace again.

What should have happened in a perfect world would be for the closing brace to be matched on the base and right, in which case it wouldn't be part of the edit text and thus would not be propagated - in fact it would be considered a left deletion.

sageserpent-open commented 1 month ago

Here's a rather hokey way of fixing this:

  1. We accept the current approach of finding matches down to to some threshold and don't try matching single tokens (so the closing brace in this example) because that just creates lots of useless ambiguous matches, hiding the one we really care about.
  2. The merge algorithm knows when a coincident deletion follows an edit and also knows when an edit follows a coincident deletion. Actually, the edit is part of a delete versus edit conflict.
  3. What we want is to delete the trailing part of the edit (or the leading part for that matter) if it matches what is coincidentally deleted - this is a kind of 'last minute match', rather like what is done the merge aligns sections that are not matched, but have identical content. So we can examine the content that is coincidentally deleted and see if it is a suffix (or prefix) of the edit.
  4. The edit section is then adjusted so that it's content yields only the text before the suffix (or after the prefix). We don't adjust startOffset, onePastEndOffset or size though - fortunately there is no explicit connection between the size of a section and its actual content in Section. We don't worry about breaking invariants either because it is the sections belonging to a File that are expected to be contiguous, having the size computed from the content - the adjusted section simply makes it may through change propagation. The resulting merge isn't bothered about contiguity of sections at all - it expects them to be a mix from the left and right sides, and for code motion to happen.
sageserpent-open commented 1 month ago

In fact the merge algebra in MatchesContext can simply track whether a coincident deletion precedes or follows a deletion versus edit conflict, there should be no need to modify the core merge algorithm.

sageserpent-open commented 1 month ago

As of 99fc253f7561a0f585c739f6940cb3f3f06fbad4, there is a rather hacked solution that results in a correct merge for the suffix situation. This is also tested at unit level in MatchesContextTest.

The next thing to do is to add tests for the prefix situation and to clean up the code...

sageserpent-open commented 1 month ago

Adding CodeMotionAnalysisExtensionTest.codeMotionWithEditAdjustment to test this approach reveals a test failure in eeb74ffac67e7bcaa1ebc1b0e9cee6e0458836a3 that was swept under the rug by 55b3654936911cbbc1937be144120460d065b2fb.

The problematic merge inputs are:

BASE: FishChipsMushyPeasKetchupMuesliToastTeaKippers}NoodlesSandwichCakePudding LEFT: MuesliToastTeaKippersFishChipsMushyPeasKetchupNoodlesSandwichCakePudding RIGHT: FishChipsMushyPeasKetchupPorridge}CakePudding EXPECTED: PorridgeFishChipsMushyPeasKetchupCakePudding

Each food article is a token, as is }.

sageserpent-open commented 1 month ago

The merge log:

10:48:58.005 [main] DEBUG c.s.kineticmerge.core.merge$ -- Left insertion of Section(
  label = "left",
  path = "*** STUNT DOUBLE ***",
  start = TextPosition(line = 1, characterOffset = 0),
  onePastEnd = TextPosition(line = 1, characterOffset = 21),
  startTokenIndex = 0,
  sizeInTokens = 4,
  content = "MuesliToastTeaKippers"
).
10:48:58.010 [main] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of Section(
  label = "left",
  path = "*** STUNT DOUBLE ***",
  start = TextPosition(line = 1, characterOffset = 21),
  onePastEnd = TextPosition(line = 1, characterOffset = 46),
  startTokenIndex = 4,
  sizeInTokens = 4,
  content = "FishChipsMushyPeasKetchup"
) as it is common to all three sides.
10:48:58.012 [main] DEBUG c.s.kineticmerge.core.merge$ -- Coincident deletion of Section(
  label = "base",
  path = "*** STUNT DOUBLE ***",
  start = TextPosition(line = 1, characterOffset = 25),
  onePastEnd = TextPosition(line = 1, characterOffset = 46),
  startTokenIndex = 4,
  sizeInTokens = 4,
  content = "MuesliToastTeaKippers"
) with following right edit.
10:48:58.013 [main] DEBUG c.s.k.c.MatchesContext$MergeResultDetectingMotion$ -- Coincident deletion at origin of move: propagating deletion to left move destination Section(
  label = "left",
  path = "*** STUNT DOUBLE ***",
  start = TextPosition(line = 1, characterOffset = 0),
  onePastEnd = TextPosition(line = 1, characterOffset = 21),
  startTokenIndex = 0,
  sizeInTokens = 4,
  content = "MuesliToastTeaKippers"
).
10:48:58.015 [main] DEBUG c.s.kineticmerge.core.merge$ -- Coincident deletion of Section(
  label = "base",
  path = "*** STUNT DOUBLE ***",
  start = TextPosition(line = 1, characterOffset = 46),
  onePastEnd = TextPosition(line = 1, characterOffset = 47),
  startTokenIndex = 8,
  sizeInTokens = 1,
  content = "}"
) with following right edit.
10:48:58.017 [main] DEBUG c.s.kineticmerge.core.merge$ -- Right edit of Section(
  label = "base",
  path = "*** STUNT DOUBLE ***",
  start = TextPosition(line = 1, characterOffset = 47),
  onePastEnd = TextPosition(line = 1, characterOffset = 62),
  startTokenIndex = 9,
  sizeInTokens = 2,
  content = "NoodlesSandwich"
) into Section(
  label = "right",
  path = "*** STUNT DOUBLE ***",
  start = TextPosition(line = 1, characterOffset = 25),
  onePastEnd = TextPosition(line = 1, characterOffset = 34),
  startTokenIndex = 4,
  sizeInTokens = 2,
  content = "Porridge}"
).
10:48:58.020 [main] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of Section(
  label = "left",
  path = "*** STUNT DOUBLE ***",
  start = TextPosition(line = 1, characterOffset = 61),
  onePastEnd = TextPosition(line = 1, characterOffset = 72),
  startTokenIndex = 10,
  sizeInTokens = 2,
  content = "CakePudding"
) as it is common to all three sides.
sageserpent-open commented 1 month ago

The twist here is the competition for the right-edit between what could be a delete versus right-edit conflict of MuesliToastTeaKippers into Porridge} and a straight right-edit of NoodlesSandwich into Porridge}.

The merge algorithm favours straight edits over conflicts, so MuesliToastTeaKippers is considered to have been coincidentally deleted, followed by a coincident deletion of }, then the right-edit of NoodlesSandwich into Porridge}. The coincident deletion of MuesliToastTeaKippers is resolved as propagated of a deletion to the move destination, so the final merge looks like:

FishChipsMushyPeasKetchupPorridge}CakePudding.

So the claim of the right-edit stopped } from being recognised as a suffix of the edit.

Even if that } would have been connected with the second coincident deletion, adjusting the right-edit, the right edit is in the wrong place.

Intuitively, we would expect a match on the } - with this match, we would have a left-deletion of the } which puts Porridge in front of the }, so it has be interpreted as part of a delete versus right-edit conflict, and would thus be propagated. That would then apply a right-deletion to NoodlesSandwich, leading to the expected balanced diet.

sageserpent-open commented 1 month ago

Given that the existing code is pretty hokey, I haven't yet written the logic to handle the suffix situation, and the problem with following straight edits, I'm inclined to deem this experiment a failure.

sageserpent-open commented 1 month ago

Suppose unmatched text is just exploded into one-token sections...

sageserpent-open commented 1 month ago

Trying that out in f7fb48465518ab05eb68924c0f7e669be997db07 breaks several tests - the existing edit propagation and insertion migration logic expects there to be just one section to shift to its new home, so this won't work without cutting over.

However ... CodeMotionAnalysisExtensionTest.issue42BugReproduction passes!

Let's plod on...

sageserpent-open commented 1 month ago

As of 02d08597bcfe6c949afb1060caa4a0f7d54f24c1, now down to one test failure in CodeMotionAnalysisExtensionTest - this one is caused by not having cut over the migration of insertions to cope with multiple inserted sections yet.

There are two outstanding failures in MainTest, these may also involve edit propagation, not yet sure.

Let's sort out insertion migration and re-check...

sageserpent-open commented 1 month ago

Musing ... splitting up the gaps between matched sections affords the merge algorithm maximum flexibility in finding last-minute matches between tokens, but it can generate a lot of sections. That might cause the merge algorithm to fall over; so far this hasn't been observed, it seems that one span of unmatched text is always much smaller than the other, so the merge algorithm doesn't tank, although possibly it could be optimised for this.

What could be done would be to look for prefix and suffix matches upfront in MappedContentSources.filesByPathUtilising, breaking up the gap-filler sections so that shared prefixes and suffixes are hived off into their own matched sections. That may well generate several ambiguous matches, so there might be merit in disabling the code-motion machinery for such matches - all they do is allow the merge algorithm to align prefixes and / or suffixes, and by being matches in their own right, this would allow the existing code-motion framework to carry on propagating / migrating individual edit or insertion sections.

A variation on this is to not consider the prefix / suffix sections as being matched; this still allows the merge machinery to align them. As the rest of the gap-filler section just makes a single token, we can again stick with the existing code-motion framework to do its thing.

The question is, how does this work with migrating insertions and anchoring? We could end up splitting a gap-filler section with a single anchor into several pieces, with more then one piece viable for migration. The problem is, we've lost the connection to the anchor. This can happen even when there is a preceding and succeeding anchor, if the gap-filler is split into 5 sections or more.

Drat.

sageserpent-open commented 1 month ago

Current thoughts:

a) Splitting up gap-fills into lots of one-token sections provides maximal opportunity to align tokens via the merge algorithm - and that is crucial to this ticket. b) Doing this before the merge allows the merge to make more sensible decisions. c) So far - but this needs to be revisted with manual testing - feeding the merge with lots of one-token sections isn't killing performance. d) It looks like the treatment of edits is more or less in the bag, might be few final snags to sort out, but hopefully nothing major. e) The only sticking point is how to cope with runs of several insertions when migrating code insertions. Perhaps the way to go is to flip the way the migration logic works - instead of starting with an insertion and finding its anchors (thus assuming that insertions that need migration are isolated from each other and thus part of distinct migrations), start with the anchors, go find any abutting insertions and link them in turn with other abutting insertions to build up a map of anchors to runs of contiguous insertions that need migrating. Or just work on all the insertions associated with a path in one go, finding the anchors and runs at the same time...

sageserpent-open commented 1 month ago

NOTE: CodeMotionAnalysisExtensionTest.merge is subtly broken - visual inspection of the merge shows the desirable output of:

    packageExecutable := {
      val packagingVersion = (ThisBuild / version).value

      println(s"Packaging executable with version: $packagingVersion")

is clobbered:

    packageExecutable := {
      val packagingVersion println(s"Packaging executable with version: $packagingVersion")

Need to come back to this, commit: f7fb48465518ab05eb68924c0f7e669be997db07 broke this.

sageserpent-open commented 1 month ago

Commit: c3b5c35a651398c251a767913304faff2e6f032a brings in support for migrating runs of contiguous insertions to the same move destination. All the tests in CodeMotionAnalysisExtensionTest now pass, although there is still that subtle breakage seen by manual inspection of the merge output mentioned in the comment above to worry about...

MainTest is down to a single failure in MainTest.twoFilesSwappingAroundWithModificationsToBoth. The test MainTest.anEditAndADeletionPropagatingThroughAFileCondensation now passes, that was the one that required insertion propagation, so that's good.

Hopefully the outstanding failure in MainTest is the same as the one in CodeMotionAnalysisExtensionTest.merge...

sageserpent-open commented 1 month ago

Also need to tidy up what I did in commit 864fc64ee444fae89a7132c1d4b2b53d142fc0c3 and merge the tidyup in...

sageserpent-open commented 1 month ago

Reviewing the logs of the MainTest.twoFilesSwappingAroundWithModificationsToBoth revealed that there was a latent bug in MatchesContext - when a conflict is was resolved in favour of a straight insertion with a propagated change, the insertion was not examined as a potential move destination.

Work done in d3569853e97168127a1b834070718bcbd0606f57 and fe944ce84294af5d4d8182e3ed265918793ffff2 addresses that, and merging it in via b0ff57abf492bbff6adf0e5d2c047c360672b42a fixes all the failures.

Great!

sageserpent-open commented 1 month ago

Still got the subtle merge breakage in CodeMotionAnalysisExtensionTest.merge, though. One crumb of comfort is that it only manifests when the minimum match size goes below 6 - otherwise the merge looks good.

Think I'll take it as it is for now, so on to the tidying up...

sageserpent-open commented 1 month ago

Tidying up done, merged to main in f3b353220f1fad04064b4dc99b2d16210e72b71e.

sageserpent-open commented 1 month ago

Don't forget to amend MatchesContextTest to assert on the insertions detected during a merge...

sageserpent-open commented 1 month ago

Oh drat - doing the manual test discussed here: https://github.com/sageserpent-open/kineticMerge/issues/30#issuecomment-2106144845 causes the merge to overflow the stack.

Now, does this mean the merge should be rewritten to use continuations / trampolining, or turned into a proper dynamic programming algorithm, working bottom-up, or should the gap filling process be tuned to make fewer sections?

The prefix / suffix matching idea might be useful here...

sageserpent-open commented 1 month ago

Got a clumsy workaround for the stack overflow in 6abf07fa5293d2f21e524c1ad350904b7e653b3e, this will do as a proof-of-concept for now.

Looking at the results of the manual merge from #30 etc makes me wonder whether it is better to be more cautious about splitting up the gaps; again the prefix/suffix matching comes to mind.

Probably time for another ticket for that one - have raised #43.

sageserpent-open commented 1 month ago

Noticed that MatchesContext wasn't recording all left- and right-insertions, again those insertions caused by resolution of an edit versus delete or edit versus edit conflict were only partially updating the MergeResultDetectingMotion. Tests were written, and made to pass in: aedc282e5d802f3082dd2b605694bfc81378ce6f, which main now references.

sageserpent-open commented 1 month ago

This went out in release 1.2.0, Git commit SHA: 4d1935beef2b5923417904895884de8c2e749c6e.