sageserpent-open / kineticMerge

Merge a heavily refactored codebase and stay sane.
MIT License
12 stars 2 forks source link

Assertion failure when merging. #91

Closed sageserpent-open closed 1 month ago

sageserpent-open commented 1 month ago

Initial description.

Seen using build of Kinetic Merge from Git commit SHA: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1.

Merging in the Kinetic Merge repository, our commit is: 5dbb4faca2a2df4351c9f4892da59601913489f3, their commit is: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1 (same as the one used to build Kinetic Merge).

Command line: kineticMerge/target/kinetic-merge 7b4fd68698f8ed6b13b067cf87f6abb25db52da1.

Outcome: java.lang.IllegalArgumentException: requirement failed

Incidentally, how about a stack trace?

sageserpent-open commented 1 month ago

Need to minimise this, or at least do some debugging to provide some context.

Also, reporting a bit more information would be welcome. Where is the stack trace when you need it...

sageserpent-open commented 1 month ago

Debugging shows a failure of this invariant in CodeMotionAnalysis.

Specifically, the first match of five associated with a section on their side is a left-right, the other four are all all-sides matches. How did that come about?

(Section(
  label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 812, characterOffset = 14),
  onePastEnd = TextPosition(line = 812, characterOffset = 65),
  startTokenIndex = 3182,
  sizeInTokens = 11,
  content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
),

HashSet(

LeftAndRight(Section(
  label = "OURS: 5dbb4faca2a2df4351c9f4892da59601913489f3",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 782, characterOffset = 20),
  onePastEnd = TextPosition(line = 782, characterOffset = 71),
  startTokenIndex = 3757,
  sizeInTokens = 11,
  content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
),Section(
  label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 812, characterOffset = 14),
  onePastEnd = TextPosition(line = 812, characterOffset = 65),
  startTokenIndex = 3182,
  sizeInTokens = 11,
  content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
)), 

AllSides(Section(
  label = "BASE: 1e55332f",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 589, characterOffset = 20),
  onePastEnd = TextPosition(line = 589, characterOffset = 71),
  startTokenIndex = 2915,
  sizeInTokens = 11,
  content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
),Section(
  label = "OURS: 5dbb4faca2a2df4351c9f4892da59601913489f3",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 675, characterOffset = 20),
  onePastEnd = TextPosition(line = 675, characterOffset = 71),
  startTokenIndex = 3336,
  sizeInTokens = 11,
  content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
),Section(
  label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 812, characterOffset = 14),
  onePastEnd = TextPosition(line = 812, characterOffset = 65),
  startTokenIndex = 3182,
  sizeInTokens = 11,
  content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
)), 

AllSides(Section(
  label = "BASE: 1e55332f",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 675, characterOffset = 20),
  onePastEnd = TextPosition(line = 675, characterOffset = 71),
  startTokenIndex = 3336,
  sizeInTokens = 11,
  content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
),Section(
  label = "OURS: 5dbb4faca2a2df4351c9f4892da59601913489f3",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 675, characterOffset = 20),
  onePastEnd = TextPosition(line = 675, characterOffset = 71),
  startTokenIndex = 3336,
  sizeInTokens = 11,
  content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
),Section(
  label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 812, characterOffset = 14),
  onePastEnd = TextPosition(line = 812, characterOffset = 65),
  startTokenIndex = 3182,
  sizeInTokens = 11,
  content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
)), 

AllSides(Section(
  label = "BASE: 1e55332f",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 589, characterOffset = 20),
  onePastEnd = TextPosition(line = 589, characterOffset = 71),
  startTokenIndex = 2915,
  sizeInTokens = 11,
  content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
),Section(
  label = "OURS: 5dbb4faca2a2df4351c9f4892da59601913489f3",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 589, characterOffset = 20),
  onePastEnd = TextPosition(line = 589, characterOffset = 71),
  startTokenIndex = 2915,
  sizeInTokens = 11,
  content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
),Section(
  label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 812, characterOffset = 14),
  onePastEnd = TextPosition(line = 812, characterOffset = 65),
  startTokenIndex = 3182,
  sizeInTokens = 11,
  content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
)), 

AllSides(Section(
  label = "BASE: 1e55332f",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 675, characterOffset = 20),
  onePastEnd = TextPosition(line = 675, characterOffset = 71),
  startTokenIndex = 3336,
  sizeInTokens = 11,
  content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
),Section(
  label = "OURS: 5dbb4faca2a2df4351c9f4892da59601913489f3",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 589, characterOffset = 20),
  onePastEnd = TextPosition(line = 589, characterOffset = 71),
  startTokenIndex = 2915,
  sizeInTokens = 11,
  content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
),Section(
  label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 812, characterOffset = 14),
  onePastEnd = TextPosition(line = 812, characterOffset = 65),
  startTokenIndex = 3182,
  sizeInTokens = 11,
  content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
))))
sageserpent-open commented 1 month ago

Annoyingly, that left-right match doesn't show up in the log! Grrr...

What does show up are what appears to be the same section:

Section(
        label = "BASE: 1e55332f",
        path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
        start = TextPosition(line = 85, characterOffset = 16),
        onePastEnd = TextPosition(line = 85, characterOffset = 63),
        startTokenIndex = 477,
        sizeInTokens = 11,
        content summary = "fansi.Color.Green ... reconstituteTextFrom(result))"
      )

repeated eight times in the interval tree on the base side. Huh?

sageserpent-open commented 1 month ago

Well, commit f9395e1b8a7467f4777e64f08dfd38eb0f8898af introduced this behaviour along with a comment stating that was the intent - this apparently tracks multiple occurrences of the same section as a result of noting several ambiguous matches that involve the section.

The comment states:

Reinstate subsumed all-sides matches eating into larger pairwise matches.

So on the face of it, this isn't invalid in its own right...

Where did that left-right match come from? Was this due to a larger left-right match being eaten into?

sageserpent-open commented 1 month ago

No - this left-right match was produced as part of the primary match generation driven by MatchesAndTheirSections.matchesForWindowSize.

The invariant mentioned above is expressed as a post-condition of CodeMotionAnalysis.of (for some nefarious reasons up to and including 3e0df31df22d29d3a3d9383afc912f7262b93807).

Moving that deeper into MatchesAndTheirSections - which also requires addressing the nefarious reasons in that implementation - results in a very slow running implementation, which still passes all of the tests in CodeMotionAnalysisTest (can't remember any failures in higher-level component tests).

However, we do see an earlier invariant failure when running the example scenario for this bug ticket. The match size is 48, and it is happening from within the execution of MatchesAndTheirSections.matchesForWindowSize.

It's not yet clear exactly what is happening, but my money is on the possibility of a pairwise match that escapes the suppression in MatchesAndTheirSections.withoutRedundantPairwiseMatchesIn because it only has one section in common with all of the other all-sides matches....

sageserpent-open commented 1 month ago

Well, after hacking the code to log details when the invariant fails, we get:

15:23:00.922 [io-compute-8] ERROR c.s.k.core.CodeMotionAnalysis$ -- One side of the match LeftAndRight(
  leftElement = Section(
    label = "OURS: 5dbb4faca2a2df4351c9f4892da59601913489f3",
    path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
    start = TextPosition(line = 780, characterOffset = 15),
    onePastEnd = TextPosition(line = 784, characterOffset = 20),
    startTokenIndex = 3735,
    sizeInTokens = 48,
    content summary = """MergedWithConflicts(leftResult, rightResult ... "Right result..."))
            println("""
  ),
  rightElement = Section(
    label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
    path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
    start = TextPosition(line = 274, characterOffset = 11),
    onePastEnd = TextPosition(line = 278, characterOffset = 16),
    startTokenIndex = 1113,
    sizeInTokens = 48,
    content summary = """MergedWithConflicts(leftResult, rightResult ... "Right result..."))
        println("""
  )
) is involved with...
15:23:00.923 [io-compute-8] ERROR c.s.k.core.CodeMotionAnalysis$ -- Nothing at all.
15:23:00.923 [io-compute-8] ERROR c.s.k.core.CodeMotionAnalysis$ -- The other side of the match is involved with...
15:23:00.933 [io-compute-8] ERROR c.s.k.core.CodeMotionAnalysis$ -- AllSides(
  baseElement = Section(
    label = "BASE: 1e55332f",
    path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
    start = TextPosition(line = 187, characterOffset = 17),
    onePastEnd = TextPosition(line = 191, characterOffset = 22),
    startTokenIndex = 920,
    sizeInTokens = 48,
    content summary = """MergedWithConflicts(leftResult, rightResult ... "Right result..."))
              println("""
  ),
  leftElement = Section(
    label = "OURS: 5dbb4faca2a2df4351c9f4892da59601913489f3",
    path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
    start = TextPosition(line = 426, characterOffset = 11),
    onePastEnd = TextPosition(line = 430, characterOffset = 16),
    startTokenIndex = 2019,
    sizeInTokens = 48,
    content summary = """MergedWithConflicts(leftResult, rightResult ... "Right result..."))
        println("""
  ),
  rightElement = Section(
    label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
    path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
    start = TextPosition(line = 274, characterOffset = 11),
    onePastEnd = TextPosition(line = 278, characterOffset = 16),
    startTokenIndex = 1113,
    sizeInTokens = 48,
    content summary = """MergedWithConflicts(leftResult, rightResult ... "Right result..."))
        println("""
  )
)
15:23:00.936 [io-compute-8] ERROR c.s.k.core.CodeMotionAnalysis$ -- AllSides(
  baseElement = Section(
    label = "BASE: 1e55332f",
    path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
    start = TextPosition(line = 673, characterOffset = 15),
    onePastEnd = TextPosition(line = 677, characterOffset = 20),
    startTokenIndex = 3314,
    sizeInTokens = 48,
    content summary = """MergedWithConflicts(leftResult, rightResult ... "Right result..."))
            println("""
  ),
  leftElement = Section(
    label = "OURS: 5dbb4faca2a2df4351c9f4892da59601913489f3",
    path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
    start = TextPosition(line = 673, characterOffset = 15),
    onePastEnd = TextPosition(line = 677, characterOffset = 20),
    startTokenIndex = 3314,
    sizeInTokens = 48,
    content summary = """MergedWithConflicts(leftResult, rightResult ... "Right result..."))
            println("""
  ),
  rightElement = Section(
    label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
    path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
    start = TextPosition(line = 274, characterOffset = 11),
    onePastEnd = TextPosition(line = 278, characterOffset = 16),
    startTokenIndex = 1113,
    sizeInTokens = 48,
    content summary = """MergedWithConflicts(leftResult, rightResult ... "Right result..."))
        println("""
  )
)

...

So absolutely no all-sides matches had anything to do with the left section of our errant left-right match, but 64 all-sides matches all involved the right section of said left-right match!

Huh?

sageserpent-open commented 1 month ago

This code in MatchesAndTheirSections.matchFrom that handles all-sides matches that are not subsumed at all by larger all-sides matches is suspect.

It throws away the smaller all-sides match if only one of its sides is subsumed by a larger pairwise match - but it should really cut down the match to be pairwise to avoid interference. That is also what the comment states, however I don’t think it does that.

sageserpent-open commented 1 month ago

Not only is the code mentioned in the previous comment suspect, but so is the intent documented in the comment.

Paring down an all-sides match when it is subsumed by a larger all-sides match on one side only is fine, but one should not pare down an all-sides match when it is subsumed by a larger pairwise match on just one side - nor should it be suppressed entirely; it should go through.

An example of this would be when there are two ambiguous all-sides matches, one is subsumed by a larger pairwise match on both sides (so will eat into it later on, to remain intact itself). The other is subsumed by the same larger pairwise match, but only on one side.

If we imagine the situation after the larger pairwise match has been eaten into, then we have a perfectly innocuous pair of ambiguous all-sides matches, neither of which will interfere with what’s left of the pairwise match after it has been eaten into.

A trickier situation is when the first of those putative all-sides matches fails to match inside part of the larger pairwise match because of thresholding. In that case, we will not eat into the larger pairwise match and that would make the other all-sides match illegal, as it is still subsumed on one side by the larger pairwise match. In this situation, paring down to a pairwise match is the right thing to do.

sageserpent-open commented 1 month ago

Yet another approach is to revisit the two-stage scan idea - looking for all-sides matches first and then making a second pass for pairwise matches, only blocking putative larger pairwise matches when they would subsume smaller all-sides matches.

That should avoid all of the cleanup nonsense, but comes with a heavy computation cost. Searching for matches is hard enough as it is.

sageserpent-open commented 1 month ago

The story so far as of cbf94a6d070b17d66db108f123161cf86c7de0b4:

  1. Tried moving the pesky invariant into MatchesAndTheirSections, this led to commit b2bb0e6bf1a1ad7fe06f624c73ff09c7eb753c07 and some uncommitted WIP. This runs slowly, also had to fix up the implementation to support MatchesAndTheirSections having an always-on invariant - in the main line of development, it was considered OK to have mixes of match types all involving the same section as long as the dust settled by the end of CodeMotionAnalysis.of. The so-called invariant was really a post-condition.
  2. Once the (now correctly termed) invariant was settled into its new home, this revealed breakage early on while matches were being searched for - so nothing to do with the part where pairwise matches are eaten into, nor with the cleanup of extraneous pairwise matches.
  3. That in turn led to the re-examination of MatchesAndTheirSections.matchFrom, showing a possible defect in its implementation wrt to the documented semantics, and also calling the semantics themselves into question.
  4. That work up to b2bb0e6bf1a1ad7fe06f624c73ff09c7eb753c07 was put on hold (it takes a long time to keep testing the invariant every time a new instance of MatchesAndTheirSections is created). The original implementation was cutover so that the post-condition was moved into MatchesAndTheirSections again, but this time as a part-time invariant that is only expected to hold when a MatchingResult is created at each search step, and as a post-condition of CodeMotionAnalysis.of. This is a bit ropey, but gives back reasonable performance and allows early interception of the invariant breakage.
  5. MatchesAndTheirSections.matchFrom was cut over to MatchesAndTheirSections.allSidesMatchOf, reflecting the fact that it can only produce all-sides matches. Where the old code would have pared down an all-sides match to a pairwise one, the new code simply emits an all-sides match provisionally, leaving it to the post-processing phase to decide whether to leave them in as-is, pare them down or suppress them completely. The post-processing is currently not cut over to do all of this yet, though.
  6. (Aside: this means that the call to MatchesAndTheirContext.withoutRedundantPairwiseMatchesIn made from MatchesAndTheirContext.withMatches is probably not needed, although it might be useful somewhere in the post-processing.)
  7. So, at cbf94a6d070b17d66db108f123161cf86c7de0b4 CodeMotionAnalysisExtensionTest has test failures galore because the post-processing hasn't been updated to handle the new semantics of MatchesAndTheirSections.allSidesMatchOf - but going through the bug reproduction with Kinetic Merge now passes the invariant checks. Not entirely surprising given that the paring down of all-sides matches is currently absent!
  8. Kinetic Merge then faulted due to an assertion failure in MatchesAndTheirSections.eatIntoSection. Inspecting the state via debugging led me to the conclusion that the state was valid, so the assertion was loosened slightly to allow bite ends from nested sections to align; this seems to be a valid situation when a subsumed section aligns with the end of a subsuming section (as long as it doesn't also align with the start). There is a similar situation when a subsumed section aligns with the start of a subsuming section, that is already tolerated in the old code.
  9. Amending said assertion then leads to a stack overflow in the merge phase! However, going back to the main development line and disabling the original post-condition and running Kinetic Merge through the bug reproduction leads to the same fault, thus that hasn't been precipitated by work on this ticket. For the sake of keeping the context down, addressing that issue has been hived off into #93 .
sageserpent-open commented 1 month ago

So the next thing to do is to cut-over the post-processing, and hope that we don't see those assertion failures...

There is also a question as to whether there is any work done up to b2bb0e6bf1a1ad7fe06f624c73ff09c7eb753c07 that should be kept. It would be nice to have MatchesAndTheirSections have a nice, always-on invariant - or to find a proper resting place for the part-time one where it can be always-on.

sageserpent-open commented 1 month ago

After a whole lot of further pondering, this morphed into 6f41769292ed4723e6db07be40695fd4a6d35288, where larger pairwise edits are eaten into on the fly as the match search progresses, using the provisional all-sides matches to bite into them. The provisional matches are in turn pared down against the state including the effects are eating, and what's left is then added in.

This doesn't break the invariant in question, but can allow overly-eager breakdown of larger pairwise matches by all-sides matches that are then pared down (and so should have been allowed to eat in). That breaks tests!

sageserpent-open commented 1 month ago

There is a chicken-and-egg situation; on the one hand larger pairwise matches should be eaten into by smaller all-sides matches that are subsumed on two sides - and this is not an academic requirement, real code requires this. On the other hand, larger pairwise matches should partially suppress smaller all-sides matches that are subsumed on one side.

I think this is at the heart of the overall bug and the various test failures seen on the way.

sageserpent-open commented 1 month ago

OK - another plan. NO, THIS WON'T HELP WITH THE PROBLEM DESCRIBED IN THE PREVIOUS COMMENT

It's time to do the two-stage scan thing NO, IT'S NOT TIME. The twist this time is to be subtle about implementing the upwards blocking of larger pairwise matches in the second scan by smaller all-sides matches from the first scan.

If the larger pairwise match subsumes a smaller all-sides match on two sides, it is blocked. The end. So this gives us the equivalent of the breakdown by eating behaviour, because the leftovers will be found the later steps in the second scan.

If the larger pairwise match subsumes a smaller all-sides match on just one side, we allow it. Later on, any all-sides matches that are subsumed by a pairwise match (those will be the only ones that could, and it will be on one side only) are pared down to pairwise matches with a check to see whether they are redundant in the face of matches ambiguous with the all-sides being pared.

We have to be careful and not generate pared down pairwise matches in the first scan, but equally we don't let these go through as provisional all-sides matches either.

Hopefully the second scan will be a lot faster than the first as we expect fewer pairwise matches...

sageserpent-open commented 1 month ago

NOTE: when the dust settles, introduce a type alias analogous to Pairwise for all kinds of matches in CodeMotionAnalysis. I'm tired of typing Match[Section[Element]] all over place. DONE.

sageserpent-open commented 1 month ago

As of 9ee5a9309fa67087d1bda72f0cb7c2c85cc37848, there appears to be something that passes the more esoteric tests in CodeMotionAnalysisTest, but seems to show up failures in the generation of some the test cases for CodeMotionAnalysisTest.matchingSectionsAreFound.

What seems to be happening is that unanticipated short all-sides matches are creeping in that break up the longer pairwise matches that the test case itself expects to find.

This might be due to extreme shrinkage defeating the test's own logic that tries to avoid spurious small matches from interfering with the anticipated matches - in which case there might be a 'genuine' test failure earlier.

Either way, it's not great.

sageserpent-open commented 1 month ago

Gotcha! The test logic is good - in fact the whole idea is to encourage some matching to take place inside the anticipated matches, and the test expectations are written with this in mind - as long as all the content in an anticipated match is matched either via the anticipated match or a combination of smaller matches that concatenate together on each side to cover the matched content, the test will pass it.

That approach allows exercising of situations where a match is partially subsumed within another anticipated match - say a small all-sides within a larger pairwise match, or a small all-sides match being pared down by a larger all-sides match, or just being plain blocked.

The test could do with some extra commentary about this - there are comments, but with the passage of time they have become opaque to yours truly.

So, what's causing the failures?

A clue was furnished by running MainTest and CodeMotionAnalysisExtensionTest. These are also showing failures and it is obvious that content is being dropped from the matches.

Reexamination of the failures in CodeMotionAnalysisTest.matchingSectionsAreFound shows that when larger pairwise matches are eaten into, the leftovers that are too small to be picked up by the overall search for matches are simply dropped. They are reconstituted as gap-fills later, but they are gone as pairwise matches - so this leaves a gap inside the anticipated matched content that rightfully causes the test to fail.

sageserpent-open commented 1 month ago

The culprit is the guard code here.

I had imagined that it would be safe to lose small leftover pairwise matches if the search can't find them, but the tests say otherwise...

sageserpent-open commented 1 month ago

NOTE: . Assuming that we're on the right track, back-patch a test in to reproduce the original fault and verify that it passes when merged with the ongoing work.

sageserpent-open commented 1 month ago

An update as of ddfe2b0f3589a4934de06b1e73f5ec8daba5a02b:

  1. CodeMotionAnalysisTest now passes in its entirety. Huzzah!
  2. CodeMotionAnalysisExtensionTest falls over completely, as do some of the tests in MainTest. Boo!

Eating into larger pairwise sections is now iterated by MatchesAndTheirSections.stabilize so that the fragmentation of the larger pairwise matches as allowed to possibly open up other smaller all-sides matches that would have been partially subsumed by prior to the fragmentation. That allows further eating and fragmentation to take place, hence the iteration.

The pairwise match fragments are then added to the match and section tracking state, but only when the window size of the match search would allow it. Even then, they are checked for subsumption. If the fragments are smaller than the search's current window size, they are either discarded if the search could find them later on at a lesser window size, or are parked as evading fragments, to be vetted for subsumption and possibly added later on once the search allows it (or is complete).

Testing for subsumption has been moved completely out of the helper matchingFingerprintsAcrossSides - all that does now is to check for overlaps, if they are none, it simply makes matches there and then without worrying about whether an all-sides might be pared down to a pairwise etc.

Paring down is done by pareDownOrSuppressCompletely and that is called multiple times as part of stabilization; it is also used to pare down evading fragments as they are recaptured.

Subsumption checks are a lot more nuanced and take into account that multiple matches can subsume a match under consideration. Sometimes we care about whether it just some match that subsumes on some side or other, sometimes we are bothered that there is at least a single match that subsumes on more than one side.

The approach feels better than the old code, although the interaction between window sizes (owned by the search algorithms) and the evading fragments (owned by MatchesAndTheirSections) as quite nasty, as is the final consolidation of any remaining fugitive fragments after the searching is done.

So what's wrong with this?

  1. The invariant (it's a part-time one now, and it is also a bit nasty, but that wart is for later) is looking suspect. The original post-condition failure driving this ticket was I think a genuine problem, but what's been seen now a pairwise match and an all-sides match sharing just one section - essentially two ambiguous all-sides matches were one has been pared-down away from the side sharing the same section. This looks valid to me, so that invariant needs finessing (or outright removal).
  2. It seems like the higher-level tests are breaking precisely because MatchesAndTheirSections is being so conscientuous about not adding matches until their window size has been reached or passed by the overall match search. As the original pairwise matches are removed and their fragments may be at large, this allows other smaller all-sides matches to try to steal content that would be partially covered by the fragments, overlapping with the all-sides matches that caused the original fragmentation.
sageserpent-open commented 1 month ago

Part of the motivation for keeping track of match fragments evading the search was to not violate the implied invariant that searching through window sizes never encounters a MatchesAndTheirSections that contains matches of lesser window size that the one currently being considered.

That should be relaxed, in the sense that we still check for overlaps - both with larger matches (as before) and smaller or equal size match fragments (new behaviour), for subsumption by larger matches (or larger match fragments) (as before) and for subsuming smaller match fragments (new behaviour).

So fragments are allowed to block matches that are of greater or equivalent match size on the basis that they are fragments of an originally larger pairwise match.

Worth a try...

sageserpent-open commented 1 month ago

Almost there with 9f203713a0cca723af77d046aa6219c6d1c0425d - fragments are now added to the match tracking implementation as soon as they are discovered. All the evading pairwise match stuff has been stripped out.

Now we're down to one last test failure still holding out. Grrr.

sageserpent-open commented 1 month ago

That last test failure reveals yet another Achilles Heel: if a larger pairwise match is completely eaten away by several all-sides matches (this can happen if the all-sides matches abut on the same sides as the larger pairwise match, but are scattered or jumbled on the remaining side), then all trace of the original larger pairwise match is gone.

This in turn allows mischief where the way is open for further overlapping matches of the same size as the ones that ate into the pairwise match to appear during subsequent stabilization phases to claim the content that was covered by the pairwise match.

So on the one hand, we need to eat into the pairwise match, and on the other it should hang around ... or should overlapping matches of the same size be banned - perhaps when they compete under a pairwise match.

My head hurts again!

sageserpent-open commented 1 month ago

As of 9d197b7ca80b979458a607a75724f665674d2551, finally got all the tests working.

The invariant is still breaking when it is re-enabled, and from inspection it is again where a pairwise match and an all-sides match share just one section.

Not sure what's left to police in that invariant, other than not having total subsumption of a pairwise match in an all-sides match.

Also, need to write a test that captures what I think was the original problem and verify that it passes, preferably prior to cutting and reinstating the invariant.

sageserpent-open commented 1 month ago

As of 256fecea2c34d4ca3adcfed2dc3a28398dde1d1c, the invariant has been cut over to ensure that while all-sides and pairwise matches can be mixed up for the same window size, a pairwise match may not be completely subsumed by an all-sides match (i.e. is redundant). It passes.

Running the original example for this ticket now progresses through a merge, but fails with an assertion error (without showing anything useful whatsoever in the message or stacktrace).

Grrr.

So, we have to do:

  1. Some general tidying up to address earlier comments on this ticket. DONE.
  2. Add a test to show that whatever was wrong is now fixed. DONE.
  3. Investigate that assertion error and see if it is part of this ticket - and whether it is masking the other problem in #93. DONE: the problem in #93 has been ameliorated for now.
sageserpent-open commented 1 month ago

The assertion occurs after CodeMotionAnalysis.of has completed; it is in CodeMotionAnalysisExtension.merge after the per-file merges have completed.

As such, it can be picked up as a prelude to working on #90.

For now 78c6ba0276e3efe4750e353179904a06e465df93 disables that postcondition so that the third point from the previous comment can be addressed...

sageserpent-open commented 1 month ago

The original failing example of running Kinetic Merge now completes execution as 78c6ba0276e3efe4750e353179904a06e465df93 - it seems the stack overflow described in #93 has disappeared, so the new matching strategy may be protecting the subsequent merge with a more fortuitous match breakdown. Good!

I'm still inclined to pursue #93 as it seems it's lurking just beneath the water anyway, but this is heartening.

sageserpent-open commented 1 month ago

Commit 105100fc23b46930859ea8fd5951ae604efe5fdd reinstates the postcondition in CodeMotionAnalysisExtension.merge, as we know there isn't another problem lurking downwind on the branch for this ticket.

It is still possible to provoke the error for #93 using the codebase prior to work on this ticket, so nothing has been swept under the carpet.

sageserpent-open commented 1 month ago

Merged on to main in 105100fc23b46930859ea8fd5951ae604efe5fdd.

The original example using Kinetic Merge completes (postcondition in CodeMotionAnalysisExtension.merge is disabled) , but results in a merge conflict that is worthy of further investigation:

Screenshot 2024-10-18 at 08 39 36

My gut feeling is that this should be looked at in the context of #90.

The CPU and heap usage spikes alarmingly in the final phase of Kinetic Merge - namely when computing a longest common subsequence (checked by sampling). The application has on OOM when running in just 4G of heap, the screenshot below used 10G of heap:

Screenshot 2024-10-18 at 08 42 33

Screenshot 2024-10-18 at 08 45 40

That should be looked at in the context of #93.

So with that in mind, let's close this ticket!