Closed sageserpent-open closed 1 month ago
Need to minimise this, or at least do some debugging to provide some context.
Also, reporting a bit more information would be welcome. Where is the stack trace when you need it...
Debugging shows a failure of this invariant in CodeMotionAnalysis
.
Specifically, the first match of five associated with a section on their side is a left-right, the other four are all all-sides matches. How did that come about?
(Section(
label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 812, characterOffset = 14),
onePastEnd = TextPosition(line = 812, characterOffset = 65),
startTokenIndex = 3182,
sizeInTokens = 11,
content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
),
HashSet(
LeftAndRight(Section(
label = "OURS: 5dbb4faca2a2df4351c9f4892da59601913489f3",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 782, characterOffset = 20),
onePastEnd = TextPosition(line = 782, characterOffset = 71),
startTokenIndex = 3757,
sizeInTokens = 11,
content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
),Section(
label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 812, characterOffset = 14),
onePastEnd = TextPosition(line = 812, characterOffset = 65),
startTokenIndex = 3182,
sizeInTokens = 11,
content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
)),
AllSides(Section(
label = "BASE: 1e55332f",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 589, characterOffset = 20),
onePastEnd = TextPosition(line = 589, characterOffset = 71),
startTokenIndex = 2915,
sizeInTokens = 11,
content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
),Section(
label = "OURS: 5dbb4faca2a2df4351c9f4892da59601913489f3",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 675, characterOffset = 20),
onePastEnd = TextPosition(line = 675, characterOffset = 71),
startTokenIndex = 3336,
sizeInTokens = 11,
content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
),Section(
label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 812, characterOffset = 14),
onePastEnd = TextPosition(line = 812, characterOffset = 65),
startTokenIndex = 3182,
sizeInTokens = 11,
content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
)),
AllSides(Section(
label = "BASE: 1e55332f",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 675, characterOffset = 20),
onePastEnd = TextPosition(line = 675, characterOffset = 71),
startTokenIndex = 3336,
sizeInTokens = 11,
content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
),Section(
label = "OURS: 5dbb4faca2a2df4351c9f4892da59601913489f3",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 675, characterOffset = 20),
onePastEnd = TextPosition(line = 675, characterOffset = 71),
startTokenIndex = 3336,
sizeInTokens = 11,
content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
),Section(
label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 812, characterOffset = 14),
onePastEnd = TextPosition(line = 812, characterOffset = 65),
startTokenIndex = 3182,
sizeInTokens = 11,
content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
)),
AllSides(Section(
label = "BASE: 1e55332f",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 589, characterOffset = 20),
onePastEnd = TextPosition(line = 589, characterOffset = 71),
startTokenIndex = 2915,
sizeInTokens = 11,
content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
),Section(
label = "OURS: 5dbb4faca2a2df4351c9f4892da59601913489f3",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 589, characterOffset = 20),
onePastEnd = TextPosition(line = 589, characterOffset = 71),
startTokenIndex = 2915,
sizeInTokens = 11,
content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
),Section(
label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 812, characterOffset = 14),
onePastEnd = TextPosition(line = 812, characterOffset = 65),
startTokenIndex = 3182,
sizeInTokens = 11,
content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
)),
AllSides(Section(
label = "BASE: 1e55332f",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 675, characterOffset = 20),
onePastEnd = TextPosition(line = 675, characterOffset = 71),
startTokenIndex = 3336,
sizeInTokens = 11,
content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
),Section(
label = "OURS: 5dbb4faca2a2df4351c9f4892da59601913489f3",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 589, characterOffset = 20),
onePastEnd = TextPosition(line = 589, characterOffset = 71),
startTokenIndex = 2915,
sizeInTokens = 11,
content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
),Section(
label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 812, characterOffset = 14),
onePastEnd = TextPosition(line = 812, characterOffset = 65),
startTokenIndex = 3182,
sizeInTokens = 11,
content summary = "fansi.Color.Green ... reconstituteTextFrom(leftResult))"
))))
Annoyingly, that left-right match doesn't show up in the log! Grrr...
What does show up are what appears to be the same section:
Section(
label = "BASE: 1e55332f",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 85, characterOffset = 16),
onePastEnd = TextPosition(line = 85, characterOffset = 63),
startTokenIndex = 477,
sizeInTokens = 11,
content summary = "fansi.Color.Green ... reconstituteTextFrom(result))"
)
repeated eight times in the interval tree on the base side. Huh?
Well, commit f9395e1b8a7467f4777e64f08dfd38eb0f8898af introduced this behaviour along with a comment stating that was the intent - this apparently tracks multiple occurrences of the same section as a result of noting several ambiguous matches that involve the section.
The comment states:
Reinstate subsumed all-sides matches eating into larger pairwise matches.
So on the face of it, this isn't invalid in its own right...
Where did that left-right match come from? Was this due to a larger left-right match being eaten into?
No - this left-right match was produced as part of the primary match generation driven by MatchesAndTheirSections.matchesForWindowSize
.
The invariant mentioned above is expressed as a post-condition of CodeMotionAnalysis.of
(for some nefarious reasons up to and including 3e0df31df22d29d3a3d9383afc912f7262b93807).
Moving that deeper into MatchesAndTheirSections
- which also requires addressing the nefarious reasons in that implementation - results in a very slow running implementation, which still passes all of the tests in CodeMotionAnalysisTest
(can't remember any failures in higher-level component tests).
However, we do see an earlier invariant failure when running the example scenario for this bug ticket. The match size is 48, and it is happening from within the execution of MatchesAndTheirSections.matchesForWindowSize
.
It's not yet clear exactly what is happening, but my money is on the possibility of a pairwise match that escapes the suppression in MatchesAndTheirSections.withoutRedundantPairwiseMatchesIn
because it only has one section in common with all of the other all-sides matches....
Well, after hacking the code to log details when the invariant fails, we get:
15:23:00.922 [io-compute-8] ERROR c.s.k.core.CodeMotionAnalysis$ -- One side of the match LeftAndRight(
leftElement = Section(
label = "OURS: 5dbb4faca2a2df4351c9f4892da59601913489f3",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 780, characterOffset = 15),
onePastEnd = TextPosition(line = 784, characterOffset = 20),
startTokenIndex = 3735,
sizeInTokens = 48,
content summary = """MergedWithConflicts(leftResult, rightResult ... "Right result..."))
println("""
),
rightElement = Section(
label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 274, characterOffset = 11),
onePastEnd = TextPosition(line = 278, characterOffset = 16),
startTokenIndex = 1113,
sizeInTokens = 48,
content summary = """MergedWithConflicts(leftResult, rightResult ... "Right result..."))
println("""
)
) is involved with...
15:23:00.923 [io-compute-8] ERROR c.s.k.core.CodeMotionAnalysis$ -- Nothing at all.
15:23:00.923 [io-compute-8] ERROR c.s.k.core.CodeMotionAnalysis$ -- The other side of the match is involved with...
15:23:00.933 [io-compute-8] ERROR c.s.k.core.CodeMotionAnalysis$ -- AllSides(
baseElement = Section(
label = "BASE: 1e55332f",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 187, characterOffset = 17),
onePastEnd = TextPosition(line = 191, characterOffset = 22),
startTokenIndex = 920,
sizeInTokens = 48,
content summary = """MergedWithConflicts(leftResult, rightResult ... "Right result..."))
println("""
),
leftElement = Section(
label = "OURS: 5dbb4faca2a2df4351c9f4892da59601913489f3",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 426, characterOffset = 11),
onePastEnd = TextPosition(line = 430, characterOffset = 16),
startTokenIndex = 2019,
sizeInTokens = 48,
content summary = """MergedWithConflicts(leftResult, rightResult ... "Right result..."))
println("""
),
rightElement = Section(
label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 274, characterOffset = 11),
onePastEnd = TextPosition(line = 278, characterOffset = 16),
startTokenIndex = 1113,
sizeInTokens = 48,
content summary = """MergedWithConflicts(leftResult, rightResult ... "Right result..."))
println("""
)
)
15:23:00.936 [io-compute-8] ERROR c.s.k.core.CodeMotionAnalysis$ -- AllSides(
baseElement = Section(
label = "BASE: 1e55332f",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 673, characterOffset = 15),
onePastEnd = TextPosition(line = 677, characterOffset = 20),
startTokenIndex = 3314,
sizeInTokens = 48,
content summary = """MergedWithConflicts(leftResult, rightResult ... "Right result..."))
println("""
),
leftElement = Section(
label = "OURS: 5dbb4faca2a2df4351c9f4892da59601913489f3",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 673, characterOffset = 15),
onePastEnd = TextPosition(line = 677, characterOffset = 20),
startTokenIndex = 3314,
sizeInTokens = 48,
content summary = """MergedWithConflicts(leftResult, rightResult ... "Right result..."))
println("""
),
rightElement = Section(
label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
start = TextPosition(line = 274, characterOffset = 11),
onePastEnd = TextPosition(line = 278, characterOffset = 16),
startTokenIndex = 1113,
sizeInTokens = 48,
content summary = """MergedWithConflicts(leftResult, rightResult ... "Right result..."))
println("""
)
)
...
So absolutely no all-sides matches had anything to do with the left section of our errant left-right match, but 64 all-sides matches all involved the right section of said left-right match!
Huh?
This code in MatchesAndTheirSections.matchFrom
that handles all-sides matches that are not subsumed at all by larger all-sides matches is suspect.
It throws away the smaller all-sides match if only one of its sides is subsumed by a larger pairwise match - but it should really cut down the match to be pairwise to avoid interference. That is also what the comment states, however I don’t think it does that.
Not only is the code mentioned in the previous comment suspect, but so is the intent documented in the comment.
Paring down an all-sides match when it is subsumed by a larger all-sides match on one side only is fine, but one should not pare down an all-sides match when it is subsumed by a larger pairwise match on just one side - nor should it be suppressed entirely; it should go through.
An example of this would be when there are two ambiguous all-sides matches, one is subsumed by a larger pairwise match on both sides (so will eat into it later on, to remain intact itself). The other is subsumed by the same larger pairwise match, but only on one side.
If we imagine the situation after the larger pairwise match has been eaten into, then we have a perfectly innocuous pair of ambiguous all-sides matches, neither of which will interfere with what’s left of the pairwise match after it has been eaten into.
A trickier situation is when the first of those putative all-sides matches fails to match inside part of the larger pairwise match because of thresholding. In that case, we will not eat into the larger pairwise match and that would make the other all-sides match illegal, as it is still subsumed on one side by the larger pairwise match. In this situation, paring down to a pairwise match is the right thing to do.
Yet another approach is to revisit the two-stage scan idea - looking for all-sides matches first and then making a second pass for pairwise matches, only blocking putative larger pairwise matches when they would subsume smaller all-sides matches.
That should avoid all of the cleanup nonsense, but comes with a heavy computation cost. Searching for matches is hard enough as it is.
The story so far as of cbf94a6d070b17d66db108f123161cf86c7de0b4:
MatchesAndTheirSections
, this led to commit b2bb0e6bf1a1ad7fe06f624c73ff09c7eb753c07 and some uncommitted WIP. This runs slowly, also had to fix up the implementation to support MatchesAndTheirSections
having an always-on invariant - in the main line of development, it was considered OK to have mixes of match types all involving the same section as long as the dust settled by the end of CodeMotionAnalysis.of
. The so-called invariant was really a post-condition.MatchesAndTheirSections.matchFrom
, showing a possible defect in its implementation wrt to the documented semantics, and also calling the semantics themselves into question.MatchesAndTheirSections
is created). The original implementation was cutover so that the post-condition was moved into MatchesAndTheirSections
again, but this time as a part-time invariant that is only expected to hold when a MatchingResult
is created at each search step, and as a post-condition of CodeMotionAnalysis.of
. This is a bit ropey, but gives back reasonable performance and allows early interception of the invariant breakage.MatchesAndTheirSections.matchFrom
was cut over to MatchesAndTheirSections.allSidesMatchOf
, reflecting the fact that it can only produce all-sides matches. Where the old code would have pared down an all-sides match to a pairwise one, the new code simply emits an all-sides match provisionally, leaving it to the post-processing phase to decide whether to leave them in as-is, pare them down or suppress them completely. The post-processing is currently not cut over to do all of this yet, though.MatchesAndTheirContext.withoutRedundantPairwiseMatchesIn
made from MatchesAndTheirContext.withMatches
is probably not needed, although it might be useful somewhere in the post-processing.)CodeMotionAnalysisExtensionTest
has test failures galore because the post-processing hasn't been updated to handle the new semantics of MatchesAndTheirSections.allSidesMatchOf
- but going through the bug reproduction with Kinetic Merge now passes the invariant checks. Not entirely surprising given that the paring down of all-sides matches is currently absent!MatchesAndTheirSections.eatIntoSection
. Inspecting the state via debugging led me to the conclusion that the state was valid, so the assertion was loosened slightly to allow bite ends from nested sections to align; this seems to be a valid situation when a subsumed section aligns with the end of a subsuming section (as long as it doesn't also align with the start). There is a similar situation when a subsumed section aligns with the start of a subsuming section, that is already tolerated in the old code.So the next thing to do is to cut-over the post-processing, and hope that we don't see those assertion failures...
There is also a question as to whether there is any work done up to b2bb0e6bf1a1ad7fe06f624c73ff09c7eb753c07 that should be kept. It would be nice to have MatchesAndTheirSections
have a nice, always-on invariant - or to find a proper resting place for the part-time one where it can be always-on.
After a whole lot of further pondering, this morphed into 6f41769292ed4723e6db07be40695fd4a6d35288, where larger pairwise edits are eaten into on the fly as the match search progresses, using the provisional all-sides matches to bite into them. The provisional matches are in turn pared down against the state including the effects are eating, and what's left is then added in.
This doesn't break the invariant in question, but can allow overly-eager breakdown of larger pairwise matches by all-sides matches that are then pared down (and so should have been allowed to eat in). That breaks tests!
There is a chicken-and-egg situation; on the one hand larger pairwise matches should be eaten into by smaller all-sides matches that are subsumed on two sides - and this is not an academic requirement, real code requires this. On the other hand, larger pairwise matches should partially suppress smaller all-sides matches that are subsumed on one side.
I think this is at the heart of the overall bug and the various test failures seen on the way.
OK - another plan. NO, THIS WON'T HELP WITH THE PROBLEM DESCRIBED IN THE PREVIOUS COMMENT
It's time to do the two-stage scan thing NO, IT'S NOT TIME. The twist this time is to be subtle about implementing the upwards blocking of larger pairwise matches in the second scan by smaller all-sides matches from the first scan.
If the larger pairwise match subsumes a smaller all-sides match on two sides, it is blocked. The end. So this gives us the equivalent of the breakdown by eating behaviour, because the leftovers will be found the later steps in the second scan.
If the larger pairwise match subsumes a smaller all-sides match on just one side, we allow it. Later on, any all-sides matches that are subsumed by a pairwise match (those will be the only ones that could, and it will be on one side only) are pared down to pairwise matches with a check to see whether they are redundant in the face of matches ambiguous with the all-sides being pared.
We have to be careful and not generate pared down pairwise matches in the first scan, but equally we don't let these go through as provisional all-sides matches either.
Hopefully the second scan will be a lot faster than the first as we expect fewer pairwise matches...
NOTE: when the dust settles, introduce a type alias analogous to DONE.Pairwise
for all kinds of matches in CodeMotionAnalysis
. I'm tired of typing Match[Section[Element]]
all over place.
As of 9ee5a9309fa67087d1bda72f0cb7c2c85cc37848, there appears to be something that passes the more esoteric tests in CodeMotionAnalysisTest
, but seems to show up failures in the generation of some the test cases for CodeMotionAnalysisTest.matchingSectionsAreFound
.
What seems to be happening is that unanticipated short all-sides matches are creeping in that break up the longer pairwise matches that the test case itself expects to find.
This might be due to extreme shrinkage defeating the test's own logic that tries to avoid spurious small matches from interfering with the anticipated matches - in which case there might be a 'genuine' test failure earlier.
Either way, it's not great.
Gotcha! The test logic is good - in fact the whole idea is to encourage some matching to take place inside the anticipated matches, and the test expectations are written with this in mind - as long as all the content in an anticipated match is matched either via the anticipated match or a combination of smaller matches that concatenate together on each side to cover the matched content, the test will pass it.
That approach allows exercising of situations where a match is partially subsumed within another anticipated match - say a small all-sides within a larger pairwise match, or a small all-sides match being pared down by a larger all-sides match, or just being plain blocked.
The test could do with some extra commentary about this - there are comments, but with the passage of time they have become opaque to yours truly.
So, what's causing the failures?
A clue was furnished by running MainTest
and CodeMotionAnalysisExtensionTest
. These are also showing failures and it is obvious that content is being dropped from the matches.
Reexamination of the failures in CodeMotionAnalysisTest.matchingSectionsAreFound
shows that when larger pairwise matches are eaten into, the leftovers that are too small to be picked up by the overall search for matches are simply dropped. They are reconstituted as gap-fills later, but they are gone as pairwise matches - so this leaves a gap inside the anticipated matched content that rightfully causes the test to fail.
The culprit is the guard code here.
I had imagined that it would be safe to lose small leftover pairwise matches if the search can't find them, but the tests say otherwise...
NOTE: . Assuming that we're on the right track, back-patch a test in to reproduce the original fault and verify that it passes when merged with the ongoing work.
An update as of ddfe2b0f3589a4934de06b1e73f5ec8daba5a02b:
CodeMotionAnalysisTest
now passes in its entirety. Huzzah!CodeMotionAnalysisExtensionTest
falls over completely, as do some of the tests in MainTest
. Boo!Eating into larger pairwise sections is now iterated by MatchesAndTheirSections.stabilize
so that the fragmentation of the larger pairwise matches as allowed to possibly open up other smaller all-sides matches that would have been partially subsumed by prior to the fragmentation. That allows further eating and fragmentation to take place, hence the iteration.
The pairwise match fragments are then added to the match and section tracking state, but only when the window size of the match search would allow it. Even then, they are checked for subsumption. If the fragments are smaller than the search's current window size, they are either discarded if the search could find them later on at a lesser window size, or are parked as evading fragments, to be vetted for subsumption and possibly added later on once the search allows it (or is complete).
Testing for subsumption has been moved completely out of the helper matchingFingerprintsAcrossSides
- all that does now is to check for overlaps, if they are none, it simply makes matches there and then without worrying about whether an all-sides might be pared down to a pairwise etc.
Paring down is done by pareDownOrSuppressCompletely
and that is called multiple times as part of stabilization; it is also used to pare down evading fragments as they are recaptured.
Subsumption checks are a lot more nuanced and take into account that multiple matches can subsume a match under consideration. Sometimes we care about whether it just some match that subsumes on some side or other, sometimes we are bothered that there is at least a single match that subsumes on more than one side.
The approach feels better than the old code, although the interaction between window sizes (owned by the search algorithms) and the evading fragments (owned by MatchesAndTheirSections
) as quite nasty, as is the final consolidation of any remaining fugitive fragments after the searching is done.
So what's wrong with this?
MatchesAndTheirSections
is being so conscientuous about not adding matches until their window size has been reached or passed by the overall match search. As the original pairwise matches are removed and their fragments may be at large, this allows other smaller all-sides matches to try to steal content that would be partially covered by the fragments, overlapping with the all-sides matches that caused the original fragmentation.Part of the motivation for keeping track of match fragments evading the search was to not violate the implied invariant that searching through window sizes never encounters a MatchesAndTheirSections
that contains matches of lesser window size that the one currently being considered.
That should be relaxed, in the sense that we still check for overlaps - both with larger matches (as before) and smaller or equal size match fragments (new behaviour), for subsumption by larger matches (or larger match fragments) (as before) and for subsuming smaller match fragments (new behaviour).
So fragments are allowed to block matches that are of greater or equivalent match size on the basis that they are fragments of an originally larger pairwise match.
Worth a try...
Almost there with 9f203713a0cca723af77d046aa6219c6d1c0425d - fragments are now added to the match tracking implementation as soon as they are discovered. All the evading pairwise match stuff has been stripped out.
Now we're down to one last test failure still holding out. Grrr.
That last test failure reveals yet another Achilles Heel: if a larger pairwise match is completely eaten away by several all-sides matches (this can happen if the all-sides matches abut on the same sides as the larger pairwise match, but are scattered or jumbled on the remaining side), then all trace of the original larger pairwise match is gone.
This in turn allows mischief where the way is open for further overlapping matches of the same size as the ones that ate into the pairwise match to appear during subsequent stabilization phases to claim the content that was covered by the pairwise match.
So on the one hand, we need to eat into the pairwise match, and on the other it should hang around ... or should overlapping matches of the same size be banned - perhaps when they compete under a pairwise match.
My head hurts again!
As of 9d197b7ca80b979458a607a75724f665674d2551, finally got all the tests working.
The invariant is still breaking when it is re-enabled, and from inspection it is again where a pairwise match and an all-sides match share just one section.
Not sure what's left to police in that invariant, other than not having total subsumption of a pairwise match in an all-sides match.
Also, need to write a test that captures what I think was the original problem and verify that it passes, preferably prior to cutting and reinstating the invariant.
As of 256fecea2c34d4ca3adcfed2dc3a28398dde1d1c, the invariant has been cut over to ensure that while all-sides and pairwise matches can be mixed up for the same window size, a pairwise match may not be completely subsumed by an all-sides match (i.e. is redundant). It passes.
Running the original example for this ticket now progresses through a merge, but fails with an assertion error (without showing anything useful whatsoever in the message or stacktrace).
Grrr.
So, we have to do:
The assertion occurs after CodeMotionAnalysis.of
has completed; it is in CodeMotionAnalysisExtension.merge
after the per-file merges have completed.
As such, it can be picked up as a prelude to working on #90.
For now 78c6ba0276e3efe4750e353179904a06e465df93 disables that postcondition so that the third point from the previous comment can be addressed...
The original failing example of running Kinetic Merge now completes execution as 78c6ba0276e3efe4750e353179904a06e465df93 - it seems the stack overflow described in #93 has disappeared, so the new matching strategy may be protecting the subsequent merge with a more fortuitous match breakdown. Good!
I'm still inclined to pursue #93 as it seems it's lurking just beneath the water anyway, but this is heartening.
Commit 105100fc23b46930859ea8fd5951ae604efe5fdd reinstates the postcondition in CodeMotionAnalysisExtension.merge
, as we know there isn't another problem lurking downwind on the branch for this ticket.
It is still possible to provoke the error for #93 using the codebase prior to work on this ticket, so nothing has been swept under the carpet.
Merged on to main
in 105100fc23b46930859ea8fd5951ae604efe5fdd.
The original example using Kinetic Merge completes (postcondition in CodeMotionAnalysisExtension.merge
is disabled) , but results in a merge conflict that is worthy of further investigation:
My gut feeling is that this should be looked at in the context of #90.
The CPU and heap usage spikes alarmingly in the final phase of Kinetic Merge - namely when computing a longest common subsequence (checked by sampling). The application has on OOM when running in just 4G of heap, the screenshot below used 10G of heap:
That should be looked at in the context of #93.
So with that in mind, let's close this ticket!
Initial description.
Seen using build of Kinetic Merge from Git commit SHA: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1.
Merging in the Kinetic Merge repository, our commit is: 5dbb4faca2a2df4351c9f4892da59601913489f3, their commit is: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1 (same as the one used to build Kinetic Merge).
Command line:
kineticMerge/target/kinetic-merge 7b4fd68698f8ed6b13b067cf87f6abb25db52da1
.Outcome: java.lang.IllegalArgumentException: requirement failed
Incidentally, how about a stack trace?