Weird merge results. - Githubissues

sageserpent-open / kineticMerge

Merge a heavily refactored codebase and stay sane.

MIT License

9 stars 1 forks source link

Weird merge results. #19

Closed sageserpent-open closed 6 months ago

sageserpent-open commented 9 months ago

Doing some dogfood testing of Kinetic Merge on its own repository yielded this peculiar (and incorrect) successful merge result:

Screenshot 2023-10-10 at 13 36 52

(The left hand is the base, the middle is what Git's merge produced and the right hand is what Kinetic Merge produced).

Doing the same with Git's merge produced a conflicted merge, which to be honest seems a better result.

Need to analyse this and decide:

Is this a logical consequence of the underlying merge approach at fine granularity? So it works 'correctly', but doesn't produce reasonable results based on a human's expectations...
... or is the merge approach actually broken? So it's not tested properly and has a bug in the implementation.
What should have been the desired result?

sageserpent-open commented 6 months ago

Looking at the breakage of MainTest.cleanMergeOfAFileModifiedInBothBranches, something is wrong with merge.

CodeMotionAnalysis does the right thing (although more on that later), it takes the following content:

BASE

Hello, my old mucker!

Sections: A - Hello, my old mucker! 6 tokens.

LEFT

Hello, my old mucker! 
Pleased to see you, old boy.

Sections: A - Hello, my old mucker! 6 tokens. B - Pleased to see you, old boy. 8 tokens.

RIGHT

Hello, all and sundry.

Sections: C - Hello, all and sundry. 6 tokens.

It matches A across the base and left side.

Note how the token Hello did not form an all-sides match, as CodeMotionAnalysis has deliberately engineered logic to stop an all-sides match when two of its putative sections are subsumed within a longer pairwise match - and this is exactly what is happening here.

Now, part of the justification for allowing a longer pairwise match to suppress a shorter all-pairs match is what the merge is expected to do. If we have a merge of:

BASE: A LEFT AB RIGHT C

Then I thought that this is interpreted as an edit of A into C on the right, plus a following insertion of B after A on the left - so the expected merge would be CB.

Applying that logic to presumably make it irrelevant as to whether we have a long pairwise match on Hello, my old mucker! being edited into Hello, all and sundry., versus a short all-sides match of Hello, being preserved and a short pairwise match of my old mucker! being edited into all and sundry.. The result is still the same.

So my suspicions about CodeMotionAnalysis are allayed for now. It will be interesting though to see how this suppression of all-sides matches works out in the wild, but let's leave that for now.

Where it gets fishy is that the actual merge isn't CB - instead we see a conflict, the left hand being B and the right hand being C.

The merge algorithm has gone for a deletion of A on the right, followed by an insertion of C - the deletion goes through cleanly, but the insertion of C on the right conflicts with the insertion of B on the left.

I could have sworn a whole load of work went into supporting - and testing for - the detection of edits!

What happened... ?

sageserpent-open commented 6 months ago

Wrote MergeTest.simpleMergeOfAnEdit in Git commit SHA 670a28bf63e7efc0e02d7bb0054d01cb5b061a9c, this reproduces the problem seen in MainTest but down at the unit level.

This test was then back-ported to Git commit SHA fea076322a4c1de6a26216301116b66479cd683b which is based on an early version of Merge, and that test passes. Modifying the test expectations as a temporary fault exhibits breakage, so it really is checking the result when it passes.

Did a painful linear progressive merge - couldn't use git bisect as this would have to involve unshelving into each bisected commit - given issues with the SBT build compatibility and some merge conflicts, it was easier to just re-evolve the changes step-by-step on to a new branch.

This revealed that the test did pass in back-ported form, as I expected, but was broken in Git commit SHA 3c63dc3a6dd8260e736156e19a8dc5b110dadb12 (merged into 1acfb3ac226da58396d19189a7cf4b99d658ff37 for the reproduction against the back-ported test) Ironically, the commit message states:

Simplify the implementation of Merge. Not sure if this entirely a good idea, as it trades extra code for explicit handling of corner cases against less code using implicit state transformations to do the same thing...

Ho-hum.

Given that all the other tests in MergeTest pass, it is clear that MergeTest.fullMerge isn't assiduous enough.

It would be worth detecting the failure via MergeTest.fullMerge to restore confidence prior to going further.

sageserpent-open commented 6 months ago

Decided to just do some hacking, reverting the offending commit and merging it through to result in Git commit SHA 20c308b99e0d868ba29a5e2806162eda2d2f425b.

This is not fit for delivery, see below - but the results are encouraging.

Had to hack the demonstration repository (we'll get to that), but the part where Git reported a conflict and Kinetic Merge quietly made a mess now produces this:

    packageExecutable := {
      val packagingVersion = (ThisBuild / version).value

      println(s"Packaging executable with version: $packagingVersion")

      val localArtifactCoordinates =
        s"${organization.value}:${name.value}_${scalaBinaryVersion.value}:$packagingVersion"

      val executablePath = s"${target.value}${Path.sep}${name.value}"

      s"cs bootstrap --verbose --bat=true --scala-version ${scalaBinaryVersion.value} -f $localArtifactCoordinates -o $executablePath" !

      name.value
    },

That's exactly what I wanted to see.

sageserpent-open commented 6 months ago

Exhibit D - CodeMotionAnalysisExtensionTest console output as of Git commit SHA 20c308b99e0d868ba29a5e2806162eda2d2f425b:

"""
I wandered lonely as a cloud
That floats on high o'er vales and hills,
When all at once I saw a crowd,
A host, of small fishing boats;
Astride the sea, beneath the quay,
Rocking and swaying in the breeze.

Why this allusion?
I Havant a clue!
Along the margin of a bay:
Ten thousand (well, maybe not quite) I thought, 'Was this part of the job role?'.
'Should I be expected to deal with flowers?'
The waves beside them danced; but they
Out-did the sparkling waves in glee:
A poet could not but be gay,
In such a jocund company:
I gazed—and gazed—but thought only of
raising this in the next Zoom meeting.

For oft, when on my Aeron I slouch
In vacant or in pensive mood,
They flash upon that inward eye
Which is the bliss of solitude;
And then my heart with pleasure fills,
And sends an email to human resources.
"""
*********************************
"""
I wandered lonely as a cloud
That floats on high o'er vales and hills,
When all at once I saw a crowd,
A host, of small fishing boats;
Astride the sea, beneath the quay,
Rocking and swaying in the breeze.

Why this allusion?
I Havant a clue!
Along the margin of a bay:
Ten thousand (well, maybe not quite) sashays with the fishing boats.
"""

sageserpent-open commented 6 months ago

What were those caveats?

~~The major testing in MergeTest.fullMerge needs bolstering / rework to detect the bug introduced in 3c63dc3a6dd8260e736156e19a8dc5b110dadb12, then merging up with ongoing work.~~ DONE AND PRAISE BE, IT'S OVER
The original problem reproduction repository causes Kinetic Merge to detect overlaps due to the stanzas in build.sbt that set up library dependencies. It looks like this is a case of ambiguous matches that overlap, need to think about what exactly is overlapping and whether it could be avoided, or should somehow be disregarded. In the meantime, the repository has some nasty extra commits added to the destination and their branch to remove these pesky stanzas. FIXED AND NASTY EXTRA COMMITS REMOVED
When the overlaps are detected in MappedContentSources.filesByPathUtilising, an exception of type OverlappingSections is thrown. That was fine back when said class was actually FakeSources and was a stunt double for CodeMotionAnalysisTest - the exceptions were caught by the test and used to reject the test case. That doesn't play well in Main and causes it to fault without rolling back the merge in progress. The code has been hacked to deal with this in Main, but this needs thinking about and doing properly. FUDGED AROUND
That nice merge result in the example came from a conflicted merge, so there are staged versions of build.sbt after the merge - a common ancestor one in stage 1, the destination branch's one in stage 2 and theirs in stage 3. There is also a conflicted form of build.sbt left in the working tree, only this doesn't actually have any conflicts. It appears that while Git can't resolve that merge when it has a go, the invocation of git merge-file from Kinetic Merge is able to completely resolve the changes based on the staged files as created by Kinetic Merge. Now that is intended, but I'm surprised that was the outcome here - I was expecting either a direct, clean merge, or a conflicted one with markers. Here, the conflict is reported but git merge-file quietly saves the day.
~~The hardcoded matching threshold has been bumped up to avoid seeing even more overlaps; what would the merge look like if it was brought back down?~~ IT LOOKS BETTER It should also be configurable too. NOT IN THIS TICKET, THOUGH
One last disquieting thought - given the bug introduced in 3c63dc3a6dd8260e736156e19a8dc5b110dadb12, suppose Kinetic Merge had the reversion merged into the codebase as it was back when this bug was raised, prior to all this work on CodeMotionAnalysis? Hmm... NO LONGER CARE, IT'S WORKING WELL ENOUGH NOW
~~The last-minute conflict resolution by git merge-file might just be sorting out trivial conflicts due to the removed stanzas. Need to look into this.~~WAS NOT THE CAUSE, DON'T WORRY
~~The existing smoke test for merging prose should be extended to use the example from this ticket in abridged form.~~DONE

sageserpent-open commented 6 months ago

Test case 19 in CodeMotionAnalysis.matchingSectionsAreFound - must try harder to find the optimal match in the genetic algorithm. Maybe there should be a single cutoff for match sizes across all files across all sides, and we can dispense with the genetic algorithm. Or use hinting there too...

sageserpent-open commented 6 months ago

The story so far as of Git commit SHA 420b84b6e87dc5f1780aed511f29278381623d13...

CodeMotionAnalysisExtensionTest has been generalised to smoke test merging of prose and a simplified form of the SBT sources this ticket is about.
Still have to bolster MergeTest.fullMerge.
The problem with overlaps caused by the stanzas has been addressed by generalising the notion of match suppression to allow larger matches to suppress smaller matches that overlap with them, in addition to the suppression of included matches.
CodeMotionAnalysis has been, 'enhanced' with a configuration parameter that allows it to either propagate exceptions - specifically the OverlappingSection exceptions seen so often in the tests versus trapping them into an Either so that Main can utilise its roll-back logic. This also fixes the breakages in CodeMotionAnalysisTest caused by the prevalance of small fry. The good news is that the previous enhancement results in a lot less of the rejected test cases without breaking anything.
There is a bug fix for when the logic used to confirm that a guessed optimal window size hits the mark - when a single all-sides match and a single pairwise match was found, this fooled the logic into thinking that only one match was present, so overlapping suboptimal matches were occurring. This was spotted while doing due diligence on the overlaps; now seeing at least one previously rejected test case passing. This was not the cause of the stanza problem, though.
The conflicts (and some subsequently noticed odd conflicted merge results) are down the the matching thresholds being too high - 0.1 is good enough for the example this ticket is about, need to get down lower for CodeMotionAnalysisExtensionTest, possibly this is because the files are shortened.
Had to up the population size used by the genetic algorithm as well to get all test cases in CodeMotionAnalysisExtensionTest to pass. There should be some logic to configure how hard the genetic algorithm has to work based on file sizes and thresholds etc.
Really need to configure the threshold via the command line!

I'm not really sure that the merge is in general trustworthy yet - need to do some general dogfooding, but the issue this ticket refers to seems to be on its way to being fixed.

sageserpent-open commented 6 months ago

Examining the number of evaluations of a phenotype shows that given the number of window sizes that come under the minimum sure-fire size, there are far more evaluations performed by the genetic algorithm than there are window sizes.

Example numbers from CodeMotionAnalysis.matchingSectionsAreFound:

Sizes: [1, 45) - evaluations: 146 - 41 + 1 = 106 Sizes: [1, 8) - evaluations: 238 - 23 + 1 = 216 Sizes: [3, 140) - evaluations: 323 - 26 + 1 = 298 Sizes: [1, 68) - evaluations: 628 - 26 + 1 = 603 Sizes: [1, 27) - evaluations: 428 - 32 + 1 = 397 Sizes: [8, 154) - evaluations: 825 - 25 + 1 = 801 Sizes: [4, 106) - evaluations: 528 - 35 + 1 = 494

Example numbers from CodeMotionAnalysisExtensionTest.merging:

Sizes: [1, 5) - evaluations: 40 - 17 + 1 = 24 Sizes: [1, 4) - evaluations: 24 - 17 + 1 = 8 Sizes: [1, 3) - evaluations: 0 Sizes: [1, 2) - evaluations: 0

In other cases, the genetic algorithm is not invoked at all.

Example numbers from MainTest:

No invocation of genetic algorithm at all.

It is clear that a simple linear search down all the small-fry window sizes will be much more efficient than the genetic algorithm, and definitive in its result. Turning down the dials on the genetic algorithm to match performance is likely to miss a few matches, this has already been seen and has required the dials to go up.

sageserpent-open commented 6 months ago

The genetic algorithm is no longer used as of Git commit SHA c94b207d37c2a57ef074138c2332e0eb307f0b8e (and the tests pass, after fixing a bug in CodeMotionAnalysisTest.matchingSectionsAreFound exposed by the implementation change in the SUT - this has been back-ported prior to the implementation change to check that the test passes in both cases, see Git commit SHA 87441fa1ea53d3a93f85962f6ce79ba72a101b29).

It was a surprise to see that performance is slightly worse overall in CodeMotionAnalysisTest.matchingSectionsAreFound compared with the back-port. The back-port using the genetic algorithm achieves 34 test cases in 10 minutes, compared with 32 test cases in 10 minutes when using the linear search.

In general, using linear search beats the genetic algorithm, but there are two exceptions where the genetic algorithm commit takes 28 seconds / 23 seconds versus 1 minutes 20 seconds / 1 minute 36 seconds.

The window size ranges for the genetic algorithm / linear search are [10, 264), [8, 504). That's a far bit larger than the examples above.

Given that the linear search is systematic, and that CodeMotionAnalysisExtensionTest completes 30 test cases in 8 seconds, I'm of a mind to just accept those two odd test cases and carry on with linear search.

sageserpent-open commented 6 months ago

NOTE: remember to replace Phenotype with something simpler, now that the genetic algorithm has gone. Would be nice to keep the invariant check and the helper method API, though.DONE

sageserpent-open commented 6 months ago

What's been going on? MergeTest has been overhauled significantly in an attempt to make it more comprehensible, as I couldn't follow what it was doing after having not worked on it for over two months. There's a lesson there...

The rework exposed a new bug in merge in addition to the one that had slipped in; this led to a round of back and forth between merge and MergeTest where either a new bug in the SUT was found and corrected, or the test itself was shown to have a bug in its logic and was generating either faulty expected merges or misleading move sequences not consistent with a correct expected merge.

It is fair to say even with the rework, MergeTest is still a very opaque and truculent thing; I've added comments to help demystify it, but it isn't going to serve the role as 'test-as-specification'. It is however really good now at rooting out bugs in both the SUT and also in my own thinking about the edge-cases for the merge. Line coverage of merge is now 100% with the minor exception of - wait for it - an exception class for future divergence detection.

Consequently, I've added several explicit small tests to help plug the gap in a lack of specification-style tests, usually adding these after MergeTest.fullMerge has shrunk down a failure to something that is easy to analyze. Got a bit delinquent about this towards the end, so I'm going to write up what the merge tries to achieve as code comments and cross-check with the specification tests once the dust has settled.

In the meantime, we have the unwrapping of Phenotype from CodeMotionAnalysis and the threshold configuration to do, and then I think we're good.

MainTest and CodeMotionAnalysisExtensionTest are still passing - the latter now fully merge the SBT sources in all test cases, which is a great result. The SBT source merges look sensible when randomly sampling them, which is heartening.

sageserpent-open commented 6 months ago

Manual testing of the example above using the build from Git commit SHA 491d8d6541b34dabbd9a67cd625cbbdbef31bfc5 yields the following excerpt:

    Compile / resourceGenerators += Def.task {
      val location = versionResource.value

      val packagingVersion = (ThisBuild / version).value

      println(
        s"Generating version resource: $location for version: $packagingVersion"
      )

      IO.write(location, packagingVersion)

      Seq(location)
    }.taskValue,
    packageExecutable := {
      val packagingVersion = (ThisBuild / version).value

      println(s"Packaging executable with version: $packagingVersion")

      val localArtifactCoordinates =
        s"${organization.value}:${name.value}_${scalaBinaryVersion.value}:$packagingVersion"

      val executablePath = s"${target.value}${Path.sep}${name.value}"

      s"cs bootstrap --verbose --bat=true --scala-version ${scalaBinaryVersion.value} -f $localArtifactCoordinates -o $executablePath" !

      name.value
    },
    packageExecutable := (packageExecutable dependsOn publishLocal).value,
    libraryDependencies += "com.github.scopt" %% "scopt" % "4.1.0",
    libraryDependencies += "org.typelevel" %% "cats-collections-core" % "0.9.6",
    libraryDependencies += "org.typelevel" %% "cats-core"   % "2.10.0",
    libraryDependencies += "org.typelevel" %% "cats-effect" % "3.5.2",
    libraryDependencies ++= Seq(
      "dev.optics" %% "monocle-core"  % "3.2.0",
      "dev.optics" %% "monocle-macro" % "3.2.0"
    ),
    libraryDependencies += "org.scala-lang.modules" %% "scala-parser-combinators" % "2.3.0",
    libraryDependencies += "com.lihaoyi"             %% "os-lib"  % "0.9.1",

That will do nicely for this ticket.

sageserpent-open commented 6 months ago

A couple of things left:

Although Kinetic Merge resports a conflict, the invocation of git merge-file is makes to produce the conflicted merge file in the working directory yields a fully merged file. Looking at this, it seeems that a conflicting merge is generated, but the right hand file is just the same as the left hand one, only with an extra part appended. The contents of the appended part are also a bit puzzling. I'm unsure as to whether this is down to the work done on this ticket, or has always been there, but I'm inclined to put and end to this one and investigate under a following ticket.
There is a cache with a TODO comment asking for its removal to be investigated.
Configuring the threshold for section matching needs to be added, but not in this ticket.

sageserpent-open commented 6 months ago

CodeMotionAnalysisTest.matchingSectionsAreFound:

With the cache: 32 cases in 10 minutes 2 seconds, 29 cases in 10 minutes 5 seconds, 29 cases in 10 minutes 2 seconds.

Without the cache: 33 cases in 10 minutes 3 seconds, 33 cases in 10 minutes 4 seconds, 30 cases in 10 minutes.

CodeMotionAnalysisExtensionTest:

With the cache: 8 seconds, 7 seconds

Without the cache: 7 seconds, 7 seconds

It is fair to say that the cache adds no value - probably because it is part of a chain of computation that already involves a cache for some heavyweight computation.

sageserpent-open commented 6 months ago

At long last, delivered in Git commit SHA: 0e12c1f0c3b819f1fa1ea7aec78da7bde7259ef3.

sageserpent-open commented 5 months ago

This went out in release 0.1.10, Git commit SHA 64fd5448f2f9e6d36953c86345d069d62d1d3a2f.