sageserpent-open / kineticMerge

Merge a heavily refactored codebase and stay sane.
MIT License
9 stars 1 forks source link

Weird merge results. #19

Closed sageserpent-open closed 6 months ago

sageserpent-open commented 9 months ago

Doing some dogfood testing of Kinetic Merge on its own repository yielded this peculiar (and incorrect) successful merge result:

Screenshot 2023-10-10 at 13 36 52

(The left hand is the base, the middle is what Git's merge produced and the right hand is what Kinetic Merge produced).

Doing the same with Git's merge produced a conflicted merge, which to be honest seems a better result.

Need to analyse this and decide:

  1. Is this a logical consequence of the underlying merge approach at fine granularity? So it works 'correctly', but doesn't produce reasonable results based on a human's expectations...
  2. ... or is the merge approach actually broken? So it's not tested properly and has a bug in the implementation.
  3. What should have been the desired result?
sageserpent-open commented 6 months ago

Looking at the breakage of MainTest.cleanMergeOfAFileModifiedInBothBranches, something is wrong with merge.

CodeMotionAnalysis does the right thing (although more on that later), it takes the following content:

BASE

Hello, my old mucker!

Sections: A - Hello, my old mucker! 6 tokens.

LEFT

Hello, my old mucker! 
Pleased to see you, old boy.

Sections: A - Hello, my old mucker! 6 tokens. B - Pleased to see you, old boy. 8 tokens.

RIGHT

Hello, all and sundry.

Sections: C - Hello, all and sundry. 6 tokens.

It matches A across the base and left side.

Note how the token Hello did not form an all-sides match, as CodeMotionAnalysis has deliberately engineered logic to stop an all-sides match when two of its putative sections are subsumed within a longer pairwise match - and this is exactly what is happening here.

Now, part of the justification for allowing a longer pairwise match to suppress a shorter all-pairs match is what the merge is expected to do. If we have a merge of:

BASE: A LEFT AB RIGHT C

Then I thought that this is interpreted as an edit of A into C on the right, plus a following insertion of B after A on the left - so the expected merge would be CB.

Applying that logic to presumably make it irrelevant as to whether we have a long pairwise match on Hello, my old mucker! being edited into Hello, all and sundry., versus a short all-sides match of Hello, being preserved and a short pairwise match of my old mucker! being edited into all and sundry.. The result is still the same.

So my suspicions about CodeMotionAnalysis are allayed for now. It will be interesting though to see how this suppression of all-sides matches works out in the wild, but let's leave that for now.

Where it gets fishy is that the actual merge isn't CB - instead we see a conflict, the left hand being B and the right hand being C.

The merge algorithm has gone for a deletion of A on the right, followed by an insertion of C - the deletion goes through cleanly, but the insertion of C on the right conflicts with the insertion of B on the left.

I could have sworn a whole load of work went into supporting - and testing for - the detection of edits!

What happened... ?

sageserpent-open commented 6 months ago

Wrote MergeTest.simpleMergeOfAnEdit in Git commit SHA 670a28bf63e7efc0e02d7bb0054d01cb5b061a9c, this reproduces the problem seen in MainTest but down at the unit level.

This test was then back-ported to Git commit SHA fea076322a4c1de6a26216301116b66479cd683b which is based on an early version of Merge, and that test passes. Modifying the test expectations as a temporary fault exhibits breakage, so it really is checking the result when it passes.

Did a painful linear progressive merge - couldn't use git bisect as this would have to involve unshelving into each bisected commit - given issues with the SBT build compatibility and some merge conflicts, it was easier to just re-evolve the changes step-by-step on to a new branch.

This revealed that the test did pass in back-ported form, as I expected, but was broken in Git commit SHA 3c63dc3a6dd8260e736156e19a8dc5b110dadb12 (merged into 1acfb3ac226da58396d19189a7cf4b99d658ff37 for the reproduction against the back-ported test) Ironically, the commit message states:

Simplify the implementation of Merge. Not sure if this entirely a good idea, as it trades extra code for explicit handling of corner cases against less code using implicit state transformations to do the same thing...

Ho-hum.

Given that all the other tests in MergeTest pass, it is clear that MergeTest.fullMerge isn't assiduous enough.

It would be worth detecting the failure via MergeTest.fullMerge to restore confidence prior to going further.

sageserpent-open commented 6 months ago

Decided to just do some hacking, reverting the offending commit and merging it through to result in Git commit SHA 20c308b99e0d868ba29a5e2806162eda2d2f425b.

This is not fit for delivery, see below - but the results are encouraging.

Had to hack the demonstration repository (we'll get to that), but the part where Git reported a conflict and Kinetic Merge quietly made a mess now produces this:

    packageExecutable := {
      val packagingVersion = (ThisBuild / version).value

      println(s"Packaging executable with version: $packagingVersion")

      val localArtifactCoordinates =
        s"${organization.value}:${name.value}_${scalaBinaryVersion.value}:$packagingVersion"

      val executablePath = s"${target.value}${Path.sep}${name.value}"

      s"cs bootstrap --verbose --bat=true --scala-version ${scalaBinaryVersion.value} -f $localArtifactCoordinates -o $executablePath" !

      name.value
    },

That's exactly what I wanted to see.

sageserpent-open commented 6 months ago

Exhibit D - CodeMotionAnalysisExtensionTest console output as of Git commit SHA 20c308b99e0d868ba29a5e2806162eda2d2f425b:

"""
I wandered lonely as a cloud
That floats on high o'er vales and hills,
When all at once I saw a crowd,
A host, of small fishing boats;
Astride the sea, beneath the quay,
Rocking and swaying in the breeze.

Why this allusion?
I Havant a clue!
Along the margin of a bay:
Ten thousand (well, maybe not quite) I thought, 'Was this part of the job role?'.
'Should I be expected to deal with flowers?'
The waves beside them danced; but they
Out-did the sparkling waves in glee:
A poet could not but be gay,
In such a jocund company:
I gazed—and gazed—but thought only of
raising this in the next Zoom meeting.

For oft, when on my Aeron I slouch
In vacant or in pensive mood,
They flash upon that inward eye
Which is the bliss of solitude;
And then my heart with pleasure fills,
And sends an email to human resources.
"""
*********************************
"""
I wandered lonely as a cloud
That floats on high o'er vales and hills,
When all at once I saw a crowd,
A host, of small fishing boats;
Astride the sea, beneath the quay,
Rocking and swaying in the breeze.

Why this allusion?
I Havant a clue!
Along the margin of a bay:
Ten thousand (well, maybe not quite) sashays with the fishing boats.
"""
sageserpent-open commented 6 months ago

What were those caveats?

sageserpent-open commented 6 months ago

Test case 19 in CodeMotionAnalysis.matchingSectionsAreFound - must try harder to find the optimal match in the genetic algorithm. Maybe there should be a single cutoff for match sizes across all files across all sides, and we can dispense with the genetic algorithm. Or use hinting there too...

sageserpent-open commented 6 months ago

The story so far as of Git commit SHA 420b84b6e87dc5f1780aed511f29278381623d13...

I'm not really sure that the merge is in general trustworthy yet - need to do some general dogfooding, but the issue this ticket refers to seems to be on its way to being fixed.

sageserpent-open commented 6 months ago

Examining the number of evaluations of a phenotype shows that given the number of window sizes that come under the minimum sure-fire size, there are far more evaluations performed by the genetic algorithm than there are window sizes.

Example numbers from CodeMotionAnalysis.matchingSectionsAreFound:

Sizes: [1, 45) - evaluations: 146 - 41 + 1 = 106 Sizes: [1, 8) - evaluations: 238 - 23 + 1 = 216 Sizes: [3, 140) - evaluations: 323 - 26 + 1 = 298 Sizes: [1, 68) - evaluations: 628 - 26 + 1 = 603 Sizes: [1, 27) - evaluations: 428 - 32 + 1 = 397 Sizes: [8, 154) - evaluations: 825 - 25 + 1 = 801 Sizes: [4, 106) - evaluations: 528 - 35 + 1 = 494

Example numbers from CodeMotionAnalysisExtensionTest.merging:

Sizes: [1, 5) - evaluations: 40 - 17 + 1 = 24 Sizes: [1, 4) - evaluations: 24 - 17 + 1 = 8 Sizes: [1, 3) - evaluations: 0 Sizes: [1, 2) - evaluations: 0

In other cases, the genetic algorithm is not invoked at all.

Example numbers from MainTest:

No invocation of genetic algorithm at all.

It is clear that a simple linear search down all the small-fry window sizes will be much more efficient than the genetic algorithm, and definitive in its result. Turning down the dials on the genetic algorithm to match performance is likely to miss a few matches, this has already been seen and has required the dials to go up.

sageserpent-open commented 6 months ago

The genetic algorithm is no longer used as of Git commit SHA c94b207d37c2a57ef074138c2332e0eb307f0b8e (and the tests pass, after fixing a bug in CodeMotionAnalysisTest.matchingSectionsAreFound exposed by the implementation change in the SUT - this has been back-ported prior to the implementation change to check that the test passes in both cases, see Git commit SHA 87441fa1ea53d3a93f85962f6ce79ba72a101b29).

It was a surprise to see that performance is slightly worse overall in CodeMotionAnalysisTest.matchingSectionsAreFound compared with the back-port. The back-port using the genetic algorithm achieves 34 test cases in 10 minutes, compared with 32 test cases in 10 minutes when using the linear search.

In general, using linear search beats the genetic algorithm, but there are two exceptions where the genetic algorithm commit takes 28 seconds / 23 seconds versus 1 minutes 20 seconds / 1 minute 36 seconds.

The window size ranges for the genetic algorithm / linear search are [10, 264), [8, 504). That's a far bit larger than the examples above.

Given that the linear search is systematic, and that CodeMotionAnalysisExtensionTest completes 30 test cases in 8 seconds, I'm of a mind to just accept those two odd test cases and carry on with linear search.

sageserpent-open commented 6 months ago

NOTE: remember to replace Phenotype with something simpler, now that the genetic algorithm has gone. Would be nice to keep the invariant check and the helper method API, though.DONE

sageserpent-open commented 6 months ago

What's been going on? MergeTest has been overhauled significantly in an attempt to make it more comprehensible, as I couldn't follow what it was doing after having not worked on it for over two months. There's a lesson there...

The rework exposed a new bug in merge in addition to the one that had slipped in; this led to a round of back and forth between merge and MergeTest where either a new bug in the SUT was found and corrected, or the test itself was shown to have a bug in its logic and was generating either faulty expected merges or misleading move sequences not consistent with a correct expected merge.

It is fair to say even with the rework, MergeTest is still a very opaque and truculent thing; I've added comments to help demystify it, but it isn't going to serve the role as 'test-as-specification'. It is however really good now at rooting out bugs in both the SUT and also in my own thinking about the edge-cases for the merge. Line coverage of merge is now 100% with the minor exception of - wait for it - an exception class for future divergence detection.

Consequently, I've added several explicit small tests to help plug the gap in a lack of specification-style tests, usually adding these after MergeTest.fullMerge has shrunk down a failure to something that is easy to analyze. Got a bit delinquent about this towards the end, so I'm going to write up what the merge tries to achieve as code comments and cross-check with the specification tests once the dust has settled.

In the meantime, we have the unwrapping of Phenotype from CodeMotionAnalysis and the threshold configuration to do, and then I think we're good.

MainTest and CodeMotionAnalysisExtensionTest are still passing - the latter now fully merge the SBT sources in all test cases, which is a great result. The SBT source merges look sensible when randomly sampling them, which is heartening.

sageserpent-open commented 6 months ago

Manual testing of the example above using the build from Git commit SHA 491d8d6541b34dabbd9a67cd625cbbdbef31bfc5 yields the following excerpt:

    Compile / resourceGenerators += Def.task {
      val location = versionResource.value

      val packagingVersion = (ThisBuild / version).value

      println(
        s"Generating version resource: $location for version: $packagingVersion"
      )

      IO.write(location, packagingVersion)

      Seq(location)
    }.taskValue,
    packageExecutable := {
      val packagingVersion = (ThisBuild / version).value

      println(s"Packaging executable with version: $packagingVersion")

      val localArtifactCoordinates =
        s"${organization.value}:${name.value}_${scalaBinaryVersion.value}:$packagingVersion"

      val executablePath = s"${target.value}${Path.sep}${name.value}"

      s"cs bootstrap --verbose --bat=true --scala-version ${scalaBinaryVersion.value} -f $localArtifactCoordinates -o $executablePath" !

      name.value
    },
    packageExecutable := (packageExecutable dependsOn publishLocal).value,
    libraryDependencies += "com.github.scopt" %% "scopt" % "4.1.0",
    libraryDependencies += "org.typelevel" %% "cats-collections-core" % "0.9.6",
    libraryDependencies += "org.typelevel" %% "cats-core"   % "2.10.0",
    libraryDependencies += "org.typelevel" %% "cats-effect" % "3.5.2",
    libraryDependencies ++= Seq(
      "dev.optics" %% "monocle-core"  % "3.2.0",
      "dev.optics" %% "monocle-macro" % "3.2.0"
    ),
    libraryDependencies += "org.scala-lang.modules" %% "scala-parser-combinators" % "2.3.0",
    libraryDependencies += "com.lihaoyi"             %% "os-lib"  % "0.9.1",

That will do nicely for this ticket.

sageserpent-open commented 6 months ago

A couple of things left:

sageserpent-open commented 6 months ago

CodeMotionAnalysisTest.matchingSectionsAreFound:

With the cache: 32 cases in 10 minutes 2 seconds, 29 cases in 10 minutes 5 seconds, 29 cases in 10 minutes 2 seconds.

Without the cache: 33 cases in 10 minutes 3 seconds, 33 cases in 10 minutes 4 seconds, 30 cases in 10 minutes.

CodeMotionAnalysisExtensionTest:

With the cache: 8 seconds, 7 seconds

Without the cache: 7 seconds, 7 seconds

It is fair to say that the cache adds no value - probably because it is part of a chain of computation that already involves a cache for some heavyweight computation.

sageserpent-open commented 6 months ago

At long last, delivered in Git commit SHA: 0e12c1f0c3b819f1fa1ea7aec78da7bde7259ef3.

sageserpent-open commented 5 months ago

This went out in release 0.1.10, Git commit SHA 64fd5448f2f9e6d36953c86345d069d62d1d3a2f.