sageserpent-open / kineticMerge

Merge a heavily refactored codebase and stay sane.
MIT License
9 stars 1 forks source link

Odd merges seen at higher match thresholds. #24

Closed sageserpent-open closed 5 months ago

sageserpent-open commented 5 months ago

This follows on from #23.

Producing the manual test evidence for that ticket revealed some odd looking merges that didn't seem to have anything to do with the orphaned text problem that was the concern of that ticket.

So this ticket is about investigating those odd merges.

Are they bugs in the implementation? Or perhaps the implementation fufils its requirements, but the requirements themselves allow these odd cases?

Either way, the user expects to see a nice intuitive merge with no surprises.

sageserpent-open commented 5 months ago

Exhibit A from merging the same repository as in #23.

Using the default match threshold results in an odd merge that on reflection may still be an orphaned text merge:

Screenshot 2024-02-01 at 16 40 31

The text whatever comes from a completely unrelated location in the file at around line 222, but has ended up at line 371 in a merge conflict.

Using an explicit threshold via --match-threshold=10% leads to the same outcome.

sageserpent-open commented 5 months ago

Repeating the merge with --match-threshold=0.01 yields this:

Screenshot 2024-02-01 at 16 51 28

At least the rename is in the right place, sort of...

sageserpent-open commented 5 months ago

Repeating the merge with --match-threshold=0.007 yields the obviously correct merge. It should be shaken, not stirred, apparently.

sageserpent-open commented 5 months ago

Enabled logging with a temporary patch...

(Incidentally, might it be useful to switch on logging permanently for the application when run loose from an IDE, but suppressed when run via the packaged Coursier executable?)

Running with --match-threshold=0.01 yields:

[io-compute-15] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Minimum match window size across all files over all sides: 507
[io-compute-15] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Minimum sure-fire match window size across all files over all sides: 507
[io-compute-15] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Maximum match window size across all files over all sides: 5076
[io-compute-15] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an improved match at window size: 2791, number of matches is: 835, looking for a more optimal match with estimated window size of: 3625.
[io-compute-15] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an optimal match at window size: 3625, number of matches is: 1, restarting search to look for smaller matches.
[io-compute-15] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an improved match at window size: 2065, number of matches is: 807, looking for a more optimal match with estimated window size of: 2871.
[io-compute-15] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an optimal match at window size: 2871, number of matches is: 1, restarting search to look for smaller matches.
[io-compute-15] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an improved match at window size: 1097, number of matches is: 354, looking for a more optimal match with estimated window size of: 1450.
[io-compute-15] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an optimal match at window size: 1450, number of matches is: 1, restarting search to look for smaller matches.
[io-compute-15] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an improved match at window size: 742, number of matches is: 2, looking for a more optimal match with estimated window size of: 743.
[io-compute-15] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an optimal match at window size: 743, number of matches is: 1, restarting search to look for smaller matches.
[io-compute-15] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an improved match at window size: 565, number of matches is: 32, looking for a more optimal match with estimated window size of: 596.
[io-compute-15] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an optimal match at window size: 596, number of matches is: 1, restarting search to look for smaller matches.
[io-compute-15] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search for matches whose size is no less than the sure-fire match window size of: 507 has terminated; results are:

Running with --match-threshold=0.007 yields:

[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Minimum match window size across all files over all sides: 35
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Minimum sure-fire match window size across all files over all sides: 35
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Maximum match window size across all files over all sides: 5076
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an improved match at window size: 2555, number of matches is: 1071, looking for a more optimal match with estimated window size of: 3625.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an optimal match at window size: 3625, number of matches is: 1, restarting search to look for smaller matches.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an improved match at window size: 1829, number of matches is: 1043, looking for a more optimal match with estimated window size of: 2871.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an optimal match at window size: 2871, number of matches is: 1, restarting search to look for smaller matches.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an improved match at window size: 743, number of matches is: 708, looking for a more optimal match with estimated window size of: 1450.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an optimal match at window size: 1450, number of matches is: 1, restarting search to look for smaller matches.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an improved match at window size: 742, number of matches is: 2, looking for a more optimal match with estimated window size of: 743.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an optimal match at window size: 743, number of matches is: 1, restarting search to look for smaller matches.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an improved match at window size: 388, number of matches is: 209, looking for a more optimal match with estimated window size of: 596.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an optimal match at window size: 596, number of matches is: 1, restarting search to look for smaller matches.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an improved match at window size: 174, number of matches is: 127, looking for a more optimal match with estimated window size of: 270.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an optimal match at window size: 270, number of matches is: 1, restarting search to look for smaller matches.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an improved match at window size: 152, number of matches is: 52, looking for a more optimal match with estimated window size of: 203.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an optimal match at window size: 203, number of matches is: 1, restarting search to look for smaller matches.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an improved match at window size: 118, number of matches is: 16, looking for a more optimal match with estimated window size of: 133.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an optimal match at window size: 133, number of matches is: 1, restarting search to look for smaller matches.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an improved match at window size: 83, number of matches is: 10, looking for a more optimal match with estimated window size of: 92.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an optimal match at window size: 92, number of matches is: 1, restarting search to look for smaller matches.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an improved match at window size: 48, number of matches is: 6, looking for a more optimal match with estimated window size of: 51.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an optimal match at window size: 51, number of matches is: 1, restarting search to look for smaller matches.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an improved match at window size: 42, number of matches is: 8, looking for a more optimal match with estimated window size of: 49.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search has found an optimal match at window size: 49, number of matches is: 1, restarting search to look for smaller matches.
[io-compute-12] INFO com.sageserpent.kineticmerge.core.CodeMotionAnalysis$ - Search for matches whose size is no less than the sure-fire match window size of: 35 has terminated; results are:
sageserpent-open commented 5 months ago

There were a lot of missed opportunities when the match threshold was 10%.

Given that the low level merge algorithm knows nothing about the fine structure of a section (although it can do a simple comparison of one unmatched section against another), it perhaps not so surprising that it gets confused with long runs of unmatched content.

There is a odd situation where if the merge algorithm is presented with just tokens, then it a) chokes on the number of tokens and b) is perhaps suboptimal at that level of detail, which was presented as the cause of the problem dealt with in issue #19. That led to the full implementation of CodeMotionAnalysis and cut over of the merge to work at section level.

However, a lot of work was done on the low-level merge algorithm in that ticket subsequent to the cutover to section level merging, so maybe the initial diagnosis wasn't correct.

Furthermore, it may be the case that it is optimal to work at section level from the point of view of not choking the merge with umpteen tokens, but perhaps the unmatched sections should be exploded into smaller ones, possibly right down to single token sections - that might let the merge algorithm see inside the detail in the unmatched content and improve the matches there.

sageserpent-open commented 5 months ago

Trying this out in a spike yielded very disappointing results in the tests, and caused a stack overflow when running the application on the example above.

sageserpent-open commented 5 months ago

Perhaps the default match threshold should be estimated from the file sizes so as to ensure that all files see potential matches of 20 tokens or more?

Or could the presence of overly large unmatched sections trigger a diagnostic telling the user to decrease the match threshold down to some suggested level?

Maybe work in terms of a minimal match size across all files and forget about the match threshold? That would simplify the match search logic into the bargain.

sageserpent-open commented 5 months ago

Something completely different - if two sections across the left and right are not equal in terms of being part of the same match or having the same content, could they be merged at the token level by a recursive invocation of the merge algorithm on the section contents?

sageserpent-open commented 5 months ago
  1. Logging has been added as part of this; it makes analysis a lot easier.
  2. CodeMotionAnalysisTest.matchingSectionsAreFound has had its expectations cut over so that they check the consistency of matches separately from the coverage of the expected common sequences - this allows a more flexible approach to the latter that can cope with an unexpectedly larger match being found that subsumes a common sequence, as well as runs of abutting small matches that combine to cover the a common sequence.
  3. The need to do this was highlighted by having to increase the mimimum possible match window size up to 3; this is to avoid a pathological slowdown when trying to match at window sizes of 1 (and to a lesser, but still significant extent at 2). Doing that fixed the slowdown but revealed a bug in the test logic that was being masked before.
sageserpent-open commented 5 months ago

Using a match threshold of 0.01 yields the following merge:

21:11:42.498 [io-compute-12] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,0,743) as it is common to all three sides.
21:11:42.499 [io-compute-12] DEBUG c.s.kineticmerge.core.merge$ -- Left edit of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,743,1) into SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,743,1).
21:11:42.499 [io-compute-12] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,744,596) as it is common to all three sides.
21:11:42.500 [io-compute-12] DEBUG c.s.kineticmerge.core.merge$ -- Left edit of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1340,110) into SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1340,111).
21:11:42.500 [io-compute-12] DEBUG c.s.kineticmerge.core.merge$ -- Conflict between left deletion of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1450,1) and its edit on the right into SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1450,1).
21:11:42.501 [io-compute-12] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1451,51) as it is common to all three sides.
21:11:42.502 [io-compute-12] DEBUG c.s.kineticmerge.core.merge$ -- Left edit of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1502,1) into SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1502,1).
21:11:42.502 [io-compute-12] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1503,203) as it is common to all three sides.
21:11:42.502 [io-compute-12] DEBUG c.s.kineticmerge.core.merge$ -- Left edit of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1706,1) into SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1706,1).
21:11:42.502 [io-compute-12] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1707,270) as it is common to all three sides.
21:11:42.502 [io-compute-12] DEBUG c.s.kineticmerge.core.merge$ -- Left edit of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1977,1) into SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1977,1).
21:11:42.502 [io-compute-12] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1978,92) as it is common to all three sides.
21:11:42.502 [io-compute-12] DEBUG c.s.kineticmerge.core.merge$ -- Left edit of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,2070,1) into SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,2070,1).
21:11:42.502 [io-compute-12] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,2071,133) as it is common to all three sides.
21:11:42.502 [io-compute-12] DEBUG c.s.kineticmerge.core.merge$ -- Left edit of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,2204,1) into SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,2204,1).
21:11:42.502 [io-compute-12] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,2205,2871) as it is common to all three sides.

Note the long edit on the left from start offset of 1340, size 110 to make a new sequence from start offset 1340, 111.

This is followed by a conflict between a left deletion of 1450, size 1 and its edit on the right into 1450, size 1.

Possibly this is the whenever token being edited into whatever on the right, preceded by a long stretch of tokens that were the same on the base and right, but which didn't match on the left (probably because of the test method renames). The whenever was considered part of the left-hand sides version of this long stretch, hence the size of 111 instead of 110. So whenever was considered to have vanished on the left and been edited into whatever on the right.

sageserpent-open commented 5 months ago

Using match threshold of zero by contrast yields the following merge:

21:28:42.347 [io-compute-7] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,0,743) as it is common to all three sides.
21:28:42.347 [io-compute-7] DEBUG c.s.kineticmerge.core.merge$ -- Left edit of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,743,1) into SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,743,1).
21:28:42.348 [io-compute-7] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,744,596) as it is common to all three sides.
21:28:42.348 [io-compute-7] DEBUG c.s.kineticmerge.core.merge$ -- Left edit of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1340,1) into SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1340,1).
21:28:42.348 [io-compute-7] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1341,28) as it is common to all three sides.
21:28:42.348 [io-compute-7] DEBUG c.s.kineticmerge.core.merge$ -- Left edit of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1369,1) into SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1369,1).
21:28:42.348 [io-compute-7] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1370,30) as it is common to all three sides.
21:28:42.348 [io-compute-7] DEBUG c.s.kineticmerge.core.merge$ -- Left edit of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1400,1) into SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1400,1).
21:28:42.348 [io-compute-7] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1401,49) as it is common to all three sides.
21:28:42.348 [io-compute-7] DEBUG c.s.kineticmerge.core.merge$ -- Right edit of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1450,1) into SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1450,1).
21:28:42.348 [io-compute-7] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1451,51) as it is common to all three sides.
21:28:42.348 [io-compute-7] DEBUG c.s.kineticmerge.core.merge$ -- Left edit of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1502,1) into SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1502,1).
21:28:42.348 [io-compute-7] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1503,203) as it is common to all three sides.
21:28:42.348 [io-compute-7] DEBUG c.s.kineticmerge.core.merge$ -- Left edit of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1706,1) into SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1706,1).
21:28:42.349 [io-compute-7] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1707,270) as it is common to all three sides.
21:28:42.349 [io-compute-7] DEBUG c.s.kineticmerge.core.merge$ -- Left edit of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1977,1) into SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1977,1).
21:28:42.349 [io-compute-7] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,1978,92) as it is common to all three sides.
21:28:42.349 [io-compute-7] DEBUG c.s.kineticmerge.core.merge$ -- Left edit of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,2070,1) into SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,2070,1).
21:28:42.349 [io-compute-7] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,2071,133) as it is common to all three sides.
21:28:42.349 [io-compute-7] DEBUG c.s.kineticmerge.core.merge$ -- Left edit of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,2204,1) into SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,2204,1).
21:28:42.349 [io-compute-7] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(/Users/gerardmurphy/IdeaProjects/americium-242m/src/test/java/com/sageserpent/americium/java/junit5/TrialsApiTests.java,2205,2871) as it is common to all three sides.

This time, the long section from start offset 1340 is broken down into lots of smaller left edits and preservations, including a right edit from start offset of 1450, size 1 to make a new single token sequence from start offset 1450, 1. This is preceded by a preservation from start offset 1401, size 49. So in this situation the more precise breakdown exposes the edit on the right.

sageserpent-open commented 5 months ago

Looks like the same long base and right pairwise match with base start offset of 0 and size 1450 was made in both situations, but this was broken down crudely with the higher match threshold by a couple of large all-sides matches, resulting in a base and right pairwise match at base start offset 1340, size 110. With the zero match threshold the breakdown was finer, resulting in amongst other things a base and right pairwise match at base start offset 1400, size 1 - which was followed by (and due to) an all-sides match at base start offset 1401, size 49.

That all-sides match would not have been found in the first situation, as the minimal window size there was 50!

sageserpent-open commented 5 months ago

The manually tested merge runs in a reasonable amount of time with a match threshold of zero, and this default doesn't break any tests. So the default match threshold has been moved down to zero, on the provision that monograph and digraph matches are now forbidden.

This is the state of play as of Git commit SHA d0677159f98a6e76dd2a704a5653be4834f1dde2.

Time to ship it!

sageserpent-open commented 5 months ago

Maybe not quite yet ... the current default match threshold of zero relies on their being a backstop minimal window size of 3 to prevent terrible performance / odd merges.

I'm no longer so sure that the concept of a match threshold is the best approach - it was inspired by Git's similarity setting, but when Kinetic Merge uses it, it means that the minimal window size varies from file to file - that makes it hard for an end user to know what it is they're getting. If say a piece of code moves from a very small file on one branch to a much bigger file on another (or the original file accumulates more content on another), then that code may pass the match threshold on the one branch and not on the other!

Another approach to tweaking the merge is to specify the backstop minimal window size as a command line parameter in its own right, setting the default to 3. Now if that is combined with the default match threshold of zero, then it will block all small-fry matches which due to the linear search are usually quite costly.

Although the window size is expressed in terms of tokens and not characters, as long as the user treats the setting as a magic adjustment, it may prove more workable.

The match threshold concept will be left as is, but will be overridden by the backstop if the threshold would lead to smaller window sizes. We'll see which provides the best form of control...

sageserpent-open commented 5 months ago

Introduced command line flag --minimum-match-size, defaults to 4 in Git commit SHA: 30274b30fb95466648dac45727b4f0637e52392f.

sageserpent-open commented 5 months ago

Went out in release 0.2.0, Git commit SHA: 3c9f090f79de1bdc82c079c16e122f668685df29.