sageserpent-open / kineticMerge

Merge a heavily refactored codebase and stay sane.
MIT License
9 stars 1 forks source link

Clean Merging (including code motion inside files) #13

Closed sageserpent-open closed 3 months ago

sageserpent-open commented 10 months ago

This follows on from #1 , adding in the requirement dropped from that ticket:

Given a Git repository of files in multiple directories that has been worked on in two branches, provide a command line utility that merges one branch into another, taking into account code motion inside files.

Any divergent code motion is treated as an error and causes the merge to be aborted - the repository should be left as it was prior to the merge being attempted.

Code motion is only considered inside a file on a file-by-file basis - so when a method is extracted and placed elsewhere in the same file, it should be picked up as code motion in this ticket. Similarly for code rearrangement due to method sorting in something like IntelliJ.

Moving a class into its own file would not be picked up in this ticket, that job is for #26.

sageserpent-open commented 4 months ago

Update: breaking out code motion across files into its own ticket: #26 .

sageserpent-open commented 3 months ago

It may be the case that divergent code motion can be accomodated, at least in some situations by allowing both moves to take place; this can also pick up edits from the other side of the move in both cases. It is then up to user to decide whether to delete either or both moves, or simply accept both.

This doesn't work with ambiguous matches because multiple edits may associate to several of the matches, but that can be left as unsupported.

In fact, having ambiguous matches with code motion can lead to a situation where a moved section is associated with multiple edits on the other side - so there are edit collisions on the same side. Stay away from this for now...

sageserpent-open commented 3 months ago

More pondering leads to this rule of thumb: if an all-sides match leads to a coincident edit in the merge with the left and right contributions to the match being moved to different locations, then we have two independent edits, each applying to the move on the opposite side.

Going further, if the two moves land in the same place and are picked up by the merge as either a coincident insertion or edit, then we have an edit collision that can be represented as an insertion / insertion conflict or an edit / edit conflict.

If the two moves land in the same place that happens to be a preservation or coincident insertion because of an ambiguous match, we don't care what happens as that's not supported yet, as long as the application doesn't crash.

sageserpent-open commented 3 months ago

Regarding edit collisions due to ambiguous matches, a quick-and-dirty approach is to store the rival edits as a set, so duplicated edits don't actually conflict. If a set turns out to have more than edit, they are simply concatenated in no particular order and a diagnostic can be issued. A deletion would form its own entry in the set too.

For now however, just store one edit or deletion for each side and issue a diagnostic if a collision occurs on the same side.

sageserpent-open commented 3 months ago

So in the implementation of the merge algebra for MergeResultDetectingMotion, we have to build up a mapping from the moved element or elements to either an edit or a deletion - and we have to supply the moved elements from the match and ignore any left or right element passed in by the calling merge algoithm, because that favours the left side for preservations, coincident insertions and coincident deletions.

By using the left and right elements from the match, we can distinguish between edits and / or deletions associated with both sides.

sageserpent-open commented 3 months ago

Got tests for the simpler situations exercising MergeResultDetectingMotion where there are just one edit or deletion to propagate in a move.

The more complex scenarios can wait (eg. a single edit on one side versus two distinct moves on the other side, or an edit on one side adjacent to a move destination versus a move on the other side).

The next thing to do is to wire in use of MergeResultDetectingMotion into CodeMotionAnalysisExtension and use the move mappings there - and write some example tests for CodeMotionAnalysisExtension to shows moves, moved edits and moved deletions in operation...

sageserpent-open commented 3 months ago

Got a failure in CodeMotionAnalysisExtensionTest.codeMotion, analysis follows...

sageserpent-open commented 3 months ago

BASE FILE:

package com.sageserpent.kineticmerge.core

import com.eed3si9n.expecty.Expecty

object ExpectyFlavouredAssert:
  val assert: Expecty = new Expecty:
    override val showLocation: Boolean = true
    override val showTypes: Boolean    = true
  end assert
end ExpectyFlavouredAssert
sageserpent-open commented 3 months ago

LEFT FILE:

package com.sageserpent.kineticmerge.core

import com.eed3si9n.expecty.Expecty

object ExpectyFlavouredAssert:
  val assert: Expecty = new Expecty:
    // Swapped the next two lines around...
    override val showTypes: Boolean    = true
    override val showLocation: Boolean = true

  end assert
end ExpectyFlavouredAssert
sageserpent-open commented 3 months ago

RIGHT FILE:

package com.sageserpent.kineticmerge.core

import com.eed3si9n.expecty.Expecty

object ExpectyFlavouredAssert:
  val assert: Expecty = new Expecty:
    override val showLocation: Boolean = true
    override val showTypes: Boolean    = false // This edit should propagate.
  end assert
end ExpectyFlavouredAssert
sageserpent-open commented 3 months ago

DESIRED MERGE BASED ON INTUITION:

package com.sageserpent.kineticmerge.core

import com.eed3si9n.expecty.Expecty

object ExpectyFlavouredAssert:
  val assert: Expecty = new Expecty:
    // Swapped the next two lines around...
    override val showTypes: Boolean    = false // This edit should propagate.
    override val showLocation: Boolean = true

  end assert
end ExpectyFlavouredAssert
sageserpent-open commented 3 months ago

Outcome seen as of Git commit SHA: 9c5dd269b1dd719b5878dedc03c823deb9933d16 ...

11:43:26.972 [main] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(*** STUNT DOUBLE ***,0,28) as it is common to all three sides.
11:43:26.974 [main] DEBUG c.s.kineticmerge.core.merge$ -- Left edit of SectionImplementation(*** STUNT DOUBLE ***,28,7) into SectionImplementation(*** STUNT DOUBLE ***,28,11).
11:43:26.975 [main] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(*** STUNT DOUBLE ***,39,6) as it is common to all three sides.
11:43:26.975 [main] DEBUG c.s.kineticmerge.core.merge$ -- Edit conflict of SectionImplementation(*** STUNT DOUBLE ***,41,5) into SectionImplementation(*** STUNT DOUBLE ***,45,1) on the left and SectionImplementation(*** STUNT DOUBLE ***,41,12) on the right, coalescing with following left insertion of SectionImplementation(*** STUNT DOUBLE ***,46,7).
11:43:26.977 [main] DEBUG c.s.kineticmerge.core.merge$ -- Edit conflict of SectionImplementation(*** STUNT DOUBLE ***,41,5) into SectionImplementation(*** STUNT DOUBLE ***,46,7) on the left and SectionImplementation(*** STUNT DOUBLE ***,41,12) on the right, coalescing with following left insertion of SectionImplementation(*** STUNT DOUBLE ***,53,4).
11:43:26.977 [main] DEBUG c.s.kineticmerge.core.merge$ -- Edit conflict of SectionImplementation(*** STUNT DOUBLE ***,41,5) into SectionImplementation(*** STUNT DOUBLE ***,53,4) on the left and SectionImplementation(*** STUNT DOUBLE ***,41,12) on the right.
11:43:26.982 [main] DEBUG c.s.kineticmerge.core.merge$ -- Merge yielded:
MergeResultDetectingMotion(
  coreMergeResult = MergedWithConflicts(
    leftElements = Vector(
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 0, size = 28),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 28, size = 11),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 39, size = 6),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 45, size = 1),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 46, size = 7),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 53, size = 4)
    ),
    rightElements = Vector(
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 0, size = 28),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 28, size = 11),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 39, size = 6),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 41, size = 12)
    )
  ),
  changesPropagatedThroughMotion = Map()
)
sageserpent-open commented 3 months ago
  1. override val showTypes: Boolean = was matched and was considered to be preserved in the same spot, contrary to the intuitive view that it would move on the left side towards the start of the file. It appeared as (..., 35, 6) on the base and right and (..., 39, 6) on the left.
  2. override val showLocation: Boolean = true was considered to be moved to the end of the file instead, being dealt with as a left-edit at its source and with the moved section appearing in a conflict at the end. It appeared as (..., 28, 7) on the base and right and (..., 46, 7) on the left.
  3. The initialiser true for override val showTypes: Boolean = true on the base and right was expected to match as a base + right match, being edited into false. This didn't happen (and as a single token, wouldn't be picked up anyway as a match), but puzzlingly, the merge algorithm didn't get the chance to make a last-minute fallback match either...
  4. ... what did happen as that on the base, the entire:true<newline>end assert<newline>end ExpectyFlavouredAssert became a section in its own right (..., 41, 5) ...
  5. ... on the left, true<newline> and then end assert<newline>end ExpectyFlavouredAssert became two sections (..., 45, 1) and (..., 53, 4) with the moved section (..., 46, 7) sitting between them ...
  6. ... and on the right, false // This edit should propagate.<newline>end assert<newline>end ExpectyFlavouredAssert came in as a final section (..., 41, 12).

So we got a conflict at the end.

sageserpent-open commented 3 months ago

The minimum sure-fire match size was 5, and as there was just the one file, no smaller matches would have been considered. So the obvious match of end assert<newline>end ExpectyFlavouredAssert across all sides didn't happen as that would be four tokens long with whitespace condensed into the preceding token.

The final section on each side was generated as a filler, hence the lack of consistency between them.

sageserpent-open commented 3 months ago

What happens with a minimum match size of 4? (Git commit SHA: 5a9a82be8b8efb3cbebd369a28f0a77deafd25c0)

Now we see an all-sides match for end assert<newline>end ExpectyFlavouredAssert, good.

13:26:03.567 [main] DEBUG c.s.k.core.MappedContentSources -- Filling gap on side: base at path: *** STUNT DOUBLE *** prior to following section with: SectionImplementation(*** STUNT DOUBLE ***,41,1).
13:26:03.567 [main] DEBUG c.s.k.core.MappedContentSources -- Filling gap on side: left at path: *** STUNT DOUBLE *** prior to following section with: SectionImplementation(*** STUNT DOUBLE ***,28,11).
13:26:03.568 [main] DEBUG c.s.k.core.MappedContentSources -- Filling gap on side: left at path: *** STUNT DOUBLE *** prior to following section with: SectionImplementation(*** STUNT DOUBLE ***,45,1).
13:26:03.568 [main] DEBUG c.s.k.core.MappedContentSources -- Filling gap on side: right at path: *** STUNT DOUBLE *** prior to following section with: SectionImplementation(*** STUNT DOUBLE ***,41,8).
13:26:03.618 [main] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(*** STUNT DOUBLE ***,0,28) as it is common to all three sides.
13:26:03.620 [main] DEBUG c.s.kineticmerge.core.merge$ -- Left edit of SectionImplementation(*** STUNT DOUBLE ***,28,7) into SectionImplementation(*** STUNT DOUBLE ***,28,11).
13:26:03.620 [main] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(*** STUNT DOUBLE ***,39,6) as it is common to all three sides.
13:26:03.621 [main] DEBUG c.s.kineticmerge.core.merge$ -- Right edit of SectionImplementation(*** STUNT DOUBLE ***,41,1) into SectionImplementation(*** STUNT DOUBLE ***,41,8).
13:26:03.621 [main] DEBUG c.s.kineticmerge.core.merge$ -- Left insertion of SectionImplementation(*** STUNT DOUBLE ***,46,7).
13:26:03.622 [main] DEBUG c.s.kineticmerge.core.merge$ -- Preservation of SectionImplementation(*** STUNT DOUBLE ***,53,4) as it is common to all three sides.
13:26:03.625 [main] DEBUG c.s.kineticmerge.core.merge$ -- Merge yielded:
MergeResultDetectingMotion(
  coreMergeResult = FullyMerged(
    elements = Vector(
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 0, size = 28),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 28, size = 11),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 39, size = 6),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 41, size = 8),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 46, size = 7),
      SectionImplementation(path = "*** STUNT DOUBLE ***", startOffset = 53, size = 4)
    )
  ),
  changesPropagatedThroughMotion = Map()
)

That is fully merged, seems promising ... but the test still fails!

sageserpent-open commented 3 months ago

Ah - the test expectations were too rigid - was using default equality to compare tokens, which takes whitespace into account. The tests were cutover to use the whitespace-insensitive Token.equality and after backpatching the test to check whether the changes in expectations had any effect on legacy testing, this was merged in via Git commit SHA: a694056de420681c4de0f20e65e183e77c43482a.

Test now passes.

sageserpent-open commented 3 months ago

This is good, but I'm uneasy about how much has been proven.

  1. The test analysed above didn't actually propagate any edits via moves - the edited code stayed put and the non-edited code leapfrogged over it. That is perfectly valid in its own right, but we need to prove propagation of edits (and deletions) works.
  2. Related to this is the ambiguity on the longest common subsequence algorithm about which LCS to choose when there is more than one possibility. This happens here because two lines of code are reordered - so one gets to be part of the LCS and the other is considered to move.
  3. We can jemmy the test cases so as to stumble across one that has to move an edit (or deletion), but here's a thought: it seems intuitively obvious that when reordering a long section of text with a smaller section of text, it is the smaller one that should be considered as moving - at least if both the moved and the LCS section both belong to distinct all-sides matches across each side. If one section is only matched in a pairwise match and the other in an all-sides match, it seems reasonable to treat the all-sides match as being part of the LCS (and that it was the longest common subsequence algorithm will do in a tiebreak).
sageserpent-open commented 3 months ago

Another wrinkle is that the nice intuitive picture of a single section moving on one side of the merge and being edited on the other generally won't be the case unless the minimum match size is quite large.

What will happen instead is that there will be three sections on each side - a leading and trailing one that are both matched on all sides, moving in tandem, and a middle one that has a pairwise match between the base and the side with the move - and an unmatched section that is the precisely the edit change. I expect this to work with the current implementation as of a694056de420681c4de0f20e65e183e77c43482a, but this needs testing...

sageserpent-open commented 3 months ago

As of Git commit SHA: d3e02cbe84038cda1245293b526447d8f96eb47c, CodeMotionAnalysisExtensionTest.codeMotion has been bolstered to show the propagation of an edit through a section move, and it passes.

Code motion has arrived!

sageserpent-open commented 3 months ago

Pushing all further work on supporting edge cases for propagating edits and deletions through moves into a separate ticket #28 .

sageserpent-open commented 3 months ago

Favouring moves of small sections over moves of large sections by modifying LongestCommonSubsequence results in a merge failure in CodeMotionAnalysisExtensionTest.codeMotion, investigating this in case the change in behaviour of the LCS is exposing a fundamental defect in the downstream merge...

sageserpent-open commented 3 months ago

As of Git commit SHA: 9a2fddb45e1231b24a167707c55bf7378a4edc31 the longest common subsequence algorithm favours subsequences with longer section when choosing between longest common subsequences of the same length.

This is reflected in the downstream merge, where it is the smaller sections that get picked up as code motion when they interchange with larger sections.

The test failure noted above in CodeMotionAnalysisExtensionTest.codeMotion is due to the lack of support for edit anchoring. What happened was that a large construct was moved on one side and a small suffix of it edited on the other - the unchanged prefix moved cleanly through the merge, being picked up as an all-sides match, but the small suffix was not picked up as a pairwise match on the base and on the other side from the edit.

This led to the suffix moving on the other side from the edit as is, and an ordinary delete versus edit conflict at the source of the move. Had the suffix been large enough to match, then this would have resulted in a move of the suffix in tandem with the prefix; the edit being propagated through the move.

Edit anchoring is a possible way of dealing with this - unmatched sections are considered to be anchored to adjacent matched sections, so as long as the matched sections move in tandem, they could pick up the unmatched section as a move, thus allowing edit propagation.

A new ticket #29 has been made for edit anchoring; for now, we have something worth getting out there...

sageserpent-open commented 3 months ago

This went out in release 1.1.0, Git commit SHA: 26a9c9a64451864b3373536b8f8d934ca869a71c.

sageserpent-open commented 3 months ago

Evidence (screencast of a demonstration of Kinetic Merge):

https://github.com/sageserpent-open/kineticMerge/assets/1765601/2d676e20-dce0-441a-b4a2-fcbaa8aff35d

https://github.com/sageserpent-open/kineticMerge/assets/1765601/783172ef-7688-45ab-83ab-2e0aa633ce01

https://github.com/sageserpent-open/kineticMerge/assets/1765601/370ddbce-98c7-42d2-8c81-f5c2756d051d

sageserpent-open commented 3 months ago

Closing now that evidence is uploaded.