Are moves calculated correctly when there are ambiguous matches?

sageserpent-open / kineticMerge

Merge a heavily refactored codebase and stay sane.

MIT License

12 stars 2 forks source link

Are moves calculated correctly when there are ambiguous matches? #90

Closed sageserpent-open closed 2 weeks ago

sageserpent-open commented 1 month ago

This came up when thinking about how to implement #83 .

Change migration relies on MatchesContext to recognise the source of moves when there are deletions and edits to be migrated.

MatchesContext also recognises the destination of moves in the context of handling insertions and edits. Amongst other things, this forms of part of finding anchors as a prelude to performing insertion migration. The move destinations report is also built up from this.

So far, so good - only both areas of code examine matches to complete the picture of either where the move is going (for change migration) or where the move has come from (for move destinations).

This isn't always correct when there are ambiguous matches; yes, it is expected to have a valid situation where a change might be migrated to several destinations (all on the same side of the overall merge), or to have several anchors with the same content (all on the same side of the overall merge) being moved to the same location. A mix of these is possible with multiple sources and destinations too.

Where it falls down is when there is an obvious move of content from a source location to a destination location - either intra-file or inter-file, but elsewhere there is a simple preservation of content that matches the moved content. We do not want to consider the parts of the preservation is being candidates for a move source or destination.

sageserpent-open commented 1 month ago

The job here is to add a failing test that exposes any such problem in CodeMotionAnalysisExtensionTest, then to implement to fix it.

MatchesContextTest will probably need some work too as the implementation proceeds.

sageserpent-open commented 1 month ago

While we're at it, maybe it's time to join up the tracking of moves with change migration?

I'm not sure if this is feasible, though, and there is some code here that is probably saving the day for the change migration situation - perhaps that is good enough?

sageserpent-open commented 1 month ago

Idea: cutover the association of changes with potential destination keys to being an association with potential source keys.

Generalise the association, so that even straight moves without any changes have their sources added as a key without a change (neither deletion nor edit, so this becomes an optional associated value).

This allows potential move destinations to be tied back to their sources by looking at the relevant ambiguous matches and seeing if a corresponding key exists in the association.

sageserpent-open commented 1 month ago

Issue #91 raised this:

The assertion occurs after CodeMotionAnalysis.of has completed; it is in CodeMotionAnalysisExtension.merge after the per-file merges have completed.

As such, it can be picked up as a prelude to working on #90.

For now 78c6ba0 disables that postcondition so that the third point from the previous comment can be addressed...

Debugging this assertion reveals a malformed MoveDestinations that contains a single destination on the right hand side without any source:

MoveDestinations(Set(),Set(),Set(Section(
  label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 629, characterOffset = 48),
  onePastEnd = TextPosition(line = 636, characterOffset = 14),
  startTokenIndex = 2541,
  sizeInTokens = 18,
  content summary = """).get
            ),
             ... (
            contentsByPath = Map(
              """
)),Set())

This is referenced in MoveDestinationsReport.moveDestinationsByMatches with the key:

LeftAndRight(Section(
  label = "OURS: 5dbb4faca2a2df4351c9f4892da59601913489f3",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 736, characterOffset = 61),
  onePastEnd = TextPosition(line = 742, characterOffset = 12),
  startTokenIndex = 3543,
  sizeInTokens = 18,
  content summary = """).get),
           ... (
          contentsByPath = Map(
            """
),Section(
  label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 629, characterOffset = 48),
  onePastEnd = TextPosition(line = 636, characterOffset = 14),
  startTokenIndex = 2541,
  sizeInTokens = 18,
  content summary = """).get
            ),
             ... (
            contentsByPath = Map(
              """
))

So what happened to the left hand side of the left-right match?

To start with, consider the right - that was picked up as a right edit, being part of a coalesced run:

17:23:13.584 [io-compute-3] DEBUG c.s.kineticmerge.core.merge$ -- Right edit of Section(
  label = "BASE: 1e55332f",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 666, characterOffset = 55),
  onePastEnd = TextPosition(line = 669, characterOffset = 33),
  startTokenIndex = 3266,
  sizeInTokens = 10,
  content = """expected)))

        mergeResult match
          case FullyMerged(result"""
) into Section(
  label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 629, characterOffset = 48),
  onePastEnd = TextPosition(line = 636, characterOffset = 14),
  startTokenIndex = 2541,
  sizeInTokens = 18,
  content summary = """).get
            ),
             ... (
            contentsByPath = Map(
              """
), coalescing with following insertion of Section(
  label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 636, characterOffset = 14),
  onePastEnd = TextPosition(line = 636, characterOffset = 47),
  startTokenIndex = 2559,
  sizeInTokens = 6,
  content = "renamedPath -> tokens(leftContent"
) on the right.

The left hand side of the match however was picked up as an unrelated coincident insertion:

17:23:13.594 [io-compute-3] DEBUG c.s.kineticmerge.core.merge$ -- Coincident insertion of Section(
  label = "OURS: 5dbb4faca2a2df4351c9f4892da59601913489f3",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 736, characterOffset = 61),
  onePastEnd = TextPosition(line = 742, characterOffset = 12),
  startTokenIndex = 3543,
  sizeInTokens = 18,
  content summary = """).get),
           ... (
          contentsByPath = Map(
            """
) on the left and Section(
  label = "THEIRS: 7b4fd68698f8ed6b13b067cf87f6abb25db52da1",
  path = /Users/gerardmurphy/IdeaProjects/kineticMerge-dogfood-examples/src/test/scala/com/sageserpent/kineticmerge/core/CodeMotionAnalysisExtensionTest.scala,
  start = TextPosition(line = 832, characterOffset = 39),
  onePastEnd = TextPosition(line = 839, characterOffset = 8),
  startTokenIndex = 3279,
  sizeInTokens = 18,
  content summary = """).get
      ),
       ... (
      contentsByPath = Map(
        """
) on the right.

sageserpent-open commented 1 month ago

Regarding that last comment, it seems that not only can preservations be accidentally picked up by the move destination machinery, so can coincident insertions. Now it can be correct for a coincident insertion to form a move destination - a coincident move, to be specific, but here we have some unrelated right edit that is ambiguous as a match with the one constituting a plain coincident insertion that in itself is a degenerate move.

It doesn't seem right to consider the right edit as being a move destination when the other end of the match has been claimed by the coincident insertion.

A further oddity is why degenerate moves are even modelled - if there is no source for the move, we would not consider a left- or a right-insertion is being a degenerate move, so why consider coincident insertions (or edits) as defining a degenerate move if no source can be found? We can't propagate changes this way, nor can we anchor. What was the point of this?

sageserpent-open commented 1 month ago

As of 3d3ede67e3afb28471ac194c8177f5424532a95d, MoveDestinations has been cut over to not allowing degenerate moves in its model, so all moves must have a source. This means that the coincident insertion and the right edit discussed amove are no longer considered to be moves, as neither have a source.

The pesky invariant and then later post-condition mentioned in #91 has been removed altogether - there is no longer anything to police, as the invariant in MoveDestinations has been tightened to ensure there is always at least one source location.

The problematic merge from #91 now goes through - albeit very slowly, its performance is not in the purview of this ticket.

So back to main thrust of this ticket...

sageserpent-open commented 1 month ago

Now have a failing test with a preservation being mixed up with a genuine move in 5735cc4ac28427ee03e6d6745b37b90dc759ab2d...

sageserpent-open commented 1 month ago

TODO:

~~Remove the now redundant folding over match destinations from the logic populating the changes to migrate, all we need now is the source (which is already in there).~~ DONE.
~~Add in support for tracking a move source that has no associated changes.~~ DONE.
Cut over CodeMotionAnalysisExtension.merge accordingly and gets its tests (and those in MainTest) passing again.

sageserpent-open commented 3 weeks ago

As of 73c3260bd55ec89aef4522c66fe4a96cc62edb55, all the tests pass bar CodeMotionAnalysisExtensionTest.codeMotionAmbiguousWithAPreservation.

There is also some weirdness in how the move destinations report is populated, where there are multiple move destinations that share the same sources. I suspect this has been the case for some time; this is due to the code in MatchesContext working back from a destination element to the matches that involve it - that's fine in itself, but means that if a source or sources gets moved to multiple destinations on the same side, then each destination will possess its own match (or ambiguous matches) that in reality are ambiguous with the other destination's matches too.

Given how MoveDestinations tries to fuse divergent destinations as well as coincident destinations into the same MoveDestinations instance sharing a common source or sources, any partitioning seems wrong...

sageserpent-open commented 3 weeks ago

Yes, looking on the main branch at c0087160df5be776af04479067b519b0e4525236 reveals the same behaviour - move destinations sharing the same sources are being partitioned. Let's fix that on the side and merge in...

sageserpent-open commented 3 weeks ago

Trying to fix the partitioning of move destinations sharing the same sources leads to 8b1217f125919e3b56fe8287ee38b546dca5cd2c. This introduced a breaking invariant and then fixes that, but has broken other tests.

Merging that work with the branch for this ticket also changes the outstanding test failure in CodeMotionAnalysisExtensionTest.codeMotionAmbiguousWithAPreservation to something else.

Need to stop and think about this all...

sageserpent-open commented 3 weeks ago

Currently looking at 064605638faf519d2fbf694f74b18cab62fc44a1, which is the merge of both branches, namely changes (and straight moves) being associated with sources, and the move destinations report being subject to an invariant that distinct move destinations instances cannot share sources.

What's gone wrong on the second branch is this code that works out the matches that form a key to a move destination.

The idea was to get back to the sources and then fan back out to grab all the matches, thus avoiding partitioning. The problem is that the fan-out pulls in all matches, and these aren't necessarily genuine code motion. This is a similar problem to the original one that motivated this ticket, only this time it's occurring in the code that notes the move destinations, as opposed to noting the sources of the moves and any content that might need to be migrated!

sageserpent-open commented 3 weeks ago

What I've learned so far...

It makes sense to prevent a preservation from being the source or destination of a move - the same code is present on the base, left and right, so how can it go off to or arrive from somewhere else?
Coincident insertions don't in themselves define a degenerate move - there has to be at least one source somewhere. In fact there shouldn't be a degenerate move at all.
There is no point in having multiple move destinations instances that share the same source or sources. The idea is to condense all moves into one move destinations instance. Although, what about content that is migrated with the moves?
A left-, right- or coincident-insertion/edit may define the destination of a move - but only if a suitable source or sources can be found. The existence of a match or matches involving the inserted element is necessary for a move, but not sufficient. It may be the case that the match is ambiguous with a preservation and nothing else, in which case the insertion/edit is freestanding.
Calls to MoveDestinationsReport.leftMoveOf/rightMoveOf/coincidentMoveOf may define the source of a move - but only if a suitable destination or destinations can be found. The existence of a match or matches involving the 'moved' element is necessary for a move, but not sufficient. It may be the case that the match is ambiguous with a preservation and nothing else, in which case we're seeing a deletion of some sort. (BTW - those method names are misleading anyway. Even in the code on the main branch, they are called speculatively to pick up potential moves.)
Organising migrated changes by source makes the code easier to deal with.
If there is at least one move destination on a given side (so, we don't count preservations), then it is not possible for that side to contribute a migrated edit or deletion. This deals with straight moves on just one side, but also means that divergent or coincident moves can't pick up edits at all. However, a plain move requires resolution of the three forms of the moved element from base, left and right - regardless of whether it manifests at the move destination as a left-, right- or coincident-insertion/edit.

sageserpent-open commented 2 weeks ago

As of d1973bb25fbbea61ba6f8359f04a7ad42b74ec55, got a work-in-progress implementation that passes all tests in MainTest but fails several in CodeMotionAnalysisExtensionTest, including the all-important CodeMotionAnalysisExtensionTest.codeMotionAmbiguousWithAPreservation.

However, that last test does show the correct behaviour of keeping preservations and moves separate as regards edit / delete migrations, so this is progress.

What is bad is:

It is apparent (and this has been the case before work was started on this ticket) that coincident insertions and edits that are move destinations haven't been resolved taking the source of the move into account for trivial edit resolution - instead just the left and right contributions are used. That two-way resolution is fine when coincident insertions and edits are standalone, but is lacking when they are coincident move destinations.
The insertion migration machinery - specifically the bit that looks for anchors is very sloppy, using matches to find source to then find move destinations. Ideally, MoveDestinationsReport.evaluateSpeculativeSourcesAndDestinations should spoon-feed the anchors to the insertion migration machinery, as it has a much better idea of what's going on. That would lead nicely to #83 too.
The work done in #32 and refined in #85 has been broken by some rather nasty hacking done in this ticket. The more I look at the existing insertion migration machinery, the more I want to just leave things a bit broken and segue from this ticket directly into #83 to avoid having to fix the mess in the first place!

What is good is:

Issue #105 might fall out this for free.
The first bad thing might be straighforward to solve as there is now a place in the code that can perform the three-way resolution. The fun part is that the core merge algebra used by MergeResultDetectingMotion's delegating merge algebra has already gone and done a two-way resolution, which makes the substitution process more complex. If only the merge could accommodate unresolved coincident entries...

sageserpent-open commented 2 weeks ago

Regarding that last point about coincident entries being represented in unresolved form, perhaps we could represent then as insertion versus insertion conflicts or edit versus edit conflicts in the implementation of MergeResultDetectingMotion?

If so, that would allow three-way resolution to be done in evaluateSpeculativeSourcesAndDestinations.evaluateSpeculativeSourcesAndDestinations.

I'm not sure whether to add a pair of substitutions for the left- and right-side contribution to the three-way resolution there, and then to add additional pairs of substitutions referring to the two-way resolution for all the speculative coincident merge destinations that didn't turn out to be moves after all?

This may need breaking out into its own ticket...

sageserpent-open commented 2 weeks ago

Finally got the dreaded CodeMotionAnalysisExtensionTest.codeMotionAmbiguousWithAPreservation to pass as of 6a45339e22b04ca9471fea9c1aca58b1024d0f8f, but there straggler test failures in both MainTest and CodeMotionAnalysisExtensionTest, concerning insertion migration.

The implementation is a bit hacked, and evaluateSpeculativeSourcesAndDestinations.evaluateSpeculativeSourcesAndDestinations should probably be recast to start with the sources to find matches, whittling these down against the destinations. That would allow a cleaner treatment of coincident move destionations.

There is still the outstanding resolution work to finesse, but that can go into its own ticket.

The question is, given how quintessentially nasty the insertion migration code is, is it worth making a final push on this ticket to get the tests passing, or should this segue into #83?

sageserpent-open commented 2 weeks ago

The story so far as of 99e73229c2d5c2b68b25cd1e17cf820744f9fc63.

Got MainTest to pass in its entirety, but CodeMotionAnalysisExtensionTest is falling down on the tests that have intra-file versus inter-file code motion with associated edits. The idea was for the insertion migration to pick up the edit instead and then suppress the original edit migration, at least I think that is what CodeMotionAnalysisExtensionTest.furtherMigrationOfAMigratedEditAsAnInsertion relies on.

It seems that the idea of evaluating speculative content migrations by source against speculative move destinations is sound in itself, but this doesn't play well at all with the insertion migration. This may be down to the way that divergent moves are dealt with (because we have an intra- versus inter-file move across the left- and right-side), or perhaps the awkward treatment of the speculative conflicts where a speculative edit migration is picked up isn't playing well with insertion migration.

Another gotcha lurking in the wings as how coincident insertions / edits are dealt with - resolution is applied on the fly in a desparate attempt to conform to MergeResult wanting to have sequences of plain elements in it. There is no way of representing two or three elements in unresolved form in the merge result, and I'm starting to think this would be much better when #41 is worked on, because that won't just choose a section when resolving them, it will synthesize one to pick up the whitespace adjustment. That won't play well with the existing code's approach of searching-and-replacing sections to perform edit substitution, as well as suppressing the migrated edit's original location (and migrated insertion's original location too), as well as suppressing destinations for migrated edits competing with an insertion migration.

Similarly, conflicts are a bit hokey - each side gets its own linear sequence of sections, but we know from the conflict handling in the merge algebra exactly where the conflicting runs are. This isn't quite the same as handling unresolved elements, but again feels messy.

Finally, I wonder whether the merge algebra from MergeResultDetectingMotion should be split into two algebra implementations, one that drives evaluation of moves, and another that forwards on to the core merge algebra. The idea is to perform the per-file merges, evaluate the moves on aggregate, then tell the forwarding merge algebras to delegate to the core merge algebra, but with the full picture of the evaluated moves. This brings us back in spirit to the old merge algebra in MatchesContext that could make binding decisions as to whether to record a conflict as is, or as being resolved. That all worked nicely with the old code!

sageserpent-open commented 2 weeks ago

As of ec9fef6afcf3f78b01ef768cf63a47463fb100fd, there is just one test case failing in CodeMotionAnalysisExtensionTest.codeMotionAcrossAFileRename, and that concerns this situation:


// BASE - ORIGINAL FILE.
protected val heyDiddleDiddleInModernForm: String =
  """
    |Hey diddle diddle,
    |The cat and the fiddle,
    |The cow jumped over the moon;
    |The little dog laughed
    |To see such sport,
    |And the dish ran away with the spoon.
    |""".stripMargin

// LEFT - RENAMED FILE.
protected val heyDiddleDiddleInPsychoticForm: String =
  """
    |Hey diddle diddle,
    |The cat and the fiddle,
    |The cow, laughing madly, ran away with the spoon and then jump'd over the moon;
    |The little dog laugh'd
    |To see such craft,
    |And the fork ran away with the spoon.
    |""".stripMargin

// RIGHT - ORIGINAL FILE.
protected val heyDiddleDiddleWithIntraFileMove: String =
  """
    |Hey diddle diddle,
    |The cat and the fiddle,
    |The cow jumped!
    |The little dog laughed
    |To see such sport,
    |And the dish ran away with the spoon over the moon.
    |""".stripMargin

// EXPECTED - RENAMED FILE.
protected val heyDiddleDiddleInPsychoticFormExpectedMerge: String =
  """
    |Hey diddle diddle,
    |The cat and the fiddle,
    |The cow, laughing madly, ran away with the spoon and then jump'd!
    |The little dog laugh'd
    |To see such craft,
    |And the fork ran away with the spoon over the moon.
    |""".stripMargin

What the merge comes up with is:

Hey diddle diddle,
The cat and the fiddle,
The cow, laughing madly, !
ran away with the spoon over the moonand then jump'd !
The little dog laugh'd
To see such craft,
And the fork ran away with the spoon over the moon.

sageserpent-open commented 2 weeks ago

The thing is, bar the duplication of the !, the actual result seems better than then expected one - the text over the moon is inserted after ran away with the spoon, and that anchor is moved on the left as an inter-file move, but with two destinations.

So why shouldn't the inserted text be migrated to the two anchor move destinations? Yes, it is part of a divergent insertion on the left and right, but so what? We no longer model such things as being interesting as they have no source nor associated migration content, so they are not considered special.

The treatment of the ! is part of the general hokum that is the existing insertion migration implementation.

It's time to admit this has gone far enough.

sageserpent-open commented 2 weeks ago

The oustanding failing test case has been disabled, closing this to continue the work in the context of #83.