sageserpent-open / kineticMerge

Merge a heavily refactored codebase and stay sane.
MIT License
9 stars 1 forks source link

Clean Merging (including code motion across files) #26

Closed sageserpent-open closed 2 months ago

sageserpent-open commented 4 months ago

This follows on from #13 since that ticket has had its scope cut down to just code motion inside files.

Given a Git repository of files in multiple directories that has been worked on in two branches, provide a command line utility that merges one branch into another, taking into account code motion both within the files and across them.

Any divergent code motion is treated as an error and causes the merge to be aborted - the repository should be left as it was prior to the merge being attempted.

sageserpent-open commented 3 months ago

A subtlety here that #13 didn't have to worry about: because code motion analysis is applied across all changed files on all three sides of the merge, then there is the possibility that edits / deletions may be propagated either from or to files that are changed on our or their branch that don't have corresponding changes on the other side.

So in a conventional merge (and Kinetic Merge too prior to this ticket being delivered), these would be changes already present in our branch (so nothing to do) and changes that would simply be imported without three-way merging from their branch.

Now we have to consider that both of these one-side-only-changed files will feed into the code motion analysis along with the ones that definitely require three-way merging, and that these same files may need to be altered with propagated edits / deletions.

NOTE: for now, we draw a line at files that have changed / been deleted / added wrt to the base. We don't submit unchanged files to code motion analysis, so if a change duplicates a section that would be matched in an unchanged file, we ignore that potential match as spurious.

sageserpent-open commented 3 months ago

Thinking ahead about tickets #3 and #4, there is existing code in Kinetic Merge that generates a Git-compliant conflict for when a file is updated on one side and deleted on another - and that is how the situations described on those tickets are currently handled from the point of view of the original file being renamed and / or moved.

Assuming an edit on our side, and a rename / move on their side, then with inter-file code motion supported, the edit section will be propagated to the destination file along with all of the other sections, leaving the original file as being empty in its merged form.

If there are insertions of new text into the original file, or move destinations, then these won't be propagated and thus the merged form of the original file will contain just these insertions / move destinations.

That's OK for now for insertions; we have #30 to cover this eventually.

It is the desired behaviour for move destinations; if someone decided on our side to transplant some code into the source of a move on their side, we should honour both moves, so the code at the source of their side's move goes out and the code whose destination is of our side's move comes in.

So when the dust settles, we could alter the Git our update-versus-their-delete conflict so that the stage entry for the update is the merge result for the original file and let the user sort it out - they can decide what to do with any insertions / move destinations from our side now left in the original file's merge.

There is an optimisation to this, in that if the merge of the original file is empty, then we may as well just trivially resolve the update-versus-delete as being a straight deletion. That would emulate what Git does when it propagates a simple edit through a rename. Not sure if this belongs to ticket #3, though - perhaps it should be left out for now...

sageserpent-open commented 3 months ago

A sketch of the implementation:

  1. CodeMotionAnalysisExtension should be cutover so that its merge method works over all the files across all three sides of the merge. This includes files deleted from the base, as these may have sections matching in moved / renamed files. (BTW: suppose a file in the base is moved / renamed on one side and deleted on the other? Do we propagate the deletion of the sections?)
  2. The merge method needs to run three-way merges for each file and build up an aggregated mapping of sections to their propagated edits / deletions. It then needs to apply these to the results of the merges, after aggregation.
  3. Main needs to gather up all the changes and feed them off to CodeMotionAnalysisExtension.merge. What should result is a mapping of file paths to their merge results.
  4. The merge outcomes are translated into actions on the Git index, using the existing helpers to do this.
sageserpent-open commented 3 months ago

What needs merging, and what needs to have edit / deletion propagation applied?

sageserpent-open commented 3 months ago

What to do with the merge results is up to Main - this is where the knowledge of how to interact with Git lies. So if a file added or modified on just our side comes out the same, it won't be added to the Git index, but if propagation has caused changes, then the index has to be updated. The same holds if a file is added or modified on just their side - but even if the index is not updated, we have to make sure a merge commit is generated, this is similar to the delete versus delete situation already handled on the current codebase.

Where a file is modified on one side and deleted on another, we use the merge result to make the index stage entry for the modified form. This could be trivially resolved if the merge result is empty to a straight deletion.

sageserpent-open commented 3 months ago

As of Git commit SHA: 6d80a7da6ad9e1bfe0e2c8f1ce7d80f31f17d6d5, have added CodeMotionAnalysisExtension.merge and it almost passes CodeMotionAnalysisExtensionTest.codeMotionWithSplit.

(Main is still yet to be cut over.)

Said commit has a manually reduced form of the failing test - there is something awry with the core merge.of implementation whereby it is not recognising a left edit versus a right deletion conflict at the source of a move, rather it plumps for a coincidental deletion followed by a left insertion and a right deletion. This then causes a deletion to be propagated through the move rather than an edit.

This happens multiple times in the merge, and culminates in a final left edit (which is valid) - it is interesting to note that when the coincidental deletion is inferred, the merge is aware of that final left edit.

NOTE: the right deletion is due to a match between the base and left - so it is valid in its own right.

Need to test this in MergeTest and work out what's going on - has this been missed in MergeTest?

sageserpent-open commented 3 months ago

As of Git commit SHA: e6c6b9153d45aa0646daa6ce5f53f75e782a78ce, there are two tests to reproduce the bug described in the previous comment; the implementation has been tweaked to pass these new tests.

CodeMotionAnalysisExtensionTest.codeMotionWithSplit now passes. The next thing is to roll back the manual test reduction to get a more realistic test...

sageserpent-open commented 3 months ago

CodeMotionAnalysisExtensionTest.codeMotionWithSplit is back to working with the original realistic code as of Git commit SHA: 37012a5bd8638b589585c3456419835d26926925.

It fails - but the expected code motion works nicely.

What goes wrong is that the comments are mangled in odd places...

sageserpent-open commented 3 months ago

What triggers the comment mangling is the reformatting of a Javadoc comment that refers to a renamed identifier - the identifier grows longer in the rename, so reformatting breaks up two lines in the comment, inserting an asterisk at each break.

These two asterisks aren't propagated into the hived off section of code with the rest of the moved comment, and end up orphaned in front of the comment from the first method left in the interface.

See here:

Screenshot 2024-03-28 at 10 06 18

Observe how this edited code looks in the merged form of the hived off file:

public interface CasesLimitStrategies {
    /**
     * Limits test case emission using a time budget that starts when the
     * strategy is first consulted via
     {@link CasesLimitStrategy#moreCasesToDo()}.
     *
     * @param timeBudget How long to allow a testing cycle to continue to
     *                   emit cases.
     * @return A fresh strategy instance - the time budget is not consumed
     * until the first call to {@link CasesLimitStrategy#moreCasesToDo()}.
     */
    static CasesLimitStrategy timed(final Duration timeBudget) {
        return new CasesLimitStrategy() {
            Instant deadline = null; // This edit should propagate.

            @Override
            public boolean moreCasesToDo() {
                if (null == deadline /* This edit should propagate. */) {
                    deadline = Instant.now().plus(timeBudget);
                }

                return !Instant.now().isAfter(deadline);
            }

            @Override
            public void noteRejectionOfCase() {

            }

            @Override
            public void noteEmissionOfCase() {

            }

            @Override
            public void noteStarvation() {

            }
        };
    }

    /**
     * Limits test case emission using a time budget that starts when the
     * strategy is first consulted via
     {@link CasesLimitStrategy#moreCasesToDo()}.
     *
     * @param timeBudget How long to allow a testing cycle to continue to
     *                   emit cases.
     * @return A fresh strategy instance - the time budget is not consumed
     * until the first call to {@link CasesLimitStrategy#moreCasesToDo()}.
     */
    static CasesLimitStrategy timed(
            final scala.concurrent.duration.FiniteDuration timeBudget) {
        return timed(toJava(timeBudget));
    }

Note the lines, one in each Javadoc block, that are missing a leading asterisk.

These asterisks are left behind in the merged form of the original file as orphans:

public interface CasesLimitStrategy {
    * * /**
     * Query used by the implementation of {@link Trials} to control the
     * emission of new cases.
     *
     * @return True to signal that more cases should be emitted, false to
     * stop emission.
     * @apiNote Once a call returns false, there should be no further
     * interaction with the strategy by the implementation of {@link Trials}
     * except for additional calls to this method.
     */
    boolean moreCasesToDo(); // This rename should propagate.
sageserpent-open commented 3 months ago

Analysis of the merge indicates that as the asterisks were left insertions, then the merge did the right thing with them, even though the result was unsightly.

Of course, when ticket #30 is delivered, then it should move those pesky asterisks to the right place in the hived off file.

For now, Git commit SHA: aba17e484319be63cd602c94a4e719c9e0f4bf88 tweaks the sources used by the test so as to avoid the reformatting of the Javadoc, and the test passes now. Once ticket #30 is underway, that commit can be reverted as a handy integration test...

sageserpent-open commented 3 months ago

As of Git commit SHA: 2fc04817aa3787cf03d14ca691bc59650f8d76b9, MainTest passes all of the legacy tests. Had a test failure due to a third-party library issue, raised ticket in scala-collection-contrib. The proposed bug-fix has been made locally in a copy of the third party code.

  1. Next step, add some more integration tests to MainTest that exercise inter-file code motion. In particular, there is a need to show propagation of an edit into a renamed file. DONE
  2. Better yet if several edits and deletions can be propagated into a file that is split into two renames. DONE
  3. Having the condensation of code moved from two files into one would be nice too. DISABLED FOR NOW, IT NEEDS https://github.com/sageserpent-open/kineticMerge/issues/30
  4. How about mixing up the two previous cases so that code moves out into renamed splits, but other code moves in? There are variations on whether the code moves in on the side contributing the outward moves or on the editing side.
  5. A final touch: interchanging the contents of two files, with edits on either side. DONE
sageserpent-open commented 3 months ago

As of Git commit SHA: 035ee924d096dd8ac7f707cde5e5ff5f01513332, got a new test that verifies editing and deletion being propagated through a file move, and it passes.

Working on this test shows that the legacy tests may not be assiduous enough in checking the content of the merges (and this includes the new test too). Need to backpatch improvements to the test assertions and see if they fail, then merge in the latest changes to see if that fixes the improved tests' failures.

Also need a lot of additional commentary and some cleanup of Main; the core logic is bloating up again.

sageserpent-open commented 3 months ago

TODO: check coverage.

Tickets #29, #30, #31 and #32 have been spun out of these tickets. Not sure whether to do an early release based on just this ticket or hold back for the whole lot...

sageserpent-open commented 3 months ago

What's missing from coverage:

In Main, the deletion versus modification file merge logic doesn't exercise the part that deals with the modified file not being tweaked by the merge. This is the case both for our deletion in conflict with their modification and for our modification in conflict with their deletion.

Also in Main, full merging of a file coincidentally added on our and their branches isn't covered.

sageserpent-open commented 3 months ago

Regarding the deletion versus modification file logic, see this ticket here: https://github.com/sageserpent-open/kineticMerge/issues/33. For now, we live with the code and behaviour as it is.

However, full merging of a file coincidentally added on our and their branches is covered since Git commit SHA: 7873fcf9e93ba641cd9e9c8f5ac1325396d26e2c.

sageserpent-open commented 2 months ago

Did some cheap-and-cheerful groundwork for ticket #31 as part of this ticket, as the more exotic tests in MainTest were failing.

There is now a primitive exclusion of propagating changes into a destination section when the destination section is part of a merge's longest common subsequence. There is now a new command line option, --minimum-ambiguous-match-size, that will suppress small ambiguous matches (it defaults to zero) - this is quite effective in getting rid of unwanted collisions between propagated changes.

If collisions do occur, a helpful diagnostic is shown that points to --minimum-ambiguous-match-size.

sageserpent-open commented 2 months ago

Raised a follow-on ticket #34 from this to deal with performance. In terms of functionality, it's done.