Closed sageserpent-open closed 7 months ago
A subtlety here that #13 didn't have to worry about: because code motion analysis is applied across all changed files on all three sides of the merge, then there is the possibility that edits / deletions may be propagated either from or to files that are changed on our or their branch that don't have corresponding changes on the other side.
So in a conventional merge (and Kinetic Merge too prior to this ticket being delivered), these would be changes already present in our branch (so nothing to do) and changes that would simply be imported without three-way merging from their branch.
Now we have to consider that both of these one-side-only-changed files will feed into the code motion analysis along with the ones that definitely require three-way merging, and that these same files may need to be altered with propagated edits / deletions.
NOTE: for now, we draw a line at files that have changed / been deleted / added wrt to the base. We don't submit unchanged files to code motion analysis, so if a change duplicates a section that would be matched in an unchanged file, we ignore that potential match as spurious.
Thinking ahead about tickets #3 and #4, there is existing code in Kinetic Merge that generates a Git-compliant conflict for when a file is updated on one side and deleted on another - and that is how the situations described on those tickets are currently handled from the point of view of the original file being renamed and / or moved.
Assuming an edit on our side, and a rename / move on their side, then with inter-file code motion supported, the edit section will be propagated to the destination file along with all of the other sections, leaving the original file as being empty in its merged form.
If there are insertions of new text into the original file, or move destinations, then these won't be propagated and thus the merged form of the original file will contain just these insertions / move destinations.
That's OK for now for insertions; we have #30 to cover this eventually.
It is the desired behaviour for move destinations; if someone decided on our side to transplant some code into the source of a move on their side, we should honour both moves, so the code at the source of their side's move goes out and the code whose destination is of our side's move comes in.
So when the dust settles, we could alter the Git our update-versus-their-delete conflict so that the stage entry for the update is the merge result for the original file and let the user sort it out - they can decide what to do with any insertions / move destinations from our side now left in the original file's merge.
There is an optimisation to this, in that if the merge of the original file is empty, then we may as well just trivially resolve the update-versus-delete as being a straight deletion. That would emulate what Git does when it propagates a simple edit through a rename. Not sure if this belongs to ticket #3, though - perhaps it should be left out for now...
A sketch of the implementation:
CodeMotionAnalysisExtension
should be cutover so that its merge
method works over all the files across all three sides of the merge. This includes files deleted from the base, as these may have sections matching in moved / renamed files. (BTW: suppose a file in the base is moved / renamed on one side and deleted on the other? Do we propagate the deletion of the sections?)merge
method needs to run three-way merges for each file and build up an aggregated mapping of sections to their propagated edits / deletions. It then needs to apply these to the results of the merges, after aggregation.Main
needs to gather up all the changes and feed them off to CodeMotionAnalysisExtension.merge
. What should result is a mapping of file paths to their merge results.What needs merging, and what needs to have edit / deletion propagation applied?
What to do with the merge results is up to Main
- this is where the knowledge of how to interact with Git lies. So if a file added or modified on just our side comes out the same, it won't be added to the Git index, but if propagation has caused changes, then the index has to be updated. The same holds if a file is added or modified on just their side - but even if the index is not updated, we have to make sure a merge commit is generated, this is similar to the delete versus delete situation already handled on the current codebase.
Where a file is modified on one side and deleted on another, we use the merge result to make the index stage entry for the modified form. This could be trivially resolved if the merge result is empty to a straight deletion.
As of Git commit SHA: 6d80a7da6ad9e1bfe0e2c8f1ce7d80f31f17d6d5, have added CodeMotionAnalysisExtension.merge
and it almost passes CodeMotionAnalysisExtensionTest.codeMotionWithSplit
.
(Main
is still yet to be cut over.)
Said commit has a manually reduced form of the failing test - there is something awry with the core merge.of
implementation whereby it is not recognising a left edit versus a right deletion conflict at the source of a move, rather it plumps for a coincidental deletion followed by a left insertion and a right deletion. This then causes a deletion to be propagated through the move rather than an edit.
This happens multiple times in the merge, and culminates in a final left edit (which is valid) - it is interesting to note that when the coincidental deletion is inferred, the merge is aware of that final left edit.
NOTE: the right deletion is due to a match between the base and left - so it is valid in its own right.
Need to test this in MergeTest
and work out what's going on - has this been missed in MergeTest
?
As of Git commit SHA: e6c6b9153d45aa0646daa6ce5f53f75e782a78ce, there are two tests to reproduce the bug described in the previous comment; the implementation has been tweaked to pass these new tests.
CodeMotionAnalysisExtensionTest.codeMotionWithSplit
now passes. The next thing is to roll back the manual test reduction to get a more realistic test...
CodeMotionAnalysisExtensionTest.codeMotionWithSplit
is back to working with the original realistic code as of Git commit SHA: 37012a5bd8638b589585c3456419835d26926925.
It fails - but the expected code motion works nicely.
What goes wrong is that the comments are mangled in odd places...
What triggers the comment mangling is the reformatting of a Javadoc comment that refers to a renamed identifier - the identifier grows longer in the rename, so reformatting breaks up two lines in the comment, inserting an asterisk at each break.
These two asterisks aren't propagated into the hived off section of code with the rest of the moved comment, and end up orphaned in front of the comment from the first method left in the interface.
See here:
Observe how this edited code looks in the merged form of the hived off file:
public interface CasesLimitStrategies {
/**
* Limits test case emission using a time budget that starts when the
* strategy is first consulted via
{@link CasesLimitStrategy#moreCasesToDo()}.
*
* @param timeBudget How long to allow a testing cycle to continue to
* emit cases.
* @return A fresh strategy instance - the time budget is not consumed
* until the first call to {@link CasesLimitStrategy#moreCasesToDo()}.
*/
static CasesLimitStrategy timed(final Duration timeBudget) {
return new CasesLimitStrategy() {
Instant deadline = null; // This edit should propagate.
@Override
public boolean moreCasesToDo() {
if (null == deadline /* This edit should propagate. */) {
deadline = Instant.now().plus(timeBudget);
}
return !Instant.now().isAfter(deadline);
}
@Override
public void noteRejectionOfCase() {
}
@Override
public void noteEmissionOfCase() {
}
@Override
public void noteStarvation() {
}
};
}
/**
* Limits test case emission using a time budget that starts when the
* strategy is first consulted via
{@link CasesLimitStrategy#moreCasesToDo()}.
*
* @param timeBudget How long to allow a testing cycle to continue to
* emit cases.
* @return A fresh strategy instance - the time budget is not consumed
* until the first call to {@link CasesLimitStrategy#moreCasesToDo()}.
*/
static CasesLimitStrategy timed(
final scala.concurrent.duration.FiniteDuration timeBudget) {
return timed(toJava(timeBudget));
}
Note the lines, one in each Javadoc block, that are missing a leading asterisk.
These asterisks are left behind in the merged form of the original file as orphans:
public interface CasesLimitStrategy {
* * /**
* Query used by the implementation of {@link Trials} to control the
* emission of new cases.
*
* @return True to signal that more cases should be emitted, false to
* stop emission.
* @apiNote Once a call returns false, there should be no further
* interaction with the strategy by the implementation of {@link Trials}
* except for additional calls to this method.
*/
boolean moreCasesToDo(); // This rename should propagate.
Analysis of the merge indicates that as the asterisks were left insertions, then the merge did the right thing with them, even though the result was unsightly.
Of course, when ticket #30 is delivered, then it should move those pesky asterisks to the right place in the hived off file.
For now, Git commit SHA: aba17e484319be63cd602c94a4e719c9e0f4bf88 tweaks the sources used by the test so as to avoid the reformatting of the Javadoc, and the test passes now. Once ticket #30 is underway, that commit can be reverted as a handy integration test...
As of Git commit SHA: 2fc04817aa3787cf03d14ca691bc59650f8d76b9, MainTest
passes all of the legacy tests. Had a test failure due to a third-party library issue, raised ticket in scala-collection-contrib. The proposed bug-fix has been made locally in a copy of the third party code.
MainTest
that exercise inter-file code motion. In particular, there is a need to show propagation of an edit into a renamed file.As of Git commit SHA: 035ee924d096dd8ac7f707cde5e5ff5f01513332, got a new test that verifies editing and deletion being propagated through a file move, and it passes.
Working on this test shows that the legacy tests may not be assiduous enough in checking the content of the merges (and this includes the new test too). Need to backpatch improvements to the test assertions and see if they fail, then merge in the latest changes to see if that fixes the improved tests' failures.
Also need a lot of additional commentary and some cleanup of Main
; the core logic is bloating up again.
TODO: check coverage.
Tickets #29, #30, #31 and #32 have been spun out of these tickets. Not sure whether to do an early release based on just this ticket or hold back for the whole lot...
What's missing from coverage:
In Main
, the deletion versus modification file merge logic doesn't exercise the part that deals with the modified file not being tweaked by the merge. This is the case both for our deletion in conflict with their modification and for our modification in conflict with their deletion.
Also in Main
, full merging of a file coincidentally added on our and their branches isn't covered.
Regarding the deletion versus modification file logic, see this ticket here: https://github.com/sageserpent-open/kineticMerge/issues/33. For now, we live with the code and behaviour as it is.
However, full merging of a file coincidentally added on our and their branches is covered since Git commit SHA: 7873fcf9e93ba641cd9e9c8f5ac1325396d26e2c.
Did some cheap-and-cheerful groundwork for ticket #31 as part of this ticket, as the more exotic tests in MainTest
were failing.
There is now a primitive exclusion of propagating changes into a destination section when the destination section is part of a merge's longest common subsequence. There is now a new command line option, --minimum-ambiguous-match-size
, that will suppress small ambiguous matches (it defaults to zero) - this is quite effective in getting rid of unwanted collisions between propagated changes.
If collisions do occur, a helpful diagnostic is shown that points to --minimum-ambiguous-match-size
.
Raised a follow-on ticket #34 from this to deal with performance. In terms of functionality, it's done.
This follows on from #13 since that ticket has had its scope cut down to just code motion inside files.
Given a Git repository of files in multiple directories that has been worked on in two branches, provide a command line utility that merges one branch into another, taking into account code motion both within the files and across them.
Any divergent code motion is treated as an error and causes the merge to be aborted - the repository should be left as it was prior to the merge being attempted.