sageserpent-open / kineticMerge

Merge a heavily refactored codebase and stay sane.
MIT License
12 stars 2 forks source link

Clean Merging (No code motion) #1

Closed sageserpent-open closed 1 year ago

sageserpent-open commented 1 year ago

Given a Git repository of files in multiple directories that has been worked on in two branches, provide a command line utility that merges one branch into another., taking into account code motion both within the files and across them.

Any divergent code motion is treated as an error and causes the merge to be aborted - the repository should be left as it was prior to the merge being attempted.

Any conflict is treated as an error and causes the merge to be aborted - the repository should be left as it was prior to the merge being attempted.

Nothing fancy is done with whitespace - if it causes a conflict, too bad.

Successful completion should commit a standard Git merge commit with two parent commits.

No user intervention is required - it either succeeds and commits, or fails and rolls back.

sageserpent-open commented 1 year ago

The actual merge should be done using git merge-file, supplied with temporary files for the relevant code.

sageserpent-open commented 1 year ago

To clarify the phrase from above: "the repository should be left as it was prior to the merge being attempted", the assumption here is that the working directory tree started off with no uncommitted changes (staged or not) prior to attempting the merge.

If that isn't the case, the merge isn't even started - having a clean working directory tree is a precondition for the merge.

So if an actual rollback is needed on the filesystem, it can be implemented by a simple checkout of the commit in force prior to attempting the merge.

sageserpent-open commented 1 year ago

Reducing the scope for this ticket, as it's taking a long time to implement. It would be good to get the big picture of the code organisation (notably, the application shell) delivered.

Just delivering a tool that does conventional merging is enough to get some serious implementation work out of the way:

All of these are needed to support code motion, but can be delivered earlier in a tool that at least should provide an alternative to Git's merge.

It would be interesting to compare the results of this ticket with plain git merge to see if there is any (dis)advantage in using Kinetic Merge for basic merging.

I had considered using all the merge machinery purely to detect code motion and then let git merge-file do the final merge for the sake of consistency between clean merges and ones that need intervention for conflict resolution, as per #2. However there might be value in using the merge machinery as far as possible without involving git merge; if there are conflicts, the stage entries could be synthesized from the partial merges.

That was always the plan for code motion - git merge-file would be presented with synthetic left and right files that incorporated edits into code motion, so there is no harm in thinking about merging with synthetic left and right files.

Having an early drop of the tool with the merge machinery used to its full extent provides early feedback as to whether git merge-file is needed at all.

sageserpent-open commented 1 year ago

Got merging of sections working without tracking code motion. The next step was going to be to implement CodeMotionAnalysis, but by generalising the API of Merge to apply over arbitrary sequences instead of just sequences of Section we can apply it directly to the file contents without having to build sections.

Let's try that - a bit more work to generalise the API, but it avoids having to implement CodeMotionAnalysis in this ticket, and that will be a hard job.

sageserpent-open commented 1 year ago

Turned out that even doing a straightforward merge without sections / code motion was not so simple for real-world text.

I now know that LCS really doesn't scale well for character-level sequences, and whitespace has a nasty way of grabbing the attention of the LCS algorithm, providing a spurious contribution to the least common subsequence.

For example, take the three sentences:

BASE: "Hurricanes hardly ever happen in Hampshire", LEFT: "A bird in hand is worth two in the bush." RIGHT: "All's well that ends well."

Not only is performance dreadful when done at the character level, but the merge itself is terrible - do we really want to match on the 'i' in Hurricanes and bird? People see a sentence as a sequence of words, and source code as a sequence of keywords, identifiers and various delimiters, operators etc.

Going to a word-based approach, there is not much in common from the point of view of the words, which is what we're interested in - only in is shared between the base and left inputs - but the whitespace between the words makes a spurious LCS that breaks up what are essentially two conflict sections - the one before in and the one after.

Taking this on board:

sageserpent-open commented 1 year ago

The recent changes have been done as a spike in MergingTextTest. These need to be broken out and put into the non-test code folders.

As an aside, the current tokenisation implementation uses the Scala parser combinators library - the tokenisation is a bit clunky, but is by no means the performance bottleneck in a merge. If it ever gets to the point where tokenisation needs a performance boost, consider the RE2 library, trading the use of grammar productions to assemble 'super-tokens' out of more primitive tokens furnished by simple regular expressions for a more sophisticated use of regular expressions with context to build tokens directly.

Not sure if that approach is workable or even performant, but it is a possibility.

sageserpent-open commented 1 year ago

Also need to remove CodeMotionAnalysis in a nice revertible commit prior to shipping the first cut of Kinetic Merge, as it is just dead code right now.

sageserpent-open commented 1 year ago

Git cribsheet for use in application shell:

sageserpent-open commented 1 year ago

Shell outline:

  1. Modify / delete conflict leads to staging conflicted entries for the modified file and the deleted one.
  2. Raise a severe error and abort if there is an add / delete conflict - this should not happen.
  3. Raise a severe error and abort if there is an add / modify conflict - this should not happen.
  4. Add / add leads to merging for that path.
  5. Modify / modify leads to merging for that path.
  6. Delete / delete leads to straight deletion, treat this as a no-op.
sageserpent-open commented 1 year ago

Left unsaid so far is whether Kinetic Merge should complete the merge if no conflicts are detected. Probably best to go with with git merge does and allow a --no-commit flag to control whether the merge is left uncommitted for further review and editing. By default, the merge will be committed, and a further --commit flag can be used to override an existing --no-commit.

sageserpent-open commented 1 year ago

What do do about file permissions that have changed? Especially if they have changed on both branches?

Propagating a file permissions change made on just one branch is an obvious choice, but if there is a conflict, aborting the merge seems drastic.

Perhaps the staged conflict entries should be given the conflicting file permissions, and Git (or a third party merge) can sort it out. This means that if there are no conflcits in terms of content changes for a given file, then Kinetic Merge has to make fake conflicting entries that have the same content but different permission modes.

As for the file representing a the partial merge of the file, this is given whatever permissions git merge-file comes up with.

sageserpent-open commented 1 year ago

Git commit SHA: f2ea3c2f137fcfdc482b12a901900d721c3d6027 implements the above and has been tested manually on a toy repository.

In addition, if our branch contains their branch, this is treated as a successful no-operation. If their branch contains ours, ours is fast-forwarded to theirs as a successful operation.

If an unexpected error occurs, such as a Git command not working (contrast with when the merge operation fails in its own right, ie. conflicting edits or file addition versus file deletion conflicts), then the repository is rolled back to its initial state. The code for this is pretty hokey, but it does its job.

sageserpent-open commented 1 year ago

What's left to do?

  1. Provide some feedback from the tool to the user other than just the exit code! DONE
  2. Add some automated tests to prove the application shell works properly. DONE
  3. Consider cutting over to something like Cats Effect in the shell code to get a nicer rollback. DONE
  4. Try out the packaged executable on some real code. TRIED OUT ON A SINGLE SCALA EXAMPLE!
sageserpent-open commented 1 year ago

Time to release an executable and see if it runs on some random computer somewhere, downloading off Maven Central. I think the release ceremony diserves another ticket, as folks can build this locally for beta testing...

sageserpent-open commented 1 year ago

Evidence of a successful merge, including a file with parallel changes ....

Screenshot 2023-10-02 at 18 57 42

Git merge requires manual intervention:

Screenshot 2023-10-02 at 19 00 29

Commands executed (merge commit from Kinetic Merge was dropped from the branch via IntelliJ):

Screenshot 2023-10-02 at 19 02 41