Add a flag to fix past merges.

matts1 commented 1 year ago

This is a simple implementation where a commit chain A -> {B, C} -> merge commit -> D is just treated as A -> B -> C -> D for the purposes of absorb.

tummychow commented 1 year ago

instinctively i think this is unsound. what if the merge is nontrivial and contains conflict resolutions? modifying B/C (the parents of the merge) might invalidate the resolution or introduce new conflicts. have you tested such a case?

matts1 commented 1 year ago

Those sound like reasonable concerns. Here's an algorithm that should address that. If you think it sounds reasonable I might implement it, but it does seem like a lot of work.

Given a tree:

       +-- A --+
base --+       +-- merge -- C
       +-- B --+

We would treat it internally as:

       +-- A -- mergeA=$(diff A merge) --+
base --+                                 +-- C
       +-- B -- mergeB=$(diff B merge) --+

We would only absorb if:

Of the set of all commits that don't commute, there is a single most common ancestor (eg. If both A and B fail to commute, we don't absorb)
It commutes with all mergeX commits

This would mean that:

Absorbing into C is a non-issue
Absorbing into base is a non-issue because it guarantees that every other commit commutes
Absorbing into A is fine, because that would guarantee that the branch B commutes, and that the changes that we make to A would also commute with mergeA (and vice versa)

tummychow commented 1 year ago

i'm going to be honest with you: convincing me that the described algorithm is correct will be as hard, possibly harder, than actually implementing it. which is why i didn't even attempt to support merges in the current design. can you demonstrate with a worked example? like, patches with line offsets.

Of the set of all commits that don't commute, there is a single most common ancestor (eg. If both A and B fail to commute, we don't absorb)

i'm not sure i follow this sentence. if both A and B fail to commute, they have a common ancestor (base). can you rigorously define "most common"?

matts1 commented 1 year ago

Oh, sorry, most common child, not most common ancestor. Let's only work with a single file, since we only need a single file

Base:

foo

A:

+ def
+ fn
+ here
foo
+ a = ...

B:

foo
+ b = ...

merged:

def
fn
extra
here
foo
a = ...
b = ...

I'll give you a more concrete algorithm with a slight modification - some pseudocode for the algorithm in something between rust and python

def select_absorb_commit(commit, hunk, base) -> Option<Commit>:
  if commit == base:
    return None

  if len(commit.parents) == 1:
    if not commute(commit, hunk):
      return Some(commit)
    return select_absorb_commit(commit.parent, hunk, base)

  commits  = {
    parent: select_absorb_commit(parent, hunk, base)
    for parent in commit.parents
  }
  # Ignore parents of the merge commit that commuted
  filtered = {k: c for k, v in commits if v == Some(c)}
  if len(filtered) == 1:
    parent, absorb_commit = next(filtered)
    if commute(diff(commit, parent), hunk):
      return absorb_commit

  return None

In all following examples, we will be writing f(commit) as a shorthand for select_absorb_commit(commit=commit, hunk=<hunk>, base=base)

Examples: Hunk containing changes between def and fn:

f(merged)
  f(a) -> a
  f(b) -> None
  filtered = {a -> a}
  commute(diff(merged, a), hunk) -> True
  -> return a

Hunk containing changes between foo and a = ...

f(merged)
  f(a) -> a
  f(b) -> b
  filtered = {a -> a, b -> b}
  len != 1 -> return None

Hunk containing changes between fn and extra:

f(merged)
  f(a) -> a
  f(b) -> None
  filtered = {a -> a}
  commute(diff(merged, a), hunk) -> False
  -> return None

tummychow commented 1 year ago

hmmm, okay. this looks like something i could see myself accepting. i won't take this pr as-is, but you're welcome to start hacking on the algorithm (or throw your hands up in the air and say "nah it's too much work", in which case i'll close this pr and move the description to a new issue).

i think the only detail that's missing from the pseudocode is that all merge parents could return the same result (eg filtered = {a -> base, b -> base}) if the stack was deep enough to include their git-merge-base. that might be simple as replacing len(filtered) with len(unique(values(filtered))) and then iterating across all the merge parents, or perhaps it ends up being more complicated.

devinrhode2 commented 1 year ago

Maybe this is answered in the pseudo-code, but I'm curious exactly how you'll decide the order of the two commits that are being merged.

Naively, I think the commit with the older commit date could go first. I think commit date is better than author date because it's a more likely indication on conflict resolution.

A commit that was authored 12 months ago and committed 12 months ago is more likely to have conflicts if applied on top of a commit made 2 days ago (regardless of author date), compared to applying a commit from 2 days ago on top of a commit made 12 months ago.

Given this, we now have a reasonable heuristic for choosing which commit goes first.

The second commit is then just the diff of A and the merge commit.

matts1 commented 1 year ago

@tummychow Good catch, I hadn't considered that. Yes, it should be as simple as len(unique(values(filtered))).

@devinrhode2 I believe you're misunderstanding something, but it's hard to say exactly what without an example of what you think would happen. If it helps clarify anything, we specifically do not perform any merges, and absorb will never create a fixup unless the following git rebase --rebase-merges is guaranteed to succeed, so it's not neccesary to do anything heuristic-based, and we don't need to decide which commit goes first.

tummychow / git-absorb

Add a flag to fix past merges. #89