newren / git-filter-repo

Quickly rewrite git repository history (filter-branch replacement)

Merge many repos into one, squashing all merge commits. #611

Open jschaf opened 3 days ago

jschaf commented 3 days ago

Hi there, I have a thorny problem I could use some pointers on:

I'm trying to convert a collection of 40 multi-repos into a monorepo. My main goals are:

  1. Order commits by the original commit time, so commits from different repos may be interspersed.
  2. Squash all merge commits into a single commit to get a linear history.

I can guarantee that paths from different repos won't collide.

For 1: It's straightforward to stack commits in repo order [A1, A2, A3, B1, B2, C1, C2]. Is it possible to order like [A1, B1, C1, B2, C2, A3]?
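One way to picture the interleaved ordering asked about here is a k-way merge of the per-repo histories keyed on commit time. The sketch below is only an illustration: the `Commit` tuple, the repo labels, and the timestamps are made up for the example, and it assumes each repo's list is already oldest-first.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Commit(NamedTuple):
    timestamp: int  # commit time in seconds since the epoch (made-up values below)
    repo: str       # hypothetical repo label
    name: str       # hypothetical commit label


def interleave(*histories: Iterable[Commit]) -> Iterator[Commit]:
    """Merge per-repo histories (each oldest-first) into one stream ordered by commit time."""
    # heapq.merge only ever compares the current head of each input, so the
    # relative order of commits within a single repo is always preserved.
    return heapq.merge(*histories, key=lambda c: c.timestamp)


repo_a = [Commit(100, "A", "A1"), Commit(400, "A", "A2"), Commit(600, "A", "A3")]
repo_b = [Commit(200, "B", "B1"), Commit(500, "B", "B2")]
repo_c = [Commit(300, "C", "C1"), Commit(550, "C", "C2")]

ordered = [c.name for c in interleave(repo_a, repo_b, repo_c)]
print(ordered)  # ['A1', 'B1', 'C1', 'A2', 'B2', 'C2', 'A3']
```

Real commit timestamps are not guaranteed to increase along a branch, so on real data the result may not be strictly sorted by time, but each repo's own commit order is still kept.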

For 2: I could use pointers on operating git filter-repo across multiple commits at once. The docs I've read only demonstrate modifying individual commits in isolation.

me-and commented 3 days ago

What's the goal with having a linear history? If you want a list of what happened in order, you can always use git log --date-order and related commands even on a repository with a very complex commit graph.

If you were happy with that, you could create your monorepo by just creating a merge commit that merges the relevant branches from each of the smaller existing repos.

If you really wanted a linear history, I suspect filter-repo is the wrong tool for the job. I'd do this by getting a list of commits in the right order with that git log --date-order command, then reformatting it to use as a command list with git rebase --interactive.
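A rough sketch of that idea, assuming all the source repos have already been fetched into one repository so that a single `git log` can see every branch. The function names `list_commits` and `to_rebase_todo` are hypothetical, and `--no-merges` is an assumption that merge commits should be left out of the list.

```python
import subprocess
from typing import List


def list_commits(repo_path: str, *revisions: str) -> List[str]:
    """Return non-merge commit hashes reachable from `revisions`, oldest first."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--date-order", "--reverse",
         "--no-merges", "--format=%H", *revisions],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def to_rebase_todo(hashes: List[str]) -> str:
    """Format commit hashes as the `pick <hash>` lines of a rebase todo list."""
    return "".join(f"pick {h}\n" for h in hashes)


# Formatting only; no git call is made here.
print(to_rebase_todo(["1111111", "2222222"]), end="")
```

The resulting text could then be swapped in for the generated todo list, for instance through the `GIT_SEQUENCE_EDITOR` environment variable. Whether a 30k-commit rebase completes without conflicts is a separate question.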

jschaf commented 3 days ago

> What's the goal with having a linear history?

Sure, two reasons: Uniformity with one of the existing repos. Most existing merge commits either contain a single commit or have many bad commits like "todo", "fix it".

> If you really wanted a linear history, I suspect filter-repo is the wrong tool for the job.

Gotcha. The problem is there are probably 30k commits, so I can't do it by hand. Could I operate on the fast-export format directly?

me-and commented 2 days ago

> > What's the goal with having a linear history?
>
> Sure, two reasons: Uniformity with one of the existing repos. Most existing merge commits either contain a single commit or have many bad commits like "todo", "fix it".

Ah, okay! I'd thought that you wanted to keep all the non-merge commits, but have them appear linearly in the commit history. From what you're saying here, you want to only keep the first parent of each merge commit, so all the commits made on a temporary branch will get discarded. Is that right?

> > If you really wanted a linear history, I suspect filter-repo is the wrong tool for the job.
>
> Gotcha. The problem is there are probably 30k commits, so I can't do it by hand. Could I operate on the fast-export format directly?

With hindsight, I didn't say what I'd intended to say here. I think what you're asking for (and what I'd previously thought you were asking for) can be achieved using filter-repo; you'd need to write a commit callback to rewrite commits to have the parents you'd want.
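A minimal sketch of what such a callback might look like. It assumes filter-repo hands the callback a commit object with a mutable `parents` list, and that the function would be registered as the `commit_callback` when filter-repo is used as a library. The `SimpleNamespace` object below is only a stand-in so the snippet runs without filter-repo installed.

```python
from types import SimpleNamespace


def keep_first_parent(commit, metadata=None):
    """Drop every parent after the first, turning a merge into an ordinary commit."""
    if len(commit.parents) > 1:
        # Assumption: the first parent is the mainline ('stable' in this thread).
        commit.parents = commit.parents[:1]


# Stand-in for a filter-repo commit object; the ids and message are made up.
merge = SimpleNamespace(parents=["mainline-id", "branch-id"],
                        message=b"Merge branch 'x' into 'stable'\n")
keep_first_parent(merge)
print(merge.parents)  # ['mainline-id']
```

Because fast-export describes a merge's file changes relative to its first parent, the rewritten commit should carry the net effect of the merged branch, but that is worth verifying on a throwaway clone first.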

The alternative I was suggesting wasn't to do it by hand. The idea is to generate the list of commits you want to keep, in order, using git log, then feed that list to git rebase --interactive in place of the todo list it would normally generate, so that Git applies those commits in order. However, I'm less confident that this will work if you're looking to throw away commits that aren't on the main branch. It's definitely possible, but it may require more work than just using rebase to keep all the commits but make them linear.

jschaf commented 20 hours ago

> Ah, okay! I'd thought that you wanted to keep all the non-merge commits, but have them appear linearly in the commit history. From what you're saying here, you want to only keep the first parent of each merge commit, so all the commits made on a temporary branch will get discarded. Is that right?

Yes, I think so. I don't care about the individual commits in the merged branches; I only care about the overall result of applying the merge commit.

For the following history:

git log --graph --decorate --oneline
*   b7c5375457 (HEAD -> stable) Merge branch 'sre-oncall-UpdateEmployeeHandler-exception-level' into 'stable'
|\  
| * a4318e8c7c throw ArryvedExceptions -> ArryvedWarnings
* |   081d64e5e5 Merge branch 'cameron/update-codeowners' into 'stable'
|\ \  
| * | 9f9b4aeb06 AR-7183: update codeowners file
* | |   0faf9948fa Merge branch 'julien/rename-to-normalizeCardHolderName' into 'stable'
|\ \ \  
| * | | 347d6b0daf AR-7213: Online Store/OpenTab - rename function as per MR feedback
* | | |   f32ce0d517 Merge branch 'julien/ar-7189/pay-form-loading' into 'stable'
|\ \ \ \  
| * | | | a758abdceb AR-7189: Fix "flash" of NMI payment form before iFrame loads on Online Store Payment Screen
* | | | |   c13b3587d2 Merge branch 'travis/SUP-35-stay_button' into 'stable'
|\ \ \ \ \  
| * | | | | d7690c2d5e Fix for stay feature no longer working
* | | | | |   225d2eaf7b Merge branch 'cameron/AR-7210/fix-device-location-map' into 'stable'
|\ \ \ \ \ \  
| |_|_|_|_|/  

I want each merged branch to collapse into a single commit, so something like:

* Merge branch 'sre-oncall-UpdateEmployeeHandler-exception-level' into 'stable'
* Merge branch 'cameron/update-codeowners' into 'stable'
* Merge branch 'julien/rename-to-normalizeCardHolderName' into 'stable'
* Merge branch 'julien/ar-7189/pay-form-loading' into 'stable'
* Merge branch 'travis/SUP-35-stay_button' into 'stable'
* Merge branch 'cameron/AR-7210/fix-device-location-map' into 'stable'

Ideally, I could use one of the commit messages from each merged branch to get the following:

* throw ArryvedExceptions -> ArryvedWarnings
* AR-7183: update codeowners file
* AR-7213: Online Store/OpenTab - rename function as per MR feedback
* AR-7189: Fix "flash" of NMI payment form before iFrame loads on Online Store Payment Screen
* Fix for stay feature no longer working
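Choosing which branch message to reuse could be sketched as below. This is a hypothetical helper, not anything filter-repo provides: it takes the merge commit's subject plus the subjects of the commits on the merged branch (oldest first), skips throwaway subjects, and falls back to the merge subject. The noise list uses the "todo" and "fix it" examples from earlier in the thread; "wip" is an added guess.

```python
import re
from typing import List

# "todo" and "fix it" come from the thread; "wip" is an assumed extra.
NOISE = re.compile(r"^(todo|fix it|wip)$", re.IGNORECASE)


def pick_subject(merge_subject: str, branch_subjects: List[str]) -> str:
    """Return the first meaningful branch subject, else the merge subject."""
    for subject in branch_subjects:
        cleaned = subject.strip()
        if cleaned and not NOISE.match(cleaned):
            return cleaned
    return merge_subject


print(pick_subject(
    "Merge branch 'cameron/update-codeowners' into 'stable'",
    ["AR-7183: update codeowners file"],
))
# AR-7183: update codeowners file
```

For a branch whose commits are all noise, such as `["todo", "fix it"]`, the helper keeps the original merge subject.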

> you'd need to write a commit callback to rewrite commits to have the parents you'd want.

I'm not totally sure I follow. Is the idea something like the following?