hraban / tomono

Multi- To Mono-repository merge
https://tomono.0brg.net
GNU Affero General Public License v3.0
842 stars 138 forks source link

What To Do With Merged Branches #55

Open C-Howard opened 1 month ago

C-Howard commented 1 month ago

Specs

OS: M3 Mac running Sonoma 14.5 Terminal: iTerm2 Git: 2.45.2 Tomono Commit: https://github.com/hraban/tomono/commit/8052bd85bfc88be859653dec90ad3186b2d17428

Issue

I am working on getting a series of older packages all consolidated to the new folder. That has worked correctly, however when I git push --all to the repository all the branches besides master show something like 800 commits behind, 4 commits ahead. For example there is a project-cleanup branch that shows 795 commits behind and 4 ahead with those commits being the following:

Move shared auth project files to auth 4.7 project, Remove old project
Move all files to Sdk/Auth/
Root commit for monorepo branch ch/project-cleanup
Merge auth/ch/project-cleanup

The first commit is the from the auth-sdk repository used in the script while the other 3 come from tomono. Is this behavior expected? I have verified that my default branch setting in git matches the branch used across all the repositories master and the master branch shows all expected folders.

My tomono script is the following:

#!/usr/bin/env bash
export MONOREPO_NAME=redacted-packages

./../tomono <<EOF
git@github.com:redacted/Common-Package.git Common
git@github.com:redacted/IoC-Package.git IoC
git@github.com:redacted/Mvc-Package.git Mvc
git@github.com:redacted/media-sdk.git media Sdk/Media
git@github.com:redacted/metrics-sdk.git metrics Sdk/Metrics
git@github.com:redacted/reporting-sdk.git reporting Sdk/Reporting
git@github.com:redacted/auth-sdk.git auth Sdk/Auth
git@github.com:redacted/Cache-Package.git Cache
git@github.com:redacted/Testing-Package.git Testing
EOF

And I have tested a version with no subfolders:

#!/usr/bin/env bash
export MONOREPO_NAME=redacted-packages

./../tomono <<EOF
git@github.com:redacted/Common-Package.git Common
git@github.com:redacted/IoC-Package.git IoC
git@github.com:redacted/Mvc-Package.git Mvc
git@github.com:redacted/media-sdk.git media
git@github.com:redacted/metrics-sdk.git metrics
git@github.com:redacted/reporting-sdk.git reporting
git@github.com:redacted/auth-sdk.git auth
git@github.com:redacted/Cache-Package.git Cache
git@github.com:redacted/Testing-Package.git Testing
EOF

With the same results.

hraban commented 1 month ago

What remote are you using? The tomono script creates an entirely new repository, but it sounds like you're reusing an existing remote. I'm not sure what to even expect from that setup, so I don't know what to say other than "create a new repository on github and push to that, instead".

Git is saying there's already a project-cleanup branch on your remote with commits which aren't present in the current monorepo, but it's hard to analyze what those commits were when there usually isn't even supposed to be an existing remote in the first place.

C-Howard commented 1 month ago

I probably have made the process more complicated by accident then. I originally created a totally new repository, added a new origin remote, and did git push --all to the repository. It pushed up but chose a different default branch than master. I switched it on the Github side but it seems to be out of sync after that.

Whenever I would try again I would delete all existing branches besides master as it is protected by Github and git push --all -force to replace all branches. All the branches seem to be getting recreated each time and master includes all the changes.

Ill try making a new repository again and see what happens. Current steps:

  1. Create folder for packages with the script
  2. Run mentioned script
  3. Run git remote add origin {url}
  4. Run git push --all
C-Howard commented 1 month ago

I tried eliminating a few potential issues, I deleted the old repository, and created a totally new one. I also switched to running it the way recommended in the README instead of passing lines through stdin.

repos.txt

git@github.com:redacted/Common-Package.git Common
git@github.com:redacted/IoC-Package.git IoC
git@github.com:redacted/Mvc-Package.git Mvc
git@github.com:redacted/media-sdk.git media
git@github.com:redacted/metrics-sdk.git metrics
git@github.com:redacted/reporting-sdk.git reporting
git@github.com:redacted/auth-sdk.git auth
git@github.com:redacted/Cache-Package.git Cache
git@github.com:redacted/Testing-Package.git Testing

run.sh

#!/usr/bin/env bash
export MONOREPO_NAME=redacted-packages

cat repos.txt | ./../tomono/tomono

I have tried running the script directly / line by line in my terminal and I see 634 behind, 4 ahead still. I made sure to run the script from a folder that is not a git repository so it won't conflict in any way. I also made sure that the branch it is trying to merge is only 1 commit ahead on the auth remote.

To make it easier to debug the steps I followed were:

  1. Create new destination repo
  2. Run the script ./run.sh
  3. Cd to redacted-packages
  4. Add new remote origin git remote add origin {url}
  5. Push master branch git push origin master
  6. Push project cleanup branch git push origin project-cleanup
hraban commented 1 month ago

Sorry I just now realize we may be talking about a different UI: what exactly is telling you the "behind" part? I thought it was github but maybe it's a local tool?

C-Howard commented 1 month ago

Sorry for the delay in replying, it is actually coming from Github. Is there a way to help debug these issues further?

Tomono Screenshot
hraban commented 1 month ago

Ok but that makes sense, right? At this point I'm not sure what the bug is. What did you expect to see? master will contain commits that aren't present in the feature branches, namely those of the new repos, so it makes sense github will show that the branch is behind compared to master.

C-Howard commented 1 month ago

That make sense, I didn't see anything in the readme about what to expect the merged branches to look like. The merged branches are intended to be for reference not to be mergable after the process is run? This seems to be a similar but different issue than #21 so I wanted to clarify.

Is there a recommended way to handle these branches in that case? Rebasing off of master to remove the extra commits or anything like that?

hraban commented 1 month ago

Have a look at this blog post see if that helps clarify things: https://syslog.ravelin.com/multi-to-mono-repository-c81d004df3ce lmk if it's still unclear. Also see the git log graph in the readme, and the opening paragraph (emphasis added):

This script merges multiple independent tiny repositories into a single “monorepo”. Every original repo is moved into its own subdirectory, branches with the same name are all merged. See Example for the details.

hraban commented 1 month ago

@C-Howard if the blog helped explain something that was previously unclear, I'd love to pick your brain if you got a moment and ask what helped clarify things. You're probably not the first to be confused, just the first to actually go through the trouble of opening a ticket about it, so if I can clarify something in the README with your help that'd be great.

C-Howard commented 1 month ago

I apologize for the delay in responding, the blog is helpful for explaining all the theory and the process used which is excellent for understanding. It seems like it might be wise to mention the blog post in the readme as well. From a tool user standpoint it does not cover initial or ongoing maintenance however, while using the tool I am already past the stage the blog post covers. My concerns at this point would be how do I get my changes out of the newly merged branches in a way that is not painful.

I guess a lot of what Im asking for is a rough idea of the "Dos and Do Nots" of this approach. What types of branches you want to clean up on a repository before trying to merge it in and things like that. The most ideal setup for us would be to take all of our current package repositories and merge them together so we can move the builds to the repository for each package and archive the old ones. Right now both rebasing and merges run into rename conflicts:

git merge origin/master

CONFLICT (rename/delete): src/Auth.Net47/Auth.Net47.csproj renamed to Sdk/Auth/src/Auth.Net47/Auth.Net47.csproj in origin/master, but deleted in HEAD.
Auto-merging Sdk/Auth/src/Auth.Net47/Auth.Net47.csproj
CONFLICT (add/add): Merge conflict in Sdk/Auth/src/Auth.Net47/Auth.Net47.csproj
CONFLICT (rename/delete): src/Auth/app.config renamed to Sdk/Auth/src/Auth/app.config in origin/master, but deleted in HEAD.
CONFLICT (rename/delete): src/Auth/packages.config renamed to Sdk/Auth/src/Auth/packages.config in origin/master, but deleted in HEAD.
CONFLICT (rename/rename): src/Auth/Auth.csproj renamed to Sdk/Auth/src/Auth.Net47/Auth.Net47.csproj in HEAD and to Sdk/Auth/src/Auth/Auth.csproj in origin/master.
CONFLICT (rename/rename): src/Auth/Helpers.cs renamed to Sdk/Auth/src/Auth.Net47/Helpers.cs in HEAD and to Sdk/Auth/src/Auth/Helpers.cs in origin/master.
CONFLICT (rename/rename): src/Auth/Models/AccessToken.cs renamed to Sdk/Auth/src/Auth.Net47/Models/AccessToken.cs in HEAD and to Sdk/Auth/src/Auth/Models/AccessToken.cs in origin/master.
CONFLICT (rename/rename): src/Auth/Models/AccessTokenAccount.cs renamed to Sdk/Auth/src/Auth.Net47/Models/AccessTokenAccount.cs in HEAD and to Sdk/Auth/src/Auth/Models/AccessTokenAccount.cs in origin/master.
CONFLICT (rename/rename): src/Auth/Models/AccessTokenBase.cs renamed to Sdk/Auth/src/Auth.Net47/Models/AccessTokenBase.cs in HEAD and to Sdk/Auth/src/Auth/Models/AccessTokenBase.cs in origin/master.
CONFLICT (rename/rename): src/Auth/Models/AccessTokenIdentity.cs renamed to Sdk/Auth/src/Auth.Net47/Models/AccessTokenIdentity.cs in HEAD and to Sdk/Auth/src/Auth/Models/AccessTokenIdentity.cs in origin/master.
CONFLICT (rename/rename): src/Auth/Models/AccessTokenSite.cs renamed to Sdk/Auth/src/Auth.Net47/Models/AccessTokenSite.cs in HEAD and to Sdk/Auth/src/Auth/Models/AccessTokenSite.cs in origin/master.
CONFLICT (rename/rename): src/Auth/Models/AccountAuthType.cs renamed to Sdk/Auth/src/Auth.Net47/Models/AccountAuthType.cs in HEAD and to Sdk/Auth/src/Auth/Models/AccountAuthType.cs in origin/master.
CONFLICT (rename/rename): src/Auth/Models/AddUserToSiteResponse.cs renamed to Sdk/Auth/src/Auth.Net47/Models/AddUserToSiteResponse.cs in HEAD and to Sdk/Auth/src/Auth/Models/AddUserToSiteResponse.cs in origin/master.
CONFLICT (rename/rename): src/Auth/Models/Application.cs renamed to Sdk/Auth/src/Auth.Net47/Models/Application.cs in HEAD and to Sdk/Auth/src/Auth/Models/Application.cs in origin/master.
CONFLICT (rename/rename): src/Auth/Models/AuthType.cs renamed to Sdk/Auth/src/Auth.Net47/Models/AuthType.cs in HEAD and to Sdk/Auth/src/Auth/Models/AuthType.cs in origin/master.
...

Specifically this branch was to rename files in the repository to remove a .Net47 suffix from file names. If I attempt to merge in newer changes following the readme https://github.com/hraban/tomono?tab=readme-ov-file#pull-in-new-changes-from-a-remote it also fails. The auth origin has already had this branch merged in so I am not able to pull in those changes as is.

git merge -X subtree=sdk/Auth auth/master

CONFLICT (rename/delete): src/Auth.Net47/Auth.Net47.csproj renamed to Sdk/Auth/src/Auth.Net47/Auth.Net47.csproj in HEAD, but deleted in auth/master.
CONFLICT (rename/delete): src/Auth.Net47/app.config renamed to Sdk/Auth/src/Auth.Net47/app.config in HEAD, but deleted in auth/master.
CONFLICT (rename/delete): src/Auth.Net47/packages.config renamed to Sdk/Auth/src/Auth.Net47/packages.config in HEAD, but deleted in auth/master.
CONFLICT (file location): src/Auth.Net47/JWTService.cs renamed to src/Auth/JWTService.cs in auth/master, inside a directory that was renamed in HEAD, suggesting it should perhaps be moved to Sdk/Auth/src/Auth/JWTService.cs.
CONFLICT (file location): src/Auth.Net47/Mesa/Identity/Identity.cs renamed to src/Auth/Mesa/Identity/Identity.cs in auth/master, inside a directory that was renamed in HEAD, suggesting it should perhaps be moved to Sdk/Auth/src/Auth/Mesa/Identity/Identity.cs.
CONFLICT (file location): src/Auth.Net47/RealmUserService.cs renamed to src/Auth/RealmUserService.cs in auth/master, inside a directory that was renamed in HEAD, suggesting it should perhaps be moved to Sdk/Auth/src/Auth/RealmUserService.cs.
CONFLICT (file location): src/Auth.Net47/SsoUserService.cs renamed to src/Auth/SsoUserService.cs in auth/master, inside a directory that was renamed in HEAD, suggesting it should perhaps be moved to Sdk/Auth/src/Auth/SsoUserService.cs.
CONFLICT (file location): src/Auth.Net47/TokenService.cs renamed to src/Auth/TokenService.cs in auth/master, inside a directory that was renamed in HEAD, suggesting it should perhaps be moved to Sdk/Auth/src/Auth/TokenService.cs.
CONFLICT (rename/rename): src/Auth.Net47/JWTService.cs renamed to Sdk/Auth/src/Auth.Net47/JWTService.cs in HEAD and to Sdk/Auth/src/Auth/JWTService.cs in auth/master.
CONFLICT (rename/rename): src/Auth.Net47/Mesa/Identity/Identity.cs renamed to Sdk/Auth/src/Auth.Net47/Mesa/Identity/Identity.cs in HEAD and to Sdk/Auth/src/Auth/Mesa/Identity/Identity.cs in auth/master.
CONFLICT (rename/rename): src/Auth.Net47/RealmUserService.cs renamed to Sdk/Auth/src/Auth.Net47/RealmUserService.cs in HEAD and to Sdk/Auth/src/Auth/RealmUserService.cs in auth/master.
CONFLICT (rename/rename): src/Auth.Net47/SsoUserService.cs renamed to Sdk/Auth/src/Auth.Net47/SsoUserService.cs in HEAD and to Sdk/Auth/src/Auth/SsoUserService.cs in auth/master.
CONFLICT (rename/rename): src/Auth.Net47/TokenService.cs renamed to Sdk/Auth/src/Auth.Net47/TokenService.cs in HEAD and to Sdk/Auth/src/Auth/TokenService.cs in auth/master.
Automatic merge failed; fix conflicts and then commit the result.
hraban commented 1 month ago

I'd love to reproduce this somehow because I've heard others report something similar but I can never manage to actually reproduce it.

Here's a test script you can run that merges SBCL and Git from 1 year ago, then merges in 1 entire year's worth of changes from both repos into the monorepo using that same subtree merge: https://github.com/hraban/tomono/blob/test-merge/test-merge.sh

That works fine locally, and I can't think of any other way to try and break it.

Do you have a reproducible case somewhere I can try?

C-Howard commented 3 weeks ago

I can confirm that test script works as is and works even when adding support for using the third parameter. I wanted to make sure you could put both git and sbcl into the lib folder to see if that was related. i can not reproduce the error so far so I am going to try to create a set of test repos that can replicate the issue.

#!/usr/bin/env bash

set -euo pipefail

set -x

# The tomono script is tangled right next to the test script
export PATH="$PWD:$PATH"

d="$(mktemp -d)"
cd "$d"

# Two reasonably complex repos with lots of changes but not too many. And both
# with a ‘master’ branch.
>remotes.txt cat <<EOF
https://github.com/git/git.git git lib/git
https://git.code.sf.net/p/sbcl/sbcl sbcl lib/sbcl
EOF

cat remotes.txt | while read url origin dir ; do
    # Manual folder is not set
    if [[ $dir == "" ]]; then
        dir="$origin"
    fi

    git clone "$url" "$dir"
    (
        cd "$dir"
        git log --before 1.year.ago -1 --format=%h | xargs git reset --hard
    )

    # Build set of entries for local repositories
    if [[ $dir == "" ]]; then
        echo "$PWD/$dir $dir" >> locals.txt
    else
        echo "$PWD/$dir $origin $dir" >> locals.txt
    fi
done

cat locals.txt
<locals.txt tomono

# Now we have a repo core with SBCL and Git, both where they were 1y ago. Update
# those and merge in the changes.

cat remotes.txt | while read _ origin dir ; do
    (
        # Manual folder is not set
        if [[ $dir == "" ]]; then
            dir="$origin"
        fi

        cd "$dir"
        git reset --hard HEAD@{1}
    )
done

cd core
git fetch --all
git checkout master

cat ../remotes.txt | while read _ origin dir; do
    # Manual folder is not set
    if [[ $dir == "" ]]; then
        dir="$origin"
    fi

    git merge --no-edit -X subtree="$dir/" "$origin/master"
done

echo "Successfully merged in 1 year worth of changes from both remotes into $PWD"
C-Howard commented 3 weeks ago

I have run through several tests with a few different results so far:

  1. Default Repo Name - Works
  2. Custom Repo Name - Works
  3. Leaving repos in root - Works
  4. Moving repos to lib/ - Works
  5. Run script again / Make sure to git fetch --all / Merge or Rebase branches

So my overall observations are that when you run the script the resulting branch works correctly, if you fetch all remotes after the merge you should be able to cleanly merge for each origin, but you will need to manually merge / rebase all branches that come from the merge. This is not an issue with the test scripts as it works just against the master branches and those are not having issues. The fetch is required for any future subtree merges which I can verify by running the test script using my list of repositories.

Because the branches are only merged on the first run of the tool and you subtree merge anything new it explains why nothing else appears to conflict. It would only be an issue if you planned to try to merge branches later on. It may may sense to not pull over any branches at all in some cases, merge it into the upstream, and then subtree merge the changes into the monorepo.

So for copied branches the safest step to take is to merge the default branch (ex: master / main) into the existing branch (ex: project-cleanup) to make it mergable, however a rebase will drop more commits so it will lower the diff. Your pr after rebasing the branch will show the base commit for the branch + branch commits.

I did also notice by accident a small detail about how newlines are being handled. If you for example generate your repo list the way the test script does it creates a newline:

>remotes.txt cat <<EOF
https://github.com/git/git.git git
https://git.code.sf.net/p/sbcl/sbcl sbcl
EOF

This file reads all entries:

https://github.com/git/git.git git
https://git.code.sf.net/p/sbcl/sbcl sbcl

This file reads just git:

https://github.com/git/git.git git
https://git.code.sf.net/p/sbcl/sbcl sbcl

If you create the .txt by hand without a newline it will not parse the last line of the file when parsing the file with cat. So it may make sense to note the file must end a newline or the last repo will not be included. This appears to be a Unix standard around read that applies to Mac as well. As tomono doesn't actually parse the file this is just something to be aware of, it may not be easily fixable.

hraban commented 3 weeks ago

Interesting points, thank you for getting this deep into the weeds.

Yes I will say: once you start making independent changes in the monorepo versus the source repos, later merging can get tricky. This is definitely something you want to avoid. You can keep both in sync easily, but if you start making independent changes then at some point you'll get into regular old git territory and will have to deal with merge conflicts, like any other situation where you have to diverging sources.

But is that what you were doing when you encountered those conflicts? What were you encountering actual problems with? I thought you were just pulling in new changes into an otherwise unchanged monorepo--had you been making changes to the monorepo before fetching those updated sources?

Regarding the newlines: yes that's just Unix convention I'm afraid.

$ printf 'foo\nbar' | wc -l
1
$ printf 'foo\nbar' | while read a ; do echo $a ; done
foo

This is how tools do it across the board. Some are more gracious but that's best effort. There's no expectation of handling lines without terminating newline, in general. Newlines aren't separators but terminators.

C-Howard commented 3 weeks ago

Roughly the TLDR is that there are no actual errors, the script works really well but you need to make sure git remotes are updated and branches get updated to have things work as expected.

I think the confusion can come down to several things depending on how you merge things. The test script you provided actually has slightly different behavior than running the tool directly because it clones the repositories, resets to a year old commit, merges, then subtree merges the new changes. Technically that approach ignores other branches you would see this issues with as it only uses branches locally checked out. In theory if you pulled down a branch in each repository it should show the branch issue but not the subtree issue.

I am using the generated repository from tomono as is without making any changes in the repository itself. To successfully merge multiple repositories the process is the following:

  1. Create file of repositories
  2. Run tomono script
  3. Git fetch all remotes to update local data
  4. Git push / Git subtree merge
  5. Rebase / Merge branches to fix merge conflicts

What I have seen is if you miss step 3 the subtree merge finds no changes. I have a version of the test script that reverts two commits and then subtree merges the changes back in and it always says things are up to date without the fetch. I can't say exactly what state my repository was in when I saw the error but it was missing data needed to merge successfully so it showed the rename conflicts.

So if you run your test script with https://github.com/hraban/tomono/blob/test-merge/test-merge.sh#L42 commented out you will run into a situation where while it didn't error, it also didn't actually merge any changes. This is more of a gotcha than the scripts fault, it's just part of the process you need to know about. You will be missing a years worth of changes just by missing that single line.

And the 5th step is required because Github will always show merge conflicts in branches that Git can correct locally. You run the script, make sure to fetch and update all your repository state it should be possible to merge, and then fix merge conflicts in branches. I don't know if it makes sense to try to rebase as part of the tomono script but that would produce cleaner results when it is successful, though that means you need to know the default branch that will be used. It might be better to include as "cleanup instructions". Ill get a pr written shortly to get a section added to the README about that, and Im happy to tweak it to match your style.

hraban commented 2 weeks ago

Curious to see your take on it, looking forward to the PR.

C-Howard commented 1 week ago

Sorry for the delay, I have pushed up my first draft for what information to include for the cleanup section. I believe I got it roughly in the right spot in the documentation but Im open to any suggestions or questions.