hraban / tomono

Multi- To Mono-repository merge
https://tomono.0brg.net
GNU Affero General Public License v3.0

Did we do it wrong? #16

Closed: a-c-m closed this issue 4 years ago

a-c-m commented 6 years ago

We used the script and have the files. We also have the commits (as shown by a git log in the root), but the two are disconnected.

All files have a single commit against them, "Merging api to master" for example. Basically they look like fresh files. Interestingly, the commits are still there, but they point to the files in their old location, e.g.:
api/index.js = single commit
index.js = all historical commits for api's index.js

What went wrong? Is there a way to recover?

Thanks

a-c-m commented 6 years ago

The answer is git log --follow!

But I'm wondering if there is a command we can run to merge the --follow logs into the parent.

hraban commented 6 years ago

Hi @a-c-m, I'm afraid you got it right. Git only stores snapshots of the tree; it records nothing about moves or renames. git log tries to infer, at runtime, what counts as "changes to a file", what "file" even means, and when a path stops or starts being the same file. There is no way to add hints to the commits themselves. Which, imo, is a shortcoming of git. Git is good at what it does, which is being a DAG of small, coherent commits with built-in consistency and fingerprinting, but it falls quite short at anything else, including being an actual VCS. Things like large files, file moves / renames, or refactoring and splitting files up into smaller files: git quickly falls apart. It's too bad it won the VCS war, to be honest :) but that's what we have, and what we must work with. In short: nope, git log --follow it is, forever.

/rant
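
To illustrate the point that renames are guessed when the log is read rather than recorded in a commit, here is a minimal Python sketch. It is not git's actual algorithm: difflib stands in for git's similarity scoring, and the infer_renames helper and the file contents are hypothetical. The 50% threshold mirrors git's default for rename detection.

```python
from difflib import SequenceMatcher

def infer_renames(deleted, added, threshold=0.5):
    """Pair each deleted path with the most similar added path.

    `deleted` and `added` map path -> file content for one commit.
    Nothing in the commit says "this was a move"; we guess from content alone.
    """
    renames = {}
    for old_path, old_text in deleted.items():
        best_path, best_score = None, threshold
        for new_path, new_text in added.items():
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames[old_path] = best_path
    return renames

# Hypothetical merge commit: index.js vanishes from the root, api/index.js appears.
deleted = {"index.js": "console.log('api');\n"}
added = {"api/index.js": "console.log('api');\n", "README": "docs\n"}
renames = infer_renames(deleted, added)
```

Because the pairing is recomputed on every read, a tool that skips this step shows api/index.js as a brand-new file with a single commit.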

TomasVotruba commented 6 years ago

Hi @a-c-m, I got into the same situation, where the old file was deleted and a new one created.

git log --follow works in the CLI, but not on GitHub or GitLab, so it's useless for open-source projects.

A friend of mine found a solution that changes paths and keeps the git history. It sounds like :rocket: engineering, but it's pretty simple. You can read about it here: https://blog.shopsys.com/how-to-merge-15-repositories-to-1-monorepo-keep-their-git-history-and-add-project-base-as-well-6e124f3a0ab3

hraban commented 6 years ago

Unfortunately, that rewrite_history_into.sh script uses filter-branch to rewrite the history of all files into a subdir, which doesn't keep your commit hashes intact. This is fundamentally inescapable, because the commit hash fingerprints the commit's full tree, file paths included, along with the hashes of its parents. If you change the location of a file, you change the hash of that commit and of every commit that descends from it.
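
A minimal Python sketch of why moving a file changes the hash: it rebuilds git's object ids by hand with hashlib, following git's object format (a "<type> <size>" header, a NUL byte, then the body). The file name, author line and commit message are made up for illustration, and the helper names are hypothetical.

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 of a git object: '<kind> <size>' + NUL header, then the body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def tree_entry(mode: str, name: str, object_id: str) -> bytes:
    """One tree entry: '<mode> <name>' + NUL, then the 20 raw bytes of the id."""
    return f"{mode} {name}\0".encode() + bytes.fromhex(object_id)

def commit_id(tree_id: str) -> str:
    """Commit object with a fixed, made-up author, date and message."""
    who = "Example <example@example.com> 0 +0000"
    body = f"tree {tree_id}\nauthor {who}\ncommitter {who}\n\nInitial commit\n"
    return git_object_id("commit", body.encode())

blob = git_object_id("blob", b"hello\n")

# Original layout: index.js in the repository root.
root_tree = git_object_id("tree", tree_entry("100644", "index.js", blob))

# Rewritten layout: the same blob, moved to api/index.js.
api_tree = git_object_id("tree", tree_entry("100644", "index.js", blob))
moved_tree = git_object_id("tree", tree_entry("40000", "api", api_tree))

original_commit = commit_id(root_tree)
moved_commit = commit_id(moved_tree)
print(original_commit != moved_commit)  # same content and metadata, different id
```

The blob id is identical in both layouts; only the top-level tree differs, and since the commit body names that tree, the commit id differs too. Every descendant commit names its parent's id, so the change propagates through the rest of the history.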

It is a fine trade-off to make, of course: if you care more about following history than preserving commit hashes, then filter-branch with a file move is definitely your way to go. In fact, this is how we initially did it (see 123b158, and the subsequent 4b2ff03 which introduces the current history-preserving mechanism). However, if you need to preserve hashes in your history, this won't work.

There is, of course, a third option: break SHA1 and find a way to cause collisions between new and old history :D

TomasVotruba commented 6 years ago

Thomas Sowell: There Are No Solutions, Only Trade-offs

Agreed :) I prefer a working history of files rather than removed files with no history and a correct hash.

Btw, what do you need such hashes for? I only use the monorepo, not the split repositories.

hraban commented 6 years ago

It's a matter of taste, but basically there are a lot of reasons to keep hashes constant. Commit messages can refer to other commits, issue trackers refer to commit hashes, continuous integration refers to hashes, as do build artifacts, etc.

But, yeah: it was a requirement we set for ourselves. If you find yourself not caring about keeping commit hashes constant, then by all means change history :) (In fact, if you go back in the commit history of this project, to the first commit, I think it did just that. We changed it later.)