andry81 closed this issue 1 year ago.
Just in case: I've found another way to replace a file without rewriting the entire source tree of a commit.
This updates (adds or replaces) the file without affecting the other files in the commit's source tree.
Note: this method does affect the file's history if the file changes in later child commits, because those commits keep the old version of the file. To fix that, every such child commit must also be rewritten with the new version of the file.
Issue: If try to rewrite the sourcetree of commit1 by the sourcetree of commit3, then does change the sourcetree of commit2 instead of commit1.
I don't understand this sentence. Could you rewrite it? Particularly, "then does the change the sourcetree" seems like it's missing some words or something.
git for-each-ref --format="delete %(refname)" refs/replace | git update-ref --stdin
Why not just pass --replace-refs delete-no-add to filter-repo instead of adding step 6?
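The step-6 pipeline quoted above removes every replace ref in one update-ref transaction. Here is a minimal sketch of its effect, run in a throwaway repository created with mktemp -d. The two demo blobs and the variable names are assumptions for illustration and are not part of the original test case:

```sh
set -eu
repo=$(mktemp -d)          # throwaway repository (assumption: any temp dir works)
cd "$repo"
git init -q

# Create one replace ref so that refs/replace is not empty.
old=$(printf 'old\n' | git hash-object -w --stdin)
new=$(printf 'new\n' | git hash-object -w --stdin)
git replace "$old" "$new"
before=$(git for-each-ref refs/replace | wc -l | tr -d ' ')

# The step-6 pipeline: print "delete <refname>" for every replace ref and
# apply all of those instructions through update-ref's stdin interface.
git for-each-ref --format="delete %(refname)" refs/replace | git update-ref --stdin
after=$(git for-each-ref refs/replace | wc -l | tr -d ' ')

echo "replace refs before: $before, after: $after"
```

This prints "replace refs before: 1, after: 0". To drop a single replacement, git replace -d <object> removes just that one ref.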
As a result, commit1 stays empty while commit2 does change.
I don't understand. In your testcase, commit1 is not empty:
$ git log --oneline --graph --all --raw
* f98c130 (HEAD, origin/branch-to-rewrite, origin/HEAD, branch-to-rewrite) 2
* fc3f681 1
:000000 100644 0000000 b72227d A test.txt
* c71e9ed (origin/branch-to-read, branch-to-read) 3
:000000 100644 0000000 a69091c A test.txt
So, commit "1" clearly added a new file "test.txt", different than what commit "3" added. commit "2" is a so-called "empty" commit in that it has no change from its parent commit "1".
At the same time, commit2 could have child commits, which leads to rewriting unrelated commits and breaks the repository.
I don't understand this claim either. Before the "git replace" command, I added a child of commit "2" to your testcase, so that it looks like this:
$ git log --oneline --graph --all --raw
* 12635df (main) 4
| :100644 100644 b72227d 8ff8233 M test.txt
* f98c130 (HEAD, origin/branch-to-rewrite, origin/HEAD, branch-to-rewrite) 2
* fc3f681 1
:000000 100644 0000000 b72227d A test.txt
* c71e9ed (origin/branch-to-read, branch-to-read) 3
:000000 100644 0000000 a69091c A test.txt
Note that both commits "1" and "3" were "root" commits (neither had any parents) and both added the test.txt file but with different contents. commit "2" made no change but has commit "1" as a parent. commit "4" changed test.txt. Now, after running your steps 4 & 5, I see:
$ git log --oneline --graph --all --raw
* def5ca5 (main) 4
| :100644 100644 a69091c 8ff8233 M test.txt
* 55dbe13 (HEAD, branch-to-rewrite) 2
* 0530694 1
:000000 100644 0000000 a69091c A test.txt
* c71e9ed (branch-to-read) 3
:000000 100644 0000000 a69091c A test.txt
This means that commit "1" now introduces the same version of test.txt that commit "3" does, commit "2" still makes no change, and commit "4" has the same version of test.txt that it had before the rewrite.
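The outcome described above can be reproduced without filter-repo. This is a minimal sketch in a throwaway repository. It assumes hypothetical file contents (commit "4" here appends a dddddd line, since the thread does not show the real commit 4 contents) and a demo identity set with git config. Commits "1" and "2" share one tree, so replacing it changes what both of them show, while commit "4" keeps its own tree:

```sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo                 # hypothetical identity for commit-tree
git config user.email demo@example.com

# Build three trees from the index (contents are hypothetical).
printf 'aaaaaa\ncccccc\n' > test.txt
git add test.txt
t_old=$(git write-tree)                   # tree shared by commits "1" and "2"
printf 'aaaaaa\nbbbbbb\ncccccc\n' > test.txt
git add test.txt
t_new=$(git write-tree)                   # tree of commit "3"
printf 'aaaaaa\ncccccc\ndddddd\n' > test.txt
git add test.txt
t_four=$(git write-tree)                  # tree of commit "4"

c1=$(git commit-tree -m 1 "$t_old")
c2=$(git commit-tree -p "$c1" -m 2 "$t_old")   # "empty": same tree as its parent
c4=$(git commit-tree -p "$c2" -m 4 "$t_four")

# Replace the shared tree, as in step 4 of the report.
git replace "$t_old" "$t_new"

one=$(git show "$c1:test.txt")
two=$(git show "$c2:test.txt")
four=$(git show "$c4:test.txt")
printf '1:\n%s\n2:\n%s\n4:\n%s\n' "$one" "$two" "$four"
```

Commit "4" still prints its own three lines, so any diff from "2" to "4" is now computed against the replacement content.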
What did you expect? Were you perhaps expecting the diffs between the old commits 1 and 4 to remain the same after modifying commit 1 and so that commit "4" also had a different version of test.txt? If so, you're using the wrong tool. See https://github.com/newren/git-filter-repo/issues/62#issuecomment-597725502 and how this kind of thing should be done with a rebase-like tool. I linked to that explanation from https://github.com/newren/git-filter-repo/blob/main/Documentation/converting-from-filter-branch.md#removing-commits-by-a-certain-author; I wonder if there are more places I should link to it from.
I don't understand this sentence. Could you rewrite it? Particularly, "then does the change the sourcetree" seems like it's missing some words or something.
"then it does change the sourcetree"
Why not just pass --replace-refs delete-no-add to filter-repo instead of adding step 6?
Just an example.
I don't understand. In your testcase, commit1 is not empty:
It is empty because it has no changes.
Showing 0 changed files with 0 additions and 0 deletions.
So, commit "1" clearly added a new file "test.txt", different than what commit "3" added. commit "2" is a so-called "empty" commit in that it has no change from its parent commit "1".
Sorry, bad wording. I meant the diff, not the sourcetree. It does replace the sourcetree in commit1, but in addition it also replaces the sourcetree in commit2, when it should not.
I don't understand this claim either. Before the "git replace" command, I added a child of commit "2" to your testcase, so that it looks like this:
This is a different example. The thing is that when I try to change something in commit1, it also tries to rewrite commit2, and I don't want that.
What did you expect?
I expect commit2 (the first one) to be left intact, while only commit1 (the last one) is changed.
I don't understand this sentence. Could you rewrite it? Particularly, "then does the change the sourcetree" seems like it's missing some words or something.
"then it does change the sourcetree"
Could you write the full sentence? The snippet here suggests you intended
Issue: If try to rewrite the sourcetree of commit1 by the sourcetree of commit3, then it does change the sourcetree of commit2 instead of commit1.
But that makes no sense either unless you meant "in addition to" rather than "instead of"? I'm really not following your setup. [Edit, after reading the rest of what you wrote, I think I understand it now, even if this sentence makes no sense as written.]
I don't understand. In your testcase, commit1 is not empty:
It is empty because it has no changes.
andry81-tests/git-filter-repo-test-1@f98c130
Showing 0 changed files with 0 additions and 0 deletions.
The link you provided is to commit 2, not to commit 1. So the output you show is about commit 2 being empty, not commit 1.
However, the part that really matters here is the fact that commit 2 is empty means that both commit 1 and commit 2 have the same tree. I'll point out below how that matters.
So, commit "1" clearly added a new file "test.txt", different than what commit "3" added. commit "2" is a so-called "empty" commit in that it has no change from its parent commit "1".
Sorry, bad wording. I meant the diff, not the sourcetree. It does replace the sourcetree in commit1, but in addition it also replaces the sourcetree in commit2, when it should not.
Your git replace command changed both. After steps 1-4 of your steps to reproduce, run the following:
$ git log --oneline
f98c130 (HEAD -> branch-to-rewrite, origin/branch-to-rewrite, origin/HEAD) 2
fc3f681 1
$ git show f98c130:test.txt
aaaaaa
bbbbbb
cccccc
$ git show fc3f681:test.txt
aaaaaa
bbbbbb
cccccc
Also,
$ git show-ref | grep replace
806648a5e8f56dc62ce6cefc17ec9f419ef536e4 refs/replace/6c893883e190460b08e8e01149ad79d9a87cf12e
$ git --no-replace-objects show 6c893883e190460b08e8e01149ad79d9a87cf12e:test.txt
aaaaaa
cccccc
$ git --no-replace-objects show 806648a5e8f56dc62ce6cefc17ec9f419ef536e4:test.txt
aaaaaa
bbbbbb
cccccc
Anyway, this shows this isn't a bug of filter-repo accidentally changing extra commits; you changed multiple commits before you even called filter-repo with your replace command. (This is true because you replaced 6c893883e190460b08e8e01149ad79d9a87cf12e with 806648a5e8f56dc62ce6cefc17ec9f419ef536e4, and both commit 1 and commit 2 used 6c893883e190460b08e8e01149ad79d9a87cf12e for their tree).
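A replace ref only changes which object Git reads; the original object stays in the repository. This sketch compares the default view of a file with the view under git --no-replace-objects. It uses a throwaway repository, hypothetical contents modeled on the outputs above, and a demo identity:

```sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo                 # hypothetical identity
git config user.email demo@example.com

printf 'aaaaaa\ncccccc\n' > test.txt
git add test.txt
t_old=$(git write-tree)
printf 'aaaaaa\nbbbbbb\ncccccc\n' > test.txt
git add test.txt
t_new=$(git write-tree)
c1=$(git commit-tree -m 1 "$t_old")

git replace "$t_old" "$t_new"

# Default: Git follows refs/replace/<t_old> and reads t_new instead.
replaced=$(git show "$c1:test.txt")
# With --no-replace-objects: Git ignores refs/replace and reads t_old itself.
original=$(git --no-replace-objects show "$c1:test.txt")
printf 'with replace:\n%s\nwithout replace:\n%s\n' "$replaced" "$original"
```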
This is a different example. The thing is that when I try to change something in commit1, it also tries to rewrite commit2, and I don't want that.
You've got commit1 and commit2 backwards again. After going up to step 3 of your reproduction steps:
$ git --no-replace-objects log --oneline --raw
f98c130 (HEAD -> branch-to-rewrite, origin/branch-to-rewrite, origin/HEAD) 2
fc3f681 1
:000000 100644 0000000 b72227d A test.txt
Which shows that branch-to-rewrite points at commit 2. At this point you ran step 4, i.e.
git replace "branch-to-rewrite^{tree}" "branch-to-read^{tree}"
and since branch-to-rewrite points at commit2, I can't see how you would claim you don't want commit 2 changed. I could see how you could misunderstand and claim you only wanted commit2 changed and not commit1, but this replace command changes both since it's the same as
git replace 6c893883e190460b08e8e01149ad79d9a87cf12e 806648a5e8f56dc62ce6cefc17ec9f419ef536e4
Where 6c893883e190460b08e8e01149ad79d9a87cf12e was the tree used by both commit1 and commit2.
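The equivalence above can be checked directly. The revision expression branch-to-rewrite^{tree} resolves to one tree ID, and git replace records a single ref named refs/replace/<that ID>. This sketch uses a throwaway repository; the branch names mirror the report, while the contents and the demo identity are assumptions:

```sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo                 # hypothetical identity
git config user.email demo@example.com

printf 'aaaaaa\ncccccc\n' > test.txt
git add test.txt
t_old=$(git write-tree)
printf 'aaaaaa\nbbbbbb\ncccccc\n' > test.txt
git add test.txt
t_new=$(git write-tree)

c1=$(git commit-tree -m 1 "$t_old")
c2=$(git commit-tree -p "$c1" -m 2 "$t_old")
c3=$(git commit-tree -m 3 "$t_new")
git update-ref refs/heads/branch-to-rewrite "$c2"
git update-ref refs/heads/branch-to-read "$c3"

# Both commits on branch-to-rewrite point at the very same tree object.
tree_of_2=$(git rev-parse "branch-to-rewrite^{tree}")
tree_of_1=$(git rev-parse "branch-to-rewrite~1^{tree}")

# Step 4 of the report, by name.
git replace "branch-to-rewrite^{tree}" "branch-to-read^{tree}"

# One replace ref, keyed by the shared tree ID and pointing at commit 3's tree.
replace_ref=$(git for-each-ref --format='%(refname) %(objectname)' refs/replace)
echo "$replace_ref"
```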
The presence of this replacement results in the following:
$ git log --oneline --raw
f98c130 (HEAD -> branch-to-rewrite, origin/branch-to-rewrite, origin/HEAD) 2
fc3f681 1
:000000 100644 0000000 a69091c A test.txt
i.e. both commits 1 and 2 still have the same tree as each other, you've just replaced that tree. If I modify the command slightly:
$ git --no-replace-objects log --oneline --raw
f98c130 (HEAD -> branch-to-rewrite, origin/branch-to-rewrite, origin/HEAD) 2
fc3f681 1
:000000 100644 0000000 b72227d A test.txt
you'll note that even after the replacement, if we tell git to ignore the replacement object, both commits are known to have the same old tree.
What did you expect?
I expect commit2 (the first one) to be left intact, while only commit1 (the last one) is changed.
In that case you're misusing git replace.
Could you write the full sentence? The snippet here suggests you intended
If try to rewrite the sourcetree of commit1 by the sourcetree of commit3, then it does change the sourcetree of commit2 in addition to commit1.
But that makes no sense either unless you meant "in addition to" rather than "instead of"? I'm really not following your setup. [Edit, after reading the rest of what you wrote, I think I understand it now, even if this sentence makes no sense as written.]
Yes, maybe it is not clear, but commit1 is still "empty" after the replace, as it has no difference from its parent.
The link you provided is to commit 2, not to commit 1. So the output you show is about commit 2 being empty, not commit 1.
Yes, the order is different: 2-1 -> 1-2. I was going by the picture in the first post:
branch-to-read    branch-to-rewrite
       |                  |
   +commit3           +commit1
                          |
                      +commit2
But if this is important, we can just synchronize the picture with the repo:
branch-to-read    branch-to-rewrite
       |                  |
   +commit3           +commit2
                          |
                      +commit1
Now commit2 is commit1, and commit1 is commit2.
Anyway, this shows this isn't a bug of filter-repo accidentally changing extra commits; you changed multiple commits before you even called filter-repo with your replace command. (This is true because you replaced 6c893883e190460b08e8e01149ad79d9a87cf12e with 806648a5e8f56dc62ce6cefc17ec9f419ef536e4, and both commit 1 and commit 2 used 6c893883e190460b08e8e01149ad79d9a87cf12e for their tree).
I didn't want to change anything in the parent. This is unexpected behaviour.
It is hard to read your text. You are using hashes that I cannot see on my side (6c893883e190460b08e8e01149ad79d9a87cf12e 806648a5e8f56dc62ce6cefc17ec9f419ef536e4) and git log output that is hard to read.
Anyway, this shows this isn't a bug of filter-repo accidentally changing extra commits; you changed multiple commits before you even called filter-repo with your replace command. (This is true because you replaced 6c893883e190460b08e8e01149ad79d9a87cf12e with 806648a5e8f56dc62ce6cefc17ec9f419ef536e4, and both commit 1 and commit 2 used 6c893883e190460b08e8e01149ad79d9a87cf12e for their tree).
I didn't want to change anything in the parent. This is unexpected behaviour.
If you didn't want to change anything in the parent, then you should use a git replace command that doesn't change the parent.
It is hard to read your text. You are using hashes that I cannot see on my side (6c893883e190460b08e8e01149ad79d9a87cf12e 806648a5e8f56dc62ce6cefc17ec9f419ef536e4) and git log output that is hard to read.
You can see them on your side if you run the same commands I ran; I ran them on the repository you said to clone. However, let me try again from the top to make it very clear; I'll show each and every command you can run in order...
First, run the first three steps you provided:
git clone https://github.com/andry81-tests/git-filter-repo-test-1 git-filter-repo-test-1
cd git-filter-repo-test-1
git pull origin *:*
Now, run the following commands to look at commits 1, 2, and 3 (these commands just inspect; they don't change anything):
git cat-file -p branch-to-rewrite
git cat-file -p branch-to-rewrite~1
git cat-file -p branch-to-read
If you do that, you should see
$ git cat-file -p branch-to-rewrite
tree 6c893883e190460b08e8e01149ad79d9a87cf12e
parent fc3f6812fd99ca543b4b311eb342a7c2936e2286
author andry81 <andry@inbox.ru> 1684025347 +0300
committer andry81 <andry@inbox.ru> 1684025347 +0300
2
$ git cat-file -p branch-to-rewrite~1
tree 6c893883e190460b08e8e01149ad79d9a87cf12e
author andry81 <andry@inbox.ru> 1684025329 +0300
committer andry81 <andry@inbox.ru> 1684025329 +0300
1
$ git cat-file -p branch-to-read
tree 806648a5e8f56dc62ce6cefc17ec9f419ef536e4
author andry81 <andry@inbox.ru> 1684025446 +0300
committer andry81 <andry@inbox.ru> 1684025446 +0300
3
In particular, you should note that both commits 1 & 2 have the same tree, 6c893883e190460b08e8e01149ad79d9a87cf12e, while commit 3 has the tree 806648a5e8f56dc62ce6cefc17ec9f419ef536e4. The git cat-file -p command can be used to inspect not only commits, but also trees and blobs. So, you can investigate further with commands like:
git cat-file -p branch-to-rewrite^{tree}
git cat-file -p branch-to-rewrite^{tree}:test.txt
git cat-file -p 806648a5e8f56dc62ce6cefc17ec9f419ef536e4
git cat-file -p 806648a5e8f56dc62ce6cefc17ec9f419ef536e4:test.txt
git cat-file -p b72227ddfd913072539023c43a102de186a58b28
Let's look in particular at the contents of the tree for commits 1 & 2:
$ git cat-file -p branch-to-rewrite^{tree}
100644 blob b72227ddfd913072539023c43a102de186a58b28 test.txt
$ git cat-file -p branch-to-rewrite~1^{tree}
100644 blob b72227ddfd913072539023c43a102de186a58b28 test.txt
Showing that the trees of both commits have the same contents. Now, let's run your step 4:
git replace "branch-to-rewrite^{tree}" "branch-to-read^{tree}"
and then repeat the last two inspection commands:
$ git cat-file -p branch-to-rewrite^{tree}
100644 blob a69091c1d22cb894902456449323add6b0373e45 test.txt
$ git cat-file -p branch-to-rewrite~1^{tree}
100644 blob a69091c1d22cb894902456449323add6b0373e45 test.txt
This shows that the git replace command you ran changed the tree of both commits 1 and 2. That makes sense because they both shared the same tree (Git's basic design structure is a Merkle Tree after all), and you replaced that tree with something else. If you had wanted to just change one commit, then you should have created a new commit and used git replace to replace the commit you wanted rather than trying to replace a tree. By using git replace to change a tree or a blob, you change every place throughout your git repository that has the same tree or blob.
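The same holds for blobs. A blob is identified only by its bytes, so two different trees that hold identical content under different paths reference one blob, and replacing it changes both. This is a minimal sketch in a throwaway repository; the paths a.txt and docs/b.txt and their contents are hypothetical:

```sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo                 # hypothetical identity
git config user.email demo@example.com

blob=$(printf 'shared\n' | git hash-object -w --stdin)

# Tree 1: the blob stored at a.txt.
git update-index --add --cacheinfo "100644,$blob,a.txt"
t1=$(git write-tree)
# Tree 2: the same blob at a different path, built from an emptied index.
git read-tree --empty
git update-index --add --cacheinfo "100644,$blob,docs/b.txt"
t2=$(git write-tree)

ca=$(git commit-tree -m A "$t1")
cb=$(git commit-tree -m B "$t2")

# Replace the blob once; every tree that contains it now reads the new content.
newblob=$(printf 'changed\n' | git hash-object -w --stdin)
git replace "$blob" "$newblob"

a=$(git show "$ca:a.txt")
b=$(git show "$cb:docs/b.txt")
echo "a.txt: $a / docs/b.txt: $b"
```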
This shows that the git replace command you ran changed the tree of both commits 1 and 2.
So this means that git replace has an unreliable implementation that leads to rewriting everything else in a repo.
That makes sense because they both shared the same tree
These are internal details. Any two or more commits in a repo could have a common blob. This does not mean they should all be rewritten by default, especially when I directly pointed at the single commit I wanted to rewrite.
By using git replace to change a tree or a blob, you change every place throughout your git repository that has the same tree or blob.
No. There should be a commit-range parameter to limit this. Until then, this command is simply unpredictable and dangerous.
This shows that the git replace command you ran changed the tree of both commits 1 and 2.
So this means that git replace has an unreliable implementation that leads to rewriting everything else in a repo.
It's well defined and entirely deterministic, so to me "unreliable" doesn't seem to fit at all. In fact, I'd say it's quite reliable when you use it as designed.
Any two or more commits in a repo could have a common blob.
Any two or more commits in a repo could also have a common tree. In fact, both are quite common. Git's data model is that any blob or tree or commit that has the same contents as another object of the same type is in fact the exact same object. That concept is fundamental to its design; you may run across it in places that describe Git as a Content Addressable File System or as a Merkle Tree.
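The content addressing described above is easy to observe. Hashing the same bytes twice yields the same blob ID, and two independently built trees with the same entries get the same tree ID. This sketch uses git hash-object and git mktree in a throwaway repository; the file contents are hypothetical:

```sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q

# Identical bytes -> identical blob ID; different bytes -> different ID.
id1=$(printf 'aaaaaa\ncccccc\n' | git hash-object -w --stdin)
id2=$(printf 'aaaaaa\ncccccc\n' | git hash-object -w --stdin)
id3=$(printf 'aaaaaa\nbbbbbb\ncccccc\n' | git hash-object -w --stdin)

# Two independently built trees with the same single entry -> one tree ID.
tree1=$(printf '100644 blob %s\ttest.txt\n' "$id1" | git mktree)
tree2=$(printf '100644 blob %s\ttest.txt\n' "$id2" | git mktree)

echo "blobs: $id1 $id2 $id3"
echo "trees: $tree1 $tree2"
```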
This does not mean they should all be rewritten by default, especially when I directly pointed at the single commit I wanted to rewrite.
But you didn't rewrite a commit; if you had, you wouldn't have had this problem. You rewrote a tree, which affects all commits that contained that shared tree.
There are a couple of ways you could have rewritten the commit. One would be to run git replace --edit branch-to-rewrite^{commit} and then replace the tree 6c893883e190460b08e8e01149ad79d9a87cf12e line with tree 806648a5e8f56dc62ce6cefc17ec9f419ef536e4. Alternatively, if you want it all scripted without manual editing and looking up tree IDs, then you could run
# Copy the message, author and committer of the commit being replaced.
logmsg="$(git log -1 --format=%B branch-to-rewrite)"
export GIT_AUTHOR_NAME="$(git log -1 --format=%an branch-to-rewrite)"
export GIT_AUTHOR_EMAIL="$(git log -1 --format=%ae branch-to-rewrite)"
export GIT_AUTHOR_DATE="$(git log -1 --format=%ad branch-to-rewrite)"
export GIT_COMMITTER_NAME="$(git log -1 --format=%cn branch-to-rewrite)"
export GIT_COMMITTER_EMAIL="$(git log -1 --format=%ce branch-to-rewrite)"
export GIT_COMMITTER_DATE="$(git log -1 --format=%cd branch-to-rewrite)"
# Feed the message through a quoted printf; an unquoted echo $logmsg would collapse its line breaks and runs of spaces.
git replace branch-to-rewrite^{commit} $(printf '%s\n' "$logmsg" | git commit-tree -p branch-to-rewrite~1 branch-to-read^{tree})
Using either of these would have ensured that you only modified one commit.
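The scripted approach above can be tried end to end in a throwaway repository. This sketch rebuilds a hypothetical version of the test case with plumbing (git commit-tree, git update-ref), then replaces only commit "2" with a copy that differs in its tree. Two details are choices of this sketch, not requirements. Dates are copied with %aI/%cI (strict ISO 8601) rather than %ad so they round-trip unambiguously. The message is piped through a quoted printf '%s\n' so its line breaks survive:

```sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo                 # hypothetical identity
git config user.email demo@example.com

# Hypothetical stand-ins for the test case: "1" and "2" share a tree, "3" differs.
printf 'aaaaaa\ncccccc\n' > test.txt
git add test.txt
t_old=$(git write-tree)
printf 'aaaaaa\nbbbbbb\ncccccc\n' > test.txt
git add test.txt
t_new=$(git write-tree)
c1=$(git commit-tree -m 1 "$t_old")
c2=$(git commit-tree -p "$c1" -m 2 "$t_old")
c3=$(git commit-tree -m 3 "$t_new")
git update-ref refs/heads/branch-to-rewrite "$c2"
git update-ref refs/heads/branch-to-read "$c3"

# Copy message, author and committer of commit "2"; only the tree changes.
logmsg=$(git log -1 --format=%B branch-to-rewrite)
export GIT_AUTHOR_NAME="$(git log -1 --format=%an branch-to-rewrite)"
export GIT_AUTHOR_EMAIL="$(git log -1 --format=%ae branch-to-rewrite)"
export GIT_AUTHOR_DATE="$(git log -1 --format=%aI branch-to-rewrite)"
export GIT_COMMITTER_NAME="$(git log -1 --format=%cn branch-to-rewrite)"
export GIT_COMMITTER_EMAIL="$(git log -1 --format=%ce branch-to-rewrite)"
export GIT_COMMITTER_DATE="$(git log -1 --format=%cI branch-to-rewrite)"
new_c2=$(printf '%s\n' "$logmsg" | git commit-tree -p "branch-to-rewrite~1" "branch-to-read^{tree}")

# Replace the commit object, not the tree it shares with commit "1".
git replace "branch-to-rewrite^{commit}" "$new_c2"

two=$(git show "branch-to-rewrite:test.txt")
one=$(git show "branch-to-rewrite~1:test.txt")
printf 'commit 2:\n%s\ncommit 1:\n%s\n' "$two" "$one"
```

Commit "1" still shows its original content, and the branch ref itself still points at the old commit "2" hash. Only reads through refs/replace see the new commit until a history rewrite bakes it in.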
But anyway, this is outside the scope of filter-repo; your issue is all about git replace usage. But I hope the explanation helps.
It's well defined and entirely deterministic, so to me "unreliable" doesn't seem to fit at all. In fact, I'd say it's quite reliable when you use it as designed.
A command that can damage your repository unexpectedly, without any way to predict it, cannot be called reliable. Sorry, but no.
Any 2 or more commits in a repo could also have a common tree. In fact, both are quite common. Git's data model is that any blob or tree or commit that has the same contents as another object of the same type is in fact the exact same object. That concept is pretty fundamental to its design; which you may run across in the places that refer to it as a Content Addressable File System or as a Merkle Tree.
These are internal details that have nothing to do with the replace command. A user may or may not know what is inside, because there are no such details in its documentation: https://git-scm.com/docs/git-replace
But you didn't rewrite a commit; if you had, you wouldn't have had this problem.
I did: the hash changed, which means it was rewritten. But I did not mean that I want to rewrite other random commits. That is definitely not the case.
You rewrote a tree, which affects all commits that contained that shared tree.
And this is why it damages the repository: any random commit could share the same object. I think this is obvious.
There are a couple of ways you could have rewritten the commit. One would be to run git replace --edit branch-to-rewrite^{commit} and then replace the tree 6c893883e190460b08e8e01149ad79d9a87cf12e line with tree 806648a5e8f56dc62ce6cefc17ec9f419ef536e4. Alternatively, if you want it all scripted without manual editing and looking up tree IDs, then you could run
This is the same case: the tree 806648a5e8f56dc62ce6cefc17ec9f419ef536e4 might not exist in the tree I want to rewrite, which means I would have to create it first.
A command that can damage your repository unexpectedly, without any way to predict it
Oh, the documentation issue is a really good point. I assumed the manpage for git replace would come with warnings, similar to git rebase, but I just checked now since you pointed this out and see that it's not the case. I'll see if I can propose a patch upstream for this.
cannot be called reliable.
I think you meant "safe" or "intuitive" there rather than "reliable"; not being reliable means it has bugs or an incomplete implementation, neither of which is the case here. Calling it not safe or not intuitive seems more in line with your intent.
Sorry, but no.
No need to be sorry; I'm not the author of git replace, so my feelings aren't hurt based on your opinion of it.
But you didn't rewrite a commit; if you had, you wouldn't have had this problem.
I did: the hash changed, which means it was rewritten. But I did not mean that I want to rewrite other random commits. That is definitely not the case.
No, the command
git replace "branch-to-rewrite^{tree}" "branch-to-read^{tree}"
does NOT replace a commit; it replaces a tree. After running that command, the commit hash is unchanged. Now, when you also ran git filter-repo ... after that step, then it modified the commit hash to make use of the new tree, but, of course, it did that with all affected commits, since you provided an instruction to replace a certain shared tree. You need to replace a commit instead of replacing a tree to avoid this kind of problem.
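The distinction above can be observed in a throwaway repository. After a tree replacement, the branch still points at the same commit hash, and the raw commit object still names the old tree. Only reads through the replace ref see the new tree. New commit hashes appear only when a history rewriter such as filter-repo runs, which this sketch does not do. The contents and the demo identity are hypothetical:

```sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo                 # hypothetical identity
git config user.email demo@example.com

printf 'aaaaaa\ncccccc\n' > test.txt
git add test.txt
t_old=$(git write-tree)
printf 'aaaaaa\nbbbbbb\ncccccc\n' > test.txt
git add test.txt
t_new=$(git write-tree)
c=$(git commit-tree -m 2 "$t_old")
git update-ref refs/heads/branch-to-rewrite "$c"

before=$(git rev-parse branch-to-rewrite)
git replace "branch-to-rewrite^{tree}" "$t_new"
after=$(git rev-parse branch-to-rewrite)

# The raw commit object still names the old tree; nothing was rewritten.
recorded=$(git cat-file commit branch-to-rewrite | sed -n 's/^tree //p')
echo "commit before: $before"
echo "commit after:  $after"
echo "tree recorded in commit: $recorded"
```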
You rewrote a tree, which affects all commits that contained that shared tree.
And this is why it damages the repository: any random commit could share the same object. I think this is obvious.
Don't be so hard on yourself. As you pointed out, you didn't know about the fact that blobs & trees were shared when they are identical. Anyway, just replace a commit instead of replacing a tree and then you won't have this problem.
does NOT replace a commit; it replaces a tree. After running that command, the commit hash is unchanged. Now, when you also ran git filter-repo ... after that step, then it modified the commit hash to make use of the new tree, but, of course, it did that with all affected commits, since you provided an instruction to replace a certain shared tree. You need to replace a commit instead of replacing a tree to avoid this kind of problem.
I don't want to replace a commit; I want to replace a sourcetree. It doesn't do that, because it replaces the tree reference, which IS not expected.
Don't be so hard on yourself. As you pointed out, you didn't know about the fact that blobs & trees were shared when they are identical.
I am not. And why should I be? I shouldn't need to know the details; knowing them is the underlying command's job.
Anyway, just replace a commit instead of replacing a tree and then you won't have this problem.
This is a different operation, which replaces almost everything in the commit instead of replacing only the source tree, which is the intent.
I don't want to replace a commit; I want to replace a sourcetree. It doesn't do that, because it replaces the tree reference, which IS not expected.
You're making up terminology here. If you're going to use a term like "sourcetree", which is not in git's glossary, you should define it.
Anyway, just replace a commit instead of replacing a tree and then you won't have this problem.
This is a different operation, which replaces almost everything in the commit instead of replacing only the source tree, which is the intent.
It fully replaces the commit, yes, which is the point. But it copies all the fields in the commit except for the tree, meaning that the tree is the only thing (other than the commit hash) that changes.
But, more directly, note that git has four fundamental objects: tags, commits, trees, and blobs. You can only use git replace on those four exact object types; there are no other options.
Trees and blobs are shared whenever they are identical. So, you can ignore reality and insistently claim that you want to modify "just a tree" (which is impossible as I've now explained at length), or you can face reality and realize that to solve the problem you described, you need to change one commit's tree. Changing one commit's tree means that you need to modify that commit to have it reference a different tree.
Anything else is, as I said above, misusing git replace.
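The object types listed above can be inspected with git cat-file -t. This sketch creates one object of each type in a throwaway repository, using hypothetical content and a hypothetical tag name, v1. It also checks that, without -f, git replace refuses to swap an object for one of a different type. A tree can therefore only be replaced by a tree, and a commit only by a commit:

```sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo                 # hypothetical identity (also used as tagger)
git config user.email demo@example.com

blob=$(printf 'aaaaaa\n' | git hash-object -w --stdin)
tree=$(printf '100644 blob %s\ttest.txt\n' "$blob" | git mktree)
commit=$(git commit-tree -m 1 "$tree")
git tag -a -m 'annotated tag' v1 "$commit"
tag=$(git rev-parse v1)                   # ID of the annotated tag object itself

for obj in "$tag" "$commit" "$tree" "$blob"; do
  echo "$(git cat-file -t "$obj") $obj"
done

# Replacing a commit with a tree is rejected unless forced with -f.
if git replace "$commit" "$tree" 2>/dev/null; then
  mismatch=accepted
else
  mismatch=rejected
fi
echo "commit -> tree replacement: $mismatch"
```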
You're making up terminology here. If you're going to use a term like "sourcetree", which is not in git's glossary, you should define it.
I am not. The source tree is just the tree of sources, not the reference to a tree.
Initial repository: https://github.com/andry81-tests/git-filter-repo-test-1
commit3/test.txt
commit2/test.txt
Issue: If try to rewrite the sourcetree of commit1 by the sourcetree of commit3, then does change the sourcetree of commit2 instead of commit1.
Steps to reproduce:
As a result, commit1 stays empty while commit2 does change. At the same time, commit2 could have child commits, which leads to rewriting unrelated commits and breaks the repository.