newren / git-filter-repo

Quickly rewrite git repository history (filter-branch replacement)

--commit-callback rewriting every commit even when using --refs #434

Closed u9wy closed 1 year ago

u9wy commented 1 year ago

@newren I have a repository which seems to be reacting strangely to this command. I'm attempting to rewrite the author on a specific branch with the following command.

git filter-repo --commit-callback ' if commit.author_email == b"incorrect@email": commit.author_email = b"correct@email" commit.author_name = b"Correct Name" commit.committer_email = b"correct@email" commit.committer_name = b"Correct Name" ' --refs refs/heads/test-branch

The author is rewritten successfully, but when I check the commit hashes, every single commit in the branch and the parent branch has a new hash. This causes almost the entire history of the repository to fork.

I'm not sure why this is happening; it's the first time I've encountered this issue. I use the same command on other repositories without any issues.

Does anyone know what is happening?

andry81 commented 1 year ago

Is this Python code? It needs to be a valid one-liner. Folding a multiline Python block into inline code is a pain in the arse. Test it in a Python console: rename the variables and execute it.

u9wy commented 1 year ago

I've tried that, but the same thing happens.

me-and commented 1 year ago

This sounds like one of the old commits in your repository is being changed by this command. When you change an old commit, it inherently changes all the newer commits, because each commit references its parents.
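To make the parent-reference point concrete, here is a minimal Python sketch. It is a toy model, not Git's actual object format: the `commit_id` and `build_history` helpers and the sample authors are invented for illustration. The only property it shares with Git is that each ID is a hash over the commit's own data plus its parent's ID.

```python
import hashlib

def commit_id(author, message, parent_id):
    """Toy commit ID: a hash over the metadata and the parent's ID."""
    payload = f"parent {parent_id}\nauthor {author}\n\n{message}\n"
    return hashlib.sha1(payload.encode()).hexdigest()

def build_history(commits):
    """Return the IDs of a linear history, oldest commit first."""
    ids = []
    parent = ""
    for author, message in commits:
        parent = commit_id(author, message, parent)
        ids.append(parent)
    return ids

history = [
    ("incorrect@email", "first commit"),
    ("someone@else", "second commit"),
    ("someone@else", "third commit"),
]
before = build_history(history)

# Change only the author of the OLDEST commit.
oldest_fixed = [("correct@email", "first commit")] + history[1:]
changed_from_oldest = [a != b for a, b in zip(before, build_history(oldest_fixed))]

# Change only the author of the NEWEST commit.
newest_fixed = history[:2] + [("correct@email", "third commit")]
changed_from_newest = [a != b for a, b in zip(before, build_history(newest_fixed))]

print(changed_from_oldest)  # [True, True, True]
print(changed_from_newest)  # [False, False, True]
```

Editing the oldest commit gives every later commit a new ID, while editing the newest commit leaves its ancestors untouched.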

u9wy commented 1 year ago

That makes sense, but when I use --refs to focus the changes on a specific branch, it still changes every single commit in the history.

For instance, consider a scenario where I:

  1. create a new branch from develop
  2. make a commit
  3. then run the command with --refs focused on this branch and a condition which should mean it only edits the commit that was made

It still manages to go through the whole history and change the hashes for every single commit. So I'm not completely sure what's happening.

me-and commented 1 year ago

--refs doesn't prevent filter-repo from changing commits that are included in other branches. Consider the following simple repo history:

A    develop
| B  feature
|/
C
|
D

You have a develop branch and a feature branch. The feature branch was created from the develop branch when develop pointed to commit C, and both branches have had an additional commit since then.

If you run git filter-repo --refs feature, and your command would change both B and C, you'll end up with a repository that looks like this:

A     develop
| B'  feature
| |
C C'
|/
D

The fact that commit C was in the develop branch doesn't prevent filter-repo from changing it. --refs just means develop itself doesn't get changed, and has the history it always had, so only feature points to the changed version of commit C.
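The diagrams above can be mimicked with a small Python sketch. This is a toy model under stated assumptions, not filter-repo's implementation: the commit table, the `bad@email`/`good@email` authors, and the helper names are all hypothetical, and the IDs are simple hashes over a commit's name, author, and parent ID.

```python
import hashlib

# Toy history from the diagram: D <- C <- A (develop) and C <- B (feature).
# Each entry maps a commit name to (author, parent name or None).
commits = {
    "D": ("good@email", None),
    "C": ("bad@email", "D"),
    "A": ("good@email", "C"),
    "B": ("bad@email", "C"),
}
refs = {"develop": "A", "feature": "B"}

def keep_author(author):
    return author

def fix_author(author):
    return "good@email" if author == "bad@email" else author

def commit_id(name, callback):
    """Toy ID: hash of the name, the (possibly rewritten) author, and the parent's ID."""
    author, parent = commits[name]
    parent_id = commit_id(parent, callback) if parent else ""
    payload = f"{name}|{callback(author)}|{parent_id}"
    return hashlib.sha1(payload.encode()).hexdigest()[:8]

def ancestors(name):
    """Yield a commit and all of its ancestors, newest first."""
    while name:
        yield name
        name = commits[name][1]

original = {name: commit_id(name, keep_author) for name in commits}

# Filtering only "feature" still visits every commit reachable from its tip.
rewritten = {name: commit_id(name, fix_author) for name in ancestors(refs["feature"])}

for name in ancestors(refs["feature"]):
    status = "unchanged" if rewritten[name] == original[name] else "new ID"
    print(name, status)
```

In this model B and C get new IDs and D keeps its ID. Commit A is never visited, so develop still points at the original C, which matches the second diagram.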

u9wy commented 1 year ago

Yes, that's true, my bad.

But let's say, for instance, I add a commit with the author incorrect@email.com, and this is the only commit I've added to the branch that hasn't been inherited and the only commit with this author.

Then I run this command:

git filter-repo --commit-callback ' if commit.author_email == b"incorrect@email": commit.author_email = b"correct@email" commit.author_name = b"Correct Name" commit.committer_email = b"correct@email" commit.committer_name = b"Correct Name" ' --refs refs/heads/test-branch

My expectation is that it should only revise the commit which I added and ignore the rest of the commits in the branch.

Or have I misunderstood something? Also, up to now this is exactly how it has been working; it's only on one specific repository that it seems to be touching all the commits.

me-and commented 1 year ago

Can you provide the actual command you're running? I've just tried to reproduce this, but a simple copy-paste fails:

$ git filter-repo --commit-callback ' if commit.author_email == b"incorrect@email": commit.author_email = b"correct@email"  commit.author_name = b"Correct Name" commit.committer_email = b"correct@email"  commit.committer_name = b"Correct Name" ' --refs refs/heads/test-branch
Traceback (most recent call last):
  File "/usr/bin/git-filter-repo", line 4005, in <module>
    main()
  File "/usr/bin/git-filter-repo", line 4001, in main
    filter = RepoFilter(args)
  File "/usr/bin/git-filter-repo", line 2760, in __init__
    self._handle_arg_callbacks()
  File "/usr/bin/git-filter-repo", line 2865, in _handle_arg_callbacks
    handle('commit')
  File "/usr/bin/git-filter-repo", line 2858, in handle
    setattr(self, callback_field, make_callback(type, code_string))
  File "/usr/bin/git-filter-repo", line 2840, in make_callback
    exec('def callback({}, _do_not_use_this_var = None):\n'.format(argname)+
  File "<string>", line 2
    if commit.author_email == b"incorrect@email": commit.author_email = b"correct@email"  commit.author_name = b"Correct Name" commit.committer_email = b"correct@email"  commit.committer_name = b"Correct Name"
                                                                                          ^
SyntaxError: invalid syntax

My best guess at this point is that the Python whitespace isn't doing what you expect it to, meaning you're making more changes than you expect on this repository, and on the other repositories you're not seeing changes because the committer name and email are being set to the same values they had already.
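A hedged Python sketch of how whitespace can change what a callback body does. The `make_callback` helper here is hypothetical: it only loosely imitates the `exec('def callback(...)')` call visible in the traceback and should not be read as filter-repo's actual code. The fake commit object is likewise an assumption for illustration.

```python
from types import SimpleNamespace

def make_callback(body):
    """Compile a callback body into a function that takes `commit`.

    Hypothetical helper: indents each body line and wraps it in a def.
    """
    lines = "\n".join("  " + line for line in body.splitlines())
    namespace = {}
    exec("def callback(commit):\n" + lines, namespace)
    return namespace["callback"]

# Everything on one line, as in the pasted command: not valid Python.
collapsed = ('if commit.author_email == b"incorrect@email": '
             'commit.author_email = b"correct@email" '
             'commit.author_name = b"Correct Name"')

# Valid, but only the first assignment is guarded by the condition.
partly_guarded = ('if commit.author_email == b"incorrect@email": '
                  'commit.author_email = b"correct@email"\n'
                  'commit.author_name = b"Correct Name"')

# Valid, and every assignment sits inside the if block.
fully_guarded = ('if commit.author_email == b"incorrect@email":\n'
                 '    commit.author_email = b"correct@email"\n'
                 '    commit.author_name = b"Correct Name"')

try:
    make_callback(collapsed)
    collapsed_ok = True
except SyntaxError:
    collapsed_ok = False

def run(body):
    """Apply the callback to a commit whose author does NOT match the condition."""
    commit = SimpleNamespace(author_email=b"other@email", author_name=b"Other")
    make_callback(body)(commit)
    return commit.author_name

print(collapsed_ok)         # False
print(run(partly_guarded))  # b'Correct Name'
print(run(fully_guarded))   # b'Other'
```

The partly guarded variant renames a commit that never matched the condition, which is the kind of unintended change that would give every commit a new hash.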

u9wy commented 1 year ago

Here is the command without the quotations:

git filter-repo --commit-callback ' if commit.author_email == b”incorrect@email”: commit.author_email = b”correct@email” commit.author_name = b”Correct Name” commit.committer_email = b”correct@email” commit.committer_name = b”Correct Name” ' --refs refs/heads/feature/test-branch

newren commented 1 year ago

> But let's say, for instance, I add a commit with the author incorrect@email.com, and this is the only commit I've added to the branch that hasn't been inherited and the only commit with this author.
>
> Then I run this command:
>
> git filter-repo --commit-callback ' if commit.author_email == b"incorrect@email": commit.author_email = b"correct@email" commit.author_name = b"Correct Name" commit.committer_email = b"correct@email" commit.committer_name = b"Correct Name" ' --refs refs/heads/test-branch
>
> My expectation is that it should only revise the commit which I added and ignore the rest of the commits in the branch.

Your expectation is incorrect. Git does not store "starting points" for branches; it only stores the current tip. So, any given branch includes the commit at the tip, plus all of its parents, plus all of their parents, and so on recursively, all the way back to the root commit. Typically, any two branches share most of their commits. So, when you say --refs refs/heads/test-branch, you are telling filter-repo to filter hundreds or thousands of commits (maybe even millions if you have a big repo). If you only want it to filter a few of the recent commits, then you need to tell it which commits are relevant. For example, pass --refs origin/master..test-branch to rewrite only the commits in test-branch that do not also exist in origin/master. (Another example can be found in the manual; see https://www.mankier.com/1/git-filter-repo#Examples-Partial_history_rewrites).

> Or have I misunderstood something? Also, up to now this is exactly how it has been working; it's only on one specific repository that it seems to be touching all the commits.

Sounds like you got lucky and the other repository did not have old commits recorded by the bad author. If the filtering job ends up making no changes, then those commits will keep the same IDs. But, as soon as any commit is modified in any way, all commits that depend upon that changed commit will get new commit IDs.

You really need to decide -- if there is a really old commit with the bad email, do you want it rewritten? If so, then you'll need to accept new commit IDs for all intervening commits as well. If you don't, then you need to pass an appropriate revision range to --refs.
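As a rough illustration of what a revision range selects, here is a small Python sketch over a toy commit graph. The graph, the commit letters, and the `reachable`/`rev_range` helpers are invented for this example; the only Git behaviour assumed is that `X..Y` means "commits reachable from Y but not from X".

```python
# Toy history: commit name -> list of parent names.
parents = {
    "D": [],
    "C": ["D"],
    "A": ["C"],  # tip of origin/master in this example
    "B": ["C"],  # tip of test-branch in this example
}
refs = {"origin/master": "A", "test-branch": "B"}

def reachable(tip):
    """All commits reachable from `tip`, including the tip itself."""
    seen = set()
    stack = [tip]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(parents[name])
    return seen

def rev_range(exclude_ref, include_ref):
    """Commits selected by `exclude_ref..include_ref`."""
    return reachable(refs[include_ref]) - reachable(refs[exclude_ref])

whole_branch = sorted(reachable(refs["test-branch"]))
only_new = sorted(rev_range("origin/master", "test-branch"))

print(whole_branch)  # ['B', 'C', 'D']
print(only_new)      # ['B']
```

Naming the branch alone selects its entire ancestry, while the range narrows the selection to the commits that exist only on test-branch.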

u9wy commented 1 year ago

Thanks, that helps explain things. Changing the --refs option to --refs develop..rename-test fixed the issue. I was stuck on this for a few days.

andry81 commented 1 year ago

> git filter-repo --commit-callback ' if commit.author_email == b”incorrect@email”: commit.author_email = b”correct@email” commit.author_name = b”Correct Name” commit.committer_email = b”correct@email” commit.committer_name = b”Correct Name” ' --refs refs/heads/feature/test-branch

This Python code still does not look valid. Only commit.author_email = b”correct@email” is assigned conditionally; the rest is assigned unconditionally. No wonder it rewrites everything.

u9wy commented 1 year ago

It seems the indents are being removed after I post the comment.

newren commented 1 year ago

> This Python code still does not look valid. Only commit.author_email = b”correct@email” is assigned conditionally; the rest is assigned unconditionally. No wonder it rewrites everything.

Your hypothesis doesn't match the stated results where @u9wy said that only one commit was modified in many repositories with the same command. You may also notice that the lines have ” characters rather than actual quote marks, which would also be invalid python. So, a better conclusion to come to here is that @u9wy is unaware of how to prevent the system from messing up his code snippets when copying and pasting.
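The point about ” characters can be checked with a short Python sketch. The `is_valid_python` helper is a hypothetical name for this example; it uses the built-in `compile` to test whether a snippet parses.

```python
def is_valid_python(source):
    """Return True if `source` compiles as Python, False on a SyntaxError."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False

# Straight ASCII quotes delimit a bytes literal.
straight = 'email = b"correct@email"'

# U+201D (right double quotation mark) in place of the ASCII quote.
curly = "email = b\u201dcorrect@email\u201d"

print(is_valid_python(straight))  # True
print(is_valid_python(curly))     # False
```

A snippet pasted with typographic quotes therefore cannot be the exact text that ran successfully in a terminal.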

@u9wy: If you surround a block with triple-backticks both before and after the code block (on separate lines of their own), then GitHub markdown will avoid changing text inside, and highlight it as a separate block. My block below has a single line with ``` before it and another identical line at the end:

$ git filter-repo --commit-callback '
    if commit.author_email == b"incorrect@email":
        commit.author_email = b"correct@email"
        commit.author_name = b"Correct Name"
        commit.committer_email = b"correct@email"
        commit.committer_name = b"Correct Name"
    ' --refs refs/heads/feature/test-branch