Remove a commit based on its message content or filename with git filter-repo --commit-callback

tomdev92 commented 10 months ago

I am attempting to remove a commit from my repository history based on its message or filename by using commit.skip(). However, once the conditional statement evaluates to true for one commit, all subsequent commits in the history appear to be skipped as well, regardless of whether they match the condition.

The code I have been using to skip a commit based on its filename is the following:

git filter-repo --commit-callback '
 for change in commit.file_changes:
  if b"index" in change.filename:
   commit.skip()
'

whereas the code I use to skip a commit based on its message content is the following:

git filter-repo --commit-callback '
 if b"index" in commit.message:
  commit.skip()
'

The logic of the conditionals is correct, because when I use

git filter-repo --commit-callback '
 if b"index" in commit.message:
  commit.message = b"new commit message"
'

it works perfectly fine, affecting only the commits that satisfy the condition.

I have not found any explanation of how skip() works in the git filter-repo documentation.

newren commented 3 months ago

Using commit.skip() is almost always going to do the wrong thing. blob.skip() can be useful because a skipped blob always results in the file being left out of each commit that depends on it, and tag.skip() can be useful because nothing else depends on a tag (unless you have a tag of a tag, in which case tag.skip() probably royally screws things up). commit.skip() is particularly problematic, because it's unclear how to handle subsequent commits that depend on the skipped commit, both from a topology standpoint and from the standpoint of which file changes they contain.

Fast-export describes each commit in terms of 'keep most files the parent commit had, but for these three files, here's the exact version of the file you should move to'. So, if you skipped a commit that had a modification to some file, at best a subsequent commit that modified that file would then introduce the combination of its original changes to the file plus what was in the skipped commit. If you skipped a commit that added a file, then a subsequent commit that had originally changed that file would instead look like it added the file, with the combination of the original add and whatever changes that commit had originally made.
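The fast-export point is easier to see with a toy model. The Python sketch below assumes a deliberately simplified representation, not git-filter-repo's real data structures: each commit is a dict mapping a path to that file's complete new content (None for a deletion), and any path not listed is inherited from the parent. `replay` and `history` are hypothetical names used only for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical, simplified model of how fast-export describes a commit:
# path -> complete new content of that file (None means "delete this file").
# Paths that are not mentioned are inherited unchanged from the parent.
FileChanges = Dict[str, Optional[str]]


def replay(commits: List[FileChanges], skip: Set[int]) -> List[Dict[str, str]]:
    """Apply commits in order, ignoring those whose index is in `skip`.

    Returns the full tree (path -> content) after each commit that was kept.
    """
    tree: Dict[str, str] = {}
    snapshots: List[Dict[str, str]] = []
    for index, changes in enumerate(commits):
        if index in skip:
            continue  # a skipped commit's changes are simply never applied
        for path, content in changes.items():
            if content is None:
                tree.pop(path, None)
            else:
                tree[path] = content  # exact new version of the file, not a diff
        snapshots.append(dict(tree))
    return snapshots


history = [
    {"a.txt": "base\n"},                # commit 0 adds a.txt
    {"a.txt": "base\nfix\n"},           # commit 1 modifies a.txt
    {"a.txt": "base\nfix\nfeature\n"},  # commit 2 modifies a.txt again
]

# Skip the modifying commit: commit 2 now sits directly on commit 0, so relative
# to its new parent it introduces both the "fix" line and the "feature" line.
print(replay(history, skip={1}))

# Skip the adding commit: commit 1 now looks like it adds a.txt, already
# containing the "fix" line.
print(replay(history, skip={0}))
```

Because every listed file carries its full new content, nothing is "lost" when a commit is skipped; its changes simply get attributed to the next commit that touches the same file.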

There are similar weirdnesses with topology and making sure to get reparenting right.
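To illustrate the topology side, here is a minimal Python sketch under the assumption that history is just a hypothetical map from commit id to its list of parent ids. Dropping a commit means its children must be reparented onto its own parents, and a merge can end up with fewer than two distinct parents. `drop_commit` and `degenerate_merges` are illustrative names, not git-filter-repo functions.

```python
from typing import Dict, List

# Hypothetical commit graph: commit id -> list of parent ids.
Graph = Dict[str, List[str]]


def drop_commit(parents: Graph, victim: str) -> Graph:
    """Remove `victim`, reparenting its children onto victim's own parents."""
    rewritten: Graph = {}
    for commit, plist in parents.items():
        if commit == victim:
            continue
        new_plist: List[str] = []
        for parent in plist:
            # Children of the dropped commit inherit its parents instead.
            substitutes = parents[victim] if parent == victim else [parent]
            for sub in substitutes:
                if sub not in new_plist:  # collapse duplicate parents
                    new_plist.append(sub)
        rewritten[commit] = new_plist
    return rewritten


def degenerate_merges(before: Graph, after: Graph) -> List[str]:
    """Commits that were merges before but have fewer than two distinct parents now.

    This only catches parents collapsing into one; it does not detect the
    subtler case where one remaining parent is an ancestor of another.
    """
    return [c for c, plist in after.items() if len(before[c]) >= 2 and len(plist) < 2]


# A one-commit topic branch merged with --no-ff: M has parents A and B,
# and B's only parent is A.
graph = {"A": [], "B": ["A"], "M": ["A", "B"]}
after = drop_commit(graph, "B")
print(after)
print(degenerate_merges(graph, after))
```

Dropping B leaves M with A as its only parent, so the "merge" is no longer a merge at all; a real tool then has to decide whether to keep M as an ordinary commit or prune it.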

But the short answer is that commit.skip() is basically 0% likely to be the solution to whatever problem you have. And I'm not sure the existing uses in git-filter-repo around that code are correct. There's no way I'm going to document it and risk more people using it.

Instead, we need to find out what your actual use case is and solve it, rather than try to mold this wart in a way that might kinda sorta fit your problem.

So, what are you trying to achieve? What do you mean by "skipping a commit"? Should the changes made by the commit disappear? Should they be smashed into subsequent commits? What kinds of changes are in the commits being skipped? (Only new files that no other commit ever touches? No file changes whatsoever? All kinds of changes?) Are you only ever skipping commits that are empty (i.e. that make no changes)? Do you ever try to skip merge commits (which could be really problematic...)? Do you ever skip enough commits to make merge commits become degenerate (which might also get really thorny...)?
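If the answer to the first question is that the changes to matching files should disappear, one option worth considering is filtering `commit.file_changes` inside the callback instead of calling `commit.skip()`. The sketch below is hedged: `FakeFileChange` and `FakeCommit` are hypothetical stand-ins for git-filter-repo's real objects (only `.filename` and `.file_changes` are modeled) so the example runs on its own, and `drop_matching_changes` is an illustrative name. Note that this removes the matching paths from every commit that touches them, which is a different outcome from dropping one commit.

```python
from typing import List, Optional


class FakeFileChange:
    """Hypothetical stand-in for git-filter-repo's FileChange (only .filename is modeled)."""

    def __init__(self, filename: Optional[bytes]):
        self.filename = filename


class FakeCommit:
    """Hypothetical stand-in for git-filter-repo's Commit (only .file_changes is modeled)."""

    def __init__(self, file_changes: List[FakeFileChange]):
        self.file_changes = file_changes


def drop_matching_changes(commit, needle: bytes = b"index") -> None:
    # The assignment below mirrors what would go inside a --commit-callback body:
    # keep every file change whose path does not contain `needle`.
    # filename is checked for None defensively (assumed possible for a
    # "delete everything" change), so such entries are kept untouched.
    commit.file_changes = [
        change
        for change in commit.file_changes
        if change.filename is None or needle not in change.filename
    ]


commit = FakeCommit([FakeFileChange(b"web/index.html"), FakeFileChange(b"README.md")])
drop_matching_changes(commit)
print([change.filename for change in commit.file_changes])
```

In a real run, commits left with no file changes would then, as I understand git-filter-repo's default `--prune-empty=auto` behavior, be pruned automatically, without any manual reparenting.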

newren commented 1 month ago

No response, so I'll go ahead and close. I probably responded too late to be of help. If you do want to look at this more, though, feel free to reopen and comment.

newren / git-filter-repo

Remove a commit based on its message content or filename with git filter-repo --commit-callback #534