Closed by klorinczi 1 year ago
Nope, not a bug. You fed bad input into filter-repo, based on a common but incorrect assumption about how git log works.
Look at your own output:
$ git log --format="reference" --name-status --diff-filter=A '*\\*'
"systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
"systemd/system/multi-user.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"
"systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"
Let's look at the first line as an example. If you were to store that in a file, which you pass to --paths-from-file, then git-filter-repo is going to be looking for a file named "systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount" to remove. You have no such file in your repository. Instead you have one named systemd/system/default.target.wants/snap-git\x2dfilter\x2drepo-7.mount. (Note that I have removed both " characters and two of the \ characters.)
The problem here is that you assumed git log would list filenames as-is, which it won't do whenever there are special characters. You can often get around this by setting core.quotepath=false (this particularly helps when you have non-ASCII characters), but even that is ignored when you have backslashes.
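To make the quoting concrete, here is a minimal Python sketch that reverses it. It assumes git's usual C-style quoting (surrounding double quotes, backslash escapes for control characters, the quote and the backslash, and three-digit octal escapes for other bytes). The helper name unquote_git_path is hypothetical and is not part of git or git-filter-repo.

```python
# Sketch: undo git's C-style path quoting (assumed rules: backslash escapes
# for control characters, the quote and the backslash, plus octal bytes).
_SIMPLE_ESCAPES = {
    "a": 0x07, "b": 0x08, "f": 0x0C, "n": 0x0A, "r": 0x0D,
    "t": 0x09, "v": 0x0B, '"': 0x22, "\\": 0x5C,
}


def unquote_git_path(shown: str) -> bytes:
    """Hypothetical helper: turn a path as printed by git log into raw bytes."""
    if not (len(shown) >= 2 and shown.startswith('"') and shown.endswith('"')):
        # Paths without surrounding quotes are printed as-is.
        return shown.encode("utf-8")
    body = shown[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
            continue
        nxt = body[i + 1]
        if nxt in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[nxt])
            i += 2
        else:
            # Three octal digits encode one raw byte, e.g. \303\251.
            out.append(int(body[i + 1:i + 4], 8))
            i += 4
    return bytes(out)


# The first line of the git log output above, exactly as it was printed.
quoted = r'"systemd/system/default.target.wants/snap-git\\x2dfilter\\x2drepo-7.mount"'
print(unquote_git_path(quoted).decode())
```

The printed path has no surrounding quotes and a single backslash before each x2d, which is the name actually stored in the repository.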
Here's something that might work better for you for generating the list of filenames to exclude:
git log -z --all --name-only -m --pretty= '*\\*' | tr '\0' '\n' | sort -u >/opt/git_repo_files_w_escape.txt
but it assumes you do not have filenames with newline characters. (If you do have files with newline characters, though, then --paths-from-file won't work for you.)
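For anyone who would rather script this step, the following is a rough Python sketch of the same pipeline, not a tested replacement for it. It runs the same git log command, splits the output on NUL bytes, and refuses to write a list file if any path contains a newline. The function names are hypothetical, and dropping empty entries is an assumption about how the -z output is separated.

```python
import subprocess
from typing import List


def split_nul_paths(raw: bytes) -> List[bytes]:
    """Split NUL-separated git output into sorted, unique, non-empty paths."""
    return sorted({entry for entry in raw.split(b"\0") if entry})


def paths_with_backslash(repo_dir: str) -> List[bytes]:
    """Run the git log command shown above and return the raw paths."""
    result = subprocess.run(
        ["git", "log", "-z", "--all", "--name-only", "-m", "--pretty=", r"*\\*"],
        cwd=repo_dir,
        check=True,
        stdout=subprocess.PIPE,
    )
    return split_nul_paths(result.stdout)


def write_paths_file(paths: List[bytes], out_file: str) -> None:
    """Write one path per line; refuse paths that contain a newline."""
    if any(b"\n" in path for path in paths):
        raise ValueError("paths containing newlines cannot be listed one per line")
    with open(out_file, "wb") as handle:
        handle.write(b"".join(path + b"\n" for path in paths))
```

A call such as write_paths_file(paths_with_backslash("."), "../git_repo_files_w_escape.txt") would then stand in for the shell pipeline, with the paths kept as raw bytes throughout.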
Does that help?
@newren Thank you very much for pointing me to the right solution!
Your solution works perfectly; it removed all files that have a backslash in the filename.
You are right, it is not a bug; the git log output was just not in the right format to use as input for git filter-repo.
I also opened a bountied question for this problem on Stack Overflow: https://stackoverflow.com/questions/75150145/remove-all-files-from-git-repo-history-with-path-having-escape-in-filename-wit If you are on Stack Overflow and you post the steps to reproduce the solution, I would be happy to give you the bounty.
Steps to reproduce the solution:
# Clone the repository, so that the rewrite is executed on a safe copy:
git clone --no-local /source/repo/path/ /target/path/to/repo/clone/
# Cloning into '/target/path/to/repo/clone'...
# remote: Enumerating objects: 9364, done.
# remote: Counting objects: 100% (9364/9364), done.
# remote: Compressing objects: 100% (3706/3706), done.
# remote: Total 9364 (delta 4088), reused 9346 (delta 4082), pack-reused 0
# Receiving objects: 100% (9364/9364), 7.44 MiB | 22.29 MiB/s, done.
# Resolving deltas: 100% (4088/4088), done.
cd /target/path/to/repo/clone/
# List the files with backslash from repo history into a list file:
git log -z --all --name-only -m --pretty= '*\\*' | tr '\0' '\n' | sort -u >../git_repo_files_w_escape.txt
# check the output file content
nano ../git_repo_files_w_escape.txt
# Remove the files with backslash from repo history:
git filter-repo --invert-paths --paths-from-file ../git_repo_files_w_escape.txt
# New history written in 0.60 seconds; now repacking/cleaning...
# Repacking your repo and cleaning out old unneeded objects
# HEAD is now at 91d7141
# Enumerating objects: 9362, done.
# Counting objects: 100% (9362/9362), done.
# Delta compression using up to 8 threads
# Compressing objects: 100% (3739/3739), done.
# Writing objects: 100% (9362/9362), done.
# Total 9362 (delta 4087), reused 9305 (delta 4047), pack-reused 0
# Completely finished after 1.22 seconds.
# List files with backslash to check result:
git log -z --all --name-only -m --pretty= '*\\*' | tr '\0' '\n' | sort -u
# empty result, so history rewrite was successful!
I'm grateful for the solution, thanks again!
I had created a Stack Overflow account, but it wouldn't let me comment on various answers, saying I didn't have enough reputation, even when commenting on posts touching areas I was an expert, or even the expert, on. Frustrated, I just never bothered answering any questions again. But you inspired me to try to dig out my old account info and post my answer.
Oh, also, it might even be easier to avoid generating the filenames entirely since you can just programmatically check. Something like:
git filter-repo --filename-callback 'return None if b"\\" in filename else filename'
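As a small illustration of what that callback body does, here it is written out as a standalone Python function. git-filter-repo passes each path to the callback as a bytes object, and returning None drops the file from history. The function name drop_backslash_paths is hypothetical, since filter-repo only takes the function body on the command line.

```python
from typing import Optional


def drop_backslash_paths(filename: bytes) -> Optional[bytes]:
    # filter-repo hands over each path as bytes; returning None removes the
    # file from history, returning the name unchanged keeps it.
    return None if b"\\" in filename else filename


# A path containing a backslash is dropped, an ordinary path is kept.
print(drop_backslash_paths(b"systemd/system/snap-git\\x2dfilter\\x2drepo-7.mount"))
print(drop_backslash_paths(b"README.md"))
```

The Python literal b"\\" is a single backslash byte, so the test matches any path that contains at least one backslash.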
Excellent!
Thank you for the solution that avoids exporting the filenames and just checks them programmatically!
You could also add it to your answer on Stack Overflow.
I would welcome a question upvote, because they downvoted it 🙂
Hi,
I am trying to remove all files whose path contains an escape character (\) in the filename from the Git repo history with git filter-repo.
I have special filenames with escape \ characters stored in a Git repository on Debian 10 Linux.
Problem: files that have incompatible characters in the filename cannot be checked out with git checkout on Windows.
Problem reproducing steps:
Could it be that this is a bug?