newren / git-filter-repo

Quickly rewrite git repository history (filter-branch replacement)

It does not seem to work if a file was pushed as git tracked but later turned to git lfs tracked #474

Closed fengyuentau closed 1 year ago

fengyuentau commented 1 year ago

Hello there! Great work!

I have a repo with LFS enabled, but somebody pushed some files (hereinafter referred to as target files) via a pull request, and these files are not lfs-tracked because the contributor did not have git lfs installed. This PR was merged some days ago, and another 30+ commits have been pushed since then. Note that these commits do not change the target files. Then I noticed this and submitted a pull request to turn them into lfs-tracked files, but the target files are still in the git history.

Now I am trying to remove the target files from the git history completely. I tried this tool, but after executing the command, I found the target files are still in the history. So I wonder whether this tool does not handle my case.

newren commented 1 year ago

What were the names of the files? What command did you actually run? What do you mean that these files are still in the history -- how did you determine that? (Did you run a git log command of some sort locally? Did you do a git cat-file <hash>? Did you look around under .git/objects or .git/lfs/objects? Did you check the output of du -hs? Did you look at the size of the remote repository as reported by some hosting service?)

fengyuentau commented 1 year ago
git clone https://github.com/opencv/opencv_zoo
# these two files should be git lfs tracked but they are pushed as git tracked by users who did not install lfs
git filter-repo --paths models/text_recognition_crnn/text_recognition_CRNN_CH_2023feb_fp16.onnx --paths models/text_recognition_crnn/text_recognition_CRNN_EN_2023feb_fp16.onnx --invert-paths
# then I tried to check files from git history and they are still here
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  sed -n 's/^blob //p' |
  sort --numeric-sort --key=2 |
  cut -c 1-12,41- |
  $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest

# example output:
# 8099a32976cd   16MiB models/text_recognition_crnn/text_recognition_CRNN_EN_2023feb_fp16.onnx
# 8c4e1ab09823   31MiB models/text_recognition_crnn/text_recognition_CRNN_CH_2023feb_fp16.onnx

I also checked with du -s, and the size did not change.

newren commented 1 year ago

git filter-repo --paths models/text_recognition_crnn/text_recognition_CRNN_CH_2023feb_fp16.onnx --paths models/text_recognition_crnn/text_recognition_CRNN_EN_2023feb_fp16.onnx --invert-paths

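So, Python's argparse has this "helpful" feature, where it allows you to abbreviate any flag so long as the abbreviation is a prefix of exactly one flag. Since --paths is not itself a flag and --paths-from-file is the only flag that starts with it, this means that your command is the same as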

git filter-repo --paths-from-file models/text_recognition_crnn/text_recognition_CRNN_CH_2023feb_fp16.onnx --paths-from-file models/text_recognition_crnn/text_recognition_CRNN_EN_2023feb_fp16.onnx --invert-paths

This is not an instruction to delete the two files you mention, but to open them up and treat each line within those files as the name of a file that should be deleted. Odds are that no line within those files happens to name any file within your repository. But if they did, those files would be deleted.
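The prefix expansion described above can be reproduced with a small sketch. The parser below is hypothetical: it only borrows three flag names from this thread and is not git-filter-repo's actual argument definition. The file name a.onnx is a placeholder. It assumes Python 3.5 or later, where argparse accepts allow_abbrev.

```python
import argparse


def build_parser(allow_abbrev: bool = True) -> argparse.ArgumentParser:
    # Hypothetical parser with three flags named like the ones in this thread.
    parser = argparse.ArgumentParser(prog="demo", allow_abbrev=allow_abbrev)
    parser.add_argument("--path", action="append", default=[])
    parser.add_argument("--paths-from-file", action="append", default=[])
    parser.add_argument("--invert-paths", action="store_true")
    return parser


# "--paths" is not an exact flag, but it is a prefix of exactly one flag,
# "--paths-from-file", so argparse silently expands it.
args = build_parser().parse_args(["--paths", "a.onnx", "--invert-paths"])
print(args.path)             # []
print(args.paths_from_file)  # ['a.onnx']

# With abbreviations disabled, the same input is rejected instead of expanded.
try:
    build_parser(allow_abbrev=False).parse_args(["--paths", "a.onnx"])
    rejected = False
except SystemExit:
    rejected = True
print(rejected)  # True
```

In the default mode the value lands in paths_from_file and path stays empty, which matches the behavior described here: the named file is read as a list of paths rather than being filtered itself.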

What you intended was

git filter-repo --path models/text_recognition_crnn/text_recognition_CRNN_CH_2023feb_fp16.onnx --path models/text_recognition_crnn/text_recognition_CRNN_EN_2023feb_fp16.onnx --invert-paths

Could you try that?

fengyuentau commented 1 year ago

It works with --path instead of --paths. Alright, thanks.