newren / git-filter-repo

Quickly rewrite git repository history (filter-branch replacement)

remove not needed LFS objects as well when deleting files #257

Open Mo-Gul opened 3 years ago

Mo-Gul commented 3 years ago

I am still a beginner at git so please forgive me if I am writing just garbage.

It would be nice if LFS objects that are no longer needed were deleted as well when deleting files. If I understand https://www.atlassian.com/git/tutorials/git-lfs correctly, this could be done with git lfs prune, hopefully in a safe way, by

# Save the current `lfs.pruneoffsetdays` setting (empty if it is not set)
old_offset=$(git config --local lfs.pruneoffsetdays)
git config --local lfs.pruneoffsetdays 18250
git lfs prune
# Restore the previous `lfs.pruneoffsetdays` setting
if [ -n "$old_offset" ]; then
    git config --local lfs.pruneoffsetdays "$old_offset"
else
    git config --local --unset lfs.pruneoffsetdays
fi

newren commented 3 years ago

Does this presume you are deleting files stored in LFS? And that you have git lfs installed? Not sure if that makes sense to run in general.

I think this kind of thing would probably be better for a contrib script. Then it could add additional things, like transforming LFS objects into normal repo objects or vice-versa. It'd require someone reading through the LFS documentation and learning its format, but it'd be an interesting addition.

Mo-Gul commented 3 years ago

Answers to your questions

Yes, I have a repo which uses LFS. The main goal is to also delete the LFS objects from the repo when I delete the corresponding normal/pointer file.

I hope that it will not be too hard to find out whether LFS is installed. To find out whether it is/was used in the repo in question (or at least to get a hint about that), one could look for a (non-empty) .gitattributes file anywhere in the history and/or check whether the .git/lfs folder exists.
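A minimal sketch of these checks, not a tested part of git-filter-repo. The function names `lfs_installed` and `lfs_used_in_repo` are made up for illustration, and looking for the string `filter=lfs` in .gitattributes files (rather than for any non-empty .gitattributes) is an assumption that narrows the hint described above.

```shell
# Succeeds if the git-lfs command is available.
lfs_installed() {
    git lfs version >/dev/null 2>&1
}

# Succeeds if the repository at $1 shows signs of LFS use.
lfs_used_in_repo() {
    # Hint 1: a local LFS folder exists.
    [ -d "$1/.git/lfs" ] && return 0
    # Hint 2 (assumption): some commit added or removed "filter=lfs"
    # in a .gitattributes file anywhere in the tree.
    [ -n "$(git -C "$1" log --all --format=%H -S 'filter=lfs' \
            -- ':(glob)**/.gitattributes' 2>/dev/null)" ]
}
```

Both functions only report a hint through their exit status, for example `lfs_installed && lfs_used_in_repo . && echo "LFS seems to be in use"`.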

I don't know what a contrib script is, sorry. (As I said, I am quite a newbie on git.)

Background

Why I came up with this request: Until now I had one big repo which contained "all my stuff" because I was the only user of that stuff. That repo also uses LFS because it includes quite a few big binary files. But now I want to share part of the stuff, so I decided to split that "one big repo" into several small (project-related) repos. git-filter-repo is making my day because it makes this a quite simple task.

The only issue is that, although for some projects only one (ASCII) file remained in the repo, the repo size was still several hundred MB (the original size of the "one big repo"). Looking at the folder sizes, the overwhelming part of it is in the .git/lfs/objects folder.
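A small sketch of how such a comparison could be made, assuming a POSIX `du`. The function name `git_dir_sizes` is hypothetical.

```shell
# Print "<size in KiB><TAB><path>" for the whole .git folder and,
# if it exists, for the local LFS object store of the repo at $1.
git_dir_sizes() {
    for d in "$1/.git" "$1/.git/lfs/objects"; do
        if [ -d "$d" ]; then
            du -sk "$d"
        fi
    done
}
```

Running `git_dir_sizes .` in a filtered repo shows how much of the .git folder is taken up by LFS objects.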

https://www.atlassian.com/git/tutorials/git-lfs#finding-references describes that one can search the git history for a SHA-256 OID (these are the names of the final files in the subfolders of .git/lfs/objects) using

git log --all -p -S <SHA-256 OID>

and if nothing is returned, I assume the object is not needed any more and should be safe to delete. So maybe it would be safer to first make a list of all SHA-256 OIDs, search for each of them, and only delete the ones for which nothing was returned!?
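The list-then-search idea could be sketched roughly as follows. This is only an illustration: the function names are hypothetical, it assumes that every file in .git/lfs/objects whose name is 64 hex characters is an LFS object named by its OID, and it only prints candidates instead of deleting anything.

```shell
# Print the OID of every object in the local LFS store of the repo at $1.
list_lfs_oids() {
    find "$1/.git/lfs/objects" -type f 2>/dev/null |
        sed 's|.*/||' | grep -E '^[0-9a-f]{64}$'
}

# Print the OIDs for which `git log --all -S <oid>` returns nothing,
# i.e. which no pointer file in the remaining history mentions.
find_unreferenced_oids() {
    list_lfs_oids "$1" | while read -r oid; do
        if [ -z "$(git -C "$1" log --all --format=%H -S "$oid")" ]; then
            echo "$oid"
        fi
    done
}
```

`find_unreferenced_oids .` then yields the objects that would be candidates for deletion. Since it runs one `git log` per object, it may be slow on repos with many LFS objects.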

But I hope that workflow from my first comment in the end leads to the same result ...