newren / git-filter-repo

Quickly rewrite git repository history (filter-branch replacement)

using git filter-repo to migrate out of LFS while maintaining history #7

Open benblo opened 5 years ago

benblo commented 5 years ago

I have a somewhat large repo (20 GB, 20k commits, from a video game project) that was using LFS for big assets. I'd like to reuse the engine part for another project, while removing the game-specific part and not carrying over that history (especially the assets). I also want to ditch LFS (which proved unsatisfying for various reasons).

I figure using filter-repo to remove game-specific directories will be trivial, and very few lfs objects should be left over after that, so perhaps I could also use filter-repo to reinject those objects as non-lfs ones? Any pointers on how to go about it?
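For the directory-removal step, a minimal sketch of how the filter-repo invocation could be assembled. It relies on filter-repo's --path and --invert-paths options; the directory names are hypothetical placeholders, and the helper only builds the command rather than running it.

```python
import shlex


def build_filter_repo_command(paths_to_remove):
    """Return the argv for a filter-repo run that drops the given paths.

    --invert-paths makes filter-repo keep everything EXCEPT the listed paths.
    """
    cmd = ["git", "filter-repo", "--invert-paths"]
    for path in paths_to_remove:
        cmd += ["--path", path]
    return cmd


if __name__ == "__main__":
    # "Game/" and "Assets/Game/" are hypothetical directory names.
    print(shlex.join(build_filter_repo_command(["Game/", "Assets/Game/"])))
```

Running the resulting command in a fresh clone is the safer route, since filter-repo rewrites history in place.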

I know lfs has a "migrate export" command but from what I understand it will only inject the latest version; I'd prefer to preserve the whole history, so I could if necessary rollback to any point in time and have the actual files instead of dummy pointers.
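For context on the "dummy pointers": an LFS pointer is a small text file with a version line, an oid line and a size line. Below is a rough sketch of how a conversion script might recognize one. The function name is made up for illustration, and the check is deliberately conservative (it only accepts sha256 oids and the v1 spec URL).

```python
LFS_SPEC_LINE = b"version https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(data):
    """Return (oid, size) if data looks like an LFS pointer file, else None."""
    # Pointer files are tiny, so anything large is real content.
    if len(data) > 1024 or not data.startswith(LFS_SPEC_LINE):
        return None
    fields = {}
    for line in data.decode("utf-8", errors="replace").splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    oid = fields.get("oid", "")
    size = fields.get("size", "")
    if not oid.startswith("sha256:") or not size.isdigit():
        return None
    return oid[len("sha256:"):], int(size)
```

Anything that returns None here would be treated as ordinary file content and left alone.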

I understand filter-repo is still somewhat fresh, as I can't find that much info about it (btw I apologize if this is not the proper place to ask for help). BTW, there are quite a few open issues in git-lfs along the lines of "OMG I want out, how do I untangle myself from this thing now??!", and there doesn't seem to be a clear consensus 😄 ! (That might also be because most people aren't ready to rewrite history, but I am.)

newren commented 5 years ago

Very interesting question. It should be possible to do this kind of transition, you'd just need a way to get the necessary data out of LFS.

I actually thought this would make for a good example under contrib/ some months back and started work on it in the 'pu' branch (though I was focusing more on conversion to lfs at first), but got frustrated reading through the git-lfs api docs; some things were documented well but there were some holes. Besides, I hadn't ever used LFS myself and lacked motivation to fully learn it, so I simply punted.

But I can certainly provide a few pointers.

If you or someone else wants to do an lfs-conversion, the contrib/filter-repo-demos/insert-beginning script shows an example of how to add extra files into a commit (in particular, appending to commit.file_changes). You'd need to modify it to change the checks based on number of parents to instead check whether that commit contained any new LFS files that the (first) parent commit did not have, and if so, add it. You may also need to modify existing control files such as .gitattributes (the contrib/filter-repo-demos/lint-history might be helpful as an example of inserting a new Blob into the stream and changing an existing change.blob_id to use it), and you may need to delete other control files such as .lfsconfig (for which contrib/filter-repo-demos/clean-ignore may be a helpful example, at least the bits showing how to strip something out of commit.file_changes).
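The three operations described above (dropping a control file, swapping a change's blob_id, appending a new file change) could be sketched roughly as follows. This is not taken from the contrib demos: Change is a hypothetical stand-in so the snippet runs without git-filter-repo installed, and in a real script the same logic would operate on the file changes of a commit inside a commit callback.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical stand-in for one entry of commit.file_changes."""
    type: bytes
    filename: bytes
    mode: bytes = b"100644"
    blob_id: object = None


def rewrite_file_changes(file_changes, pointer_to_real, attributes_blob_id=None):
    """Return the file changes for a commit being converted out of LFS.

    pointer_to_real maps the id of each pointer blob to the id of a blob
    holding the real content (how that mapping is built is left open here).
    """
    kept = []
    for change in file_changes:
        if change.filename == b".lfsconfig":
            continue  # strip the LFS control file from the commit
        if change.blob_id in pointer_to_real:
            change.blob_id = pointer_to_real[change.blob_id]  # use real content
        kept.append(change)
    if attributes_blob_id is not None:
        # Add (or replace) .gitattributes with an already-inserted blob.
        kept.append(Change(b"M", b".gitattributes", b"100644", attributes_blob_id))
    return kept
```

The blob ids in this sketch are opaque values; with filter-repo they would come from blobs already seen in, or inserted into, the stream.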

As for how to get the data out of LFS, though, that I can't help with a whole lot. There are links to git-lfs API documentation in the 'pu' branch which might be helpful.
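One possible way to get at the data without going through the LFS API is to read it from the local LFS object store. The sketch below assumes the objects have already been downloaded (for example with git lfs fetch --all) and that the store uses the default layout of lfs/objects/<first two hex chars>/<next two>/<oid> under the git directory. Both function names are made up for illustration.

```python
from pathlib import Path


def lfs_object_path(git_dir, oid):
    """Return where the local LFS store is assumed to keep the object for oid."""
    return Path(git_dir) / "lfs" / "objects" / oid[:2] / oid[2:4] / oid


def read_lfs_object(git_dir, oid):
    """Return the object's bytes, or None if it has not been fetched locally."""
    path = lfs_object_path(git_dir, oid)
    return path.read_bytes() if path.is_file() else None
```

A None result would mean the pointer has to stay as-is or the missing object has to be fetched first.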

If you come up with something that works and are willing to share, it'd be awesome to add something to contrib -- even if it's not general, only does one side of the lfs conversion, etc.

benblo commented 5 years ago

Turns out my assumption was wrong: git lfs migrate export --everything --include="*" does rewrite the whole history, across all branches, reinjecting all the large files' consecutive versions (see here). Awesome! Thanks for the info anyway! So far I'm super impressed by filter-repo's speed, I'm pondering if it could be used to replace git-subtree (which for my use is really lacking).

newren commented 5 years ago

Cool, glad you found a solution to handle your case of migrating out of LFS. I suspect filter-repo could still make things better (e.g. does git lfs update referenced sha1sums in commit messages?), and rolfb is interested in the case of migrating into LFS using filter-repo, so I'll leave this ticket open so people can see my above pointers about how writing an lfs-conversion script based on filter-repo would work.

benblo commented 5 years ago

Yeah, migrate export solved my immediate issue so I moved on; this repo is such a mess that preserving sha1s in commit messages is the least of my problems :) ! I have other repos where filter-repo could help with some of those issues though, so I may be back with more questions in a few days.

newren commented 4 years ago

Actually, I think I'll change my mind and close this one out just to keep the issue list tidy. I've got it marked with the contrib-candidate label though to help me and others find it.

ymartin59 commented 4 years ago

@newren I propose re-opening this issue for the following use case: I want to migrate a Git repository with notes on commits to LFS, adding ".lfsconfig" to all commits in a single run (and rewriting commit hash references in the notes too).
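For the note-rewriting part, a rough sketch of how old commit hashes in free text could be mapped to new ones after a rewrite. It assumes a mapping file in the style of filter-repo's commit-map (a header line, then one "old new" pair of full 40-character hashes per line); the function names are hypothetical, and abbreviated hashes are not handled.

```python
import re

FULL_HASH = re.compile(r"\b[0-9a-f]{40}\b")


def parse_commit_map(text):
    """Parse 'old new' hash pairs, skipping the header and malformed lines."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and all(FULL_HASH.fullmatch(p) for p in parts):
            mapping[parts[0]] = parts[1]
    return mapping


def rewrite_hashes(text, mapping):
    """Replace every full hash found in mapping with its new value."""
    return FULL_HASH.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)
```

Hashes that do not appear in the mapping are left untouched, so unrelated 40-character hex strings survive.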

klinki commented 3 years ago

@newren Hello, I created a script to do LFS migration. Unfortunately there are some manual steps, but it worked well enough for me.

Here is the gist: https://gist.github.com/klinki/3a314ab3e7ab680d16b5e7eb256cafbd

Currently it is just an example and it would require a lot of polishing (and automating some manual steps). But it is good enough as a starter.

lstrojny commented 3 years ago

I needed to import quite a big repository (500K commits) into LFS, and git lfs migrate import was way too slow. bfg, on the other hand, was very fast but has limitations on matching (e.g. it cannot match paths), so I looked into how to do it with git-filter-repo. Here is a working version: https://gist.github.com/lstrojny/6d29aea45179668725f43650fa46c4e7 It takes ~5 minutes for 500K commits, while git lfs migrate import would have taken hours.

Please note that my Python sucks and it’s probably way too complicated. It works exactly for my use case, and it assumes that there is a .gitattributes file in the root of the source repository. Nevertheless I hope this will save somebody some time and give an idea of how to get started.
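For readers who want the core idea without opening the gist: converting into LFS means replacing a file's content with a pointer built from its SHA-256 and size. The snippet below is an independent minimal sketch, not code from the gist above, and make_lfs_pointer is a made-up name. A real migration would also have to place the original content in the LFS object store so it can be pushed later.

```python
import hashlib


def make_lfs_pointer(content):
    """Return the LFS pointer file (as bytes) for the given file content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    ).encode("utf-8")
```

Since the pointer depends only on the content, identical files in different commits map to the same pointer blob.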

rconde01 commented 2 years ago

Here's a script based on @lstrojny's:

https://gist.github.com/rconde01/ab93a0edddc5b0abf64ad4c8ac5b6ade

Unfortunately there's a bug. The .gitattributes file in my repository doesn't exist at the start. So the first N changes are appending a file change to the commits where LFS migrations are introduced. Then .gitattributes is introduced in the history and then the commit is edited. It is all fine until this point, but edits to .gitattributes for future LFS migrations are lost. In a small test repository with the same structure, it works fine :( Unfortunately I can't share my real repo.
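This is not a diagnosis of the bug above, only a sketch of one way to keep .gitattributes edits from getting lost: always derive the new content from the most recent .gitattributes content (whether it came from the original history or from an earlier injected change) and add only the missing patterns. The function name is hypothetical; the rule format is the one git lfs track writes.

```python
def add_lfs_patterns(gitattributes, patterns):
    """Return .gitattributes bytes with an LFS rule for each missing pattern."""
    lines = gitattributes.decode("utf-8").splitlines()
    # Patterns that already have an LFS filter attached.
    tracked = {
        line.split()[0] for line in lines if "filter=lfs" in line.split()[1:]
    }
    for pattern in patterns:
        if pattern not in tracked:
            lines.append(f"{pattern} filter=lfs diff=lfs merge=lfs -text")
            tracked.add(pattern)
    return ("\n".join(lines) + "\n").encode("utf-8") if lines else b""
```

Because the helper is idempotent, it can be applied to every commit that touches .gitattributes or introduces a newly tracked pattern without duplicating rules.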

@newren Do you see anything wrong in my script? I think this is getting closer to something you could deliver (and is about 1000x faster than the official migrator).

rconde01 commented 2 years ago

I found multiple issues with the script - I'll update when I have the fixes.

oryandunn commented 1 year ago

@rconde01 I tried using your script (which I think was updated on or after July 11th). Like your repo, mine does not have a .gitattributes (neither at the start nor at any later point). The script runs and seems to properly LFS all the files it should, and I get printouts with "Added change to .gitattributes to track additional LFS files.", but .gitattributes never seems to be created in the repo. Do you have any idea what's going wrong? I've looked over your script, and nothing jumped out at me. In my case, I could probably just get by with git lfs migrate, but I'd really like the replace refs to be generated, which is why I wanted to use git-filter-repo.

Edit: well, I thought I saw somewhere the file was updated, but now I think it's your original from July 4th. Do you have those fixes for those issues you found?

exaexa commented 11 months ago

Hi all,

just a note about the use of git lfs migrate vs git filter-repo -- I found that git lfs migrate export for some reason rewrites the whole history, even commits that do not have to be rewritten because they have no LFS in the whole history.

Obviously this makes the migration basically impossible if there are other remotes merged into the history etc. that we don't want to (or cannot) rewrite. I guess this might be a great use case for filter-repo, but I have no idea how to implement this right now; any documentation in that regard would be very welcome.