newren / git-filter-repo

Quickly rewrite git repository history (filter-branch replacement)

Update submodule hashes? #537

Closed tbleher closed 3 months ago

tbleher commented 10 months ago

I have a repo with submodules in it. One of the submodule repos now needs to be converted to Git LFS, which will change all of its commit hashes. Git LFS provides a file with a mapping from old commit id to new commit id. How would I use git-filter-repo to update all submodule references in the parent repo to the new commit ids?

https://stackoverflow.com/questions/73853726/git-filter-branch-or-filter-repo-to-update-submodule-gitlink has the following code for git filter-branch:

git filter-branch --setup '
        declare -A newsha
        while read old new; do newsha[$old]=$new; done <shamap
'                 --index-filter '
        if oldsha=`git rev-parse :submodulepath 2>&-`
        then git update-index --cacheinfo 160000,${newsha[$oldsha]-$oldsha},submodulepath
        fi
'

Unfortunately I found no way to do the same thing with git-filter-repo. Is this doable? Would I need a --blob-callback?

newren commented 3 months ago

No, you wouldn't want a blob-callback; you'd want a commit-callback, because you want to modify the references to paths within a commit. Let's say, for example, that your submodule is located at PATH/TO/SUBMODULE and that the table of commit id changes was

old                                      new
edf570fde099c0705432a389b96cb86489beda09 9cce52ae0806d695956dcf662cd74b497eaa7b12
644f7c55e1a88a29779dc86b9ff92f512bf9bc11 88b02e9e45c0a62db2f1751b6c065b0c2e538820

Then you could run a command like

git filter-repo --commit-callback '
    for change in commit.file_changes:
        # For a submodule entry, blob_id holds the submodule commit id
        if change.filename == b"PATH/TO/SUBMODULE":
            if change.blob_id == b"edf570fde099c0705432a389b96cb86489beda09":
                change.blob_id = b"9cce52ae0806d695956dcf662cd74b497eaa7b12"
            # elif, so an id that was just rewritten is never mapped a second time
            elif change.blob_id == b"644f7c55e1a88a29779dc86b9ff92f512bf9bc11":
                change.blob_id = b"88b02e9e45c0a62db2f1751b6c065b0c2e538820"
'

Obviously, you'll have different hashes in your shamap, and a different pathname than PATH/TO/SUBMODULE. And if you have enough entries in your table, you might want to read it into a dict and do a lookup instead of adding a bunch more "if change.blob_id" arms.
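
For illustration, the dict-based lookup might look roughly like the sketch below. It is not taken from the thread: the helper names load_shamap and remap_gitlinks are hypothetical, and it assumes the map file holds one whitespace-separated "old new" pair per line with no header row (adjust the split if your file uses another delimiter). The change objects only need filename and blob_id attributes, as in the callback above.

```python
from types import SimpleNamespace


def load_shamap(path):
    """Read 'old new' commit-id pairs (one per line) into a dict of bytes.

    Assumed format: whitespace-separated, no header row.
    """
    shamap = {}
    with open(path, "rb") as f:
        for line in f:
            fields = line.split()
            if len(fields) == 2:  # skip blank or malformed lines
                old, new = fields
                shamap[old] = new
    return shamap


def remap_gitlinks(file_changes, shamap, submodule_path):
    """Rewrite blob_id for entries at submodule_path; return how many changed."""
    count = 0
    for change in file_changes:
        # Ids missing from the map (including None for deletions) are left alone.
        if change.filename == submodule_path and change.blob_id in shamap:
            change.blob_id = shamap[change.blob_id]
            count += 1
    return count


if __name__ == "__main__":
    # Stand-in for commit.file_changes, using the first pair from the table.
    demo_map = {
        b"edf570fde099c0705432a389b96cb86489beda09":
            b"9cce52ae0806d695956dcf662cd74b497eaa7b12",
    }
    changes = [
        SimpleNamespace(
            filename=b"PATH/TO/SUBMODULE",
            blob_id=b"edf570fde099c0705432a389b96cb86489beda09",
        )
    ]
    print(remap_gitlinks(changes, demo_map, b"PATH/TO/SUBMODULE"), changes[0].blob_id)
```

Inside a commit-callback, the loop body would be the equivalent of remap_gitlinks(commit.file_changes, shamap, b"PATH/TO/SUBMODULE"). Each id is looked up once, so a new id that happens to match another old id is not rewritten again. Where the map gets loaded so that it is read only once per run is left open here.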

I'll assume this gets you what you need, but feel free to reopen if that's not clear or you have follow-up questions.