newren / git-filter-repo

Quickly rewrite git repository history (filter-branch replacement)

Question: Is it possible for git filter-repo to squash file changes to the initial commit that added the file #492

Closed Morph1984 closed 6 hours ago

Morph1984 commented 1 year ago

I have a repo with thousands of poorly compressed pngs that were added across many commits. To fix that, I pushed a commit which aggressively recompresses all the pngs. However, this increased the overall size of the repository, because history now stores both the old and the new version of each file. Therefore, I'd like to squash these changes into the initial commit(s) that added the files, so that it is as if the files were already compressed when they were added, thus removing any file deltas.

newren commented 3 months ago

Sure. One way to do it is that if you have a program that recompresses a given png, perhaps invoked by a command line like

   recompress --png --level 8 ${PNG_FILE}

then you could run

   lint-history --relevant 'return filename.endswith(b".png")' recompress --png --level 8

and it'd go through history modifying all the png files by passing them to the recompress program and replacing their contents with whatever that command provided.

A second, independent way to do it would be to get the object names of all the blobs in question. You should be able to do that by running git log -1 --raw --no-abbrev ${COMMIT_WHERE_YOU_COMPRESSED_PNGS}. That would show output like:

:100755 100755 edf570fde099c0705432a389b96cb86489beda09 9cce52ae0806d695956dcf662cd74b497eaa7b12 M      foo.png
:100755 100755 644f7c55e1a88a29779dc86b9ff92f512bf9bc11 88b02e9e45c0a62db2f1751b6c065b0c2e538820 M      bar.png

then, using those object names and filenames, you could make a commit callback that just modifies the values, e.g.:

git filter-repo --commit-callback '
    for change in commit.file_changes:
        if change.filename == b"foo.png" and change.blob_id == b"edf570fde099c0705432a389b96cb86489beda09":
            change.blob_id = b"9cce52ae0806d695956dcf662cd74b497eaa7b12"
        if change.filename == b"bar.png" and change.blob_id == b"644f7c55e1a88a29779dc86b9ff92f512bf9bc11":
            change.blob_id = b"88b02e9e45c0a62db2f1751b6c065b0c2e538820"
'

Naturally, you'd have a much longer list than my two example if-statements and reassignments, but it shows the idea. This second method also relies on you not specifying any filtering commands that need the blob contents (such as --blob-callback or --replace-text), but that seems safe, since any blob filtering would kind of mess up what you're trying to do here anyway. (And if you are trying to do blob filtering of things other than pngs, you could just call filter-repo twice: once to fix up the pngs as shown here, and a second time to make the other changes.)
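
Since the real list would be long, one option is to generate the old-to-new blob mapping from the raw output instead of typing it by hand. The following is only a minimal sketch, not part of filter-repo: the helper name parse_raw_changes is hypothetical, it assumes the output of git log -1 --raw --no-abbrev has been captured as text, it only handles entries with status M (renames and copies carry two paths and are skipped), and it does not undo git's quoting of unusual filenames.

```python
def parse_raw_changes(raw_output):
    """Build {(filename, old_blob_id): new_blob_id} from `git log --raw` text.

    Hypothetical helper for illustration. Keys and values are bytes, because
    filter-repo callbacks expose filenames and blob ids as bytes.
    """
    mapping = {}
    for line in raw_output.splitlines():
        # Raw diff entries start with ':' in column 0; commit headers and
        # the (indented) commit message do not, so they are skipped.
        if not line.startswith(":"):
            continue
        # Fields: old mode, new mode, old blob, new blob, status, path.
        # maxsplit=5 keeps any spaces inside the path intact.
        fields = line[1:].rstrip().split(None, 5)
        if len(fields) != 6:
            continue
        _old_mode, _new_mode, old_id, new_id, status, path = fields
        if status != "M":
            continue  # only plain modifications are handled in this sketch
        mapping[(path.encode(), old_id.encode())] = new_id.encode()
    return mapping
```

With such a mapping available inside the commit callback (how you get it there, for example by pasting it in as a dict literal, is up to you), the chain of if-statements could shrink to a single lookup along the lines of change.blob_id = mapping.get((change.filename, change.blob_id), change.blob_id).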

Anyway, does that help? (Or would it, if I had gotten back to you about a year sooner?)

newren commented 6 hours ago

I'm pretty sure my answer above provides what you need for your use case. Sorry that it came so delayed. I thought it was a really interesting use case, though, and included it in my sampling of answers to user-filed questions at https://github.com/newren/git-filter-repo/blob/main/Documentation/examples-from-user-filed-issues.md#Replacing-pngs-with-compressed-alternative. Thanks for filing this; if you have further questions about this use case that aren't answered above, feel free to reopen and let me know what I missed.