newren / git-filter-repo

Quickly rewrite git repository history (filter-branch replacement)

How can I perform git add --renormalize in all history #375

Open rcmarques3 opened 2 years ago

rcmarques3 commented 2 years ago

I'm trying to fix line-ending problems in a repo where text files were committed with CRLF instead of LF, and I have looked at issue #122. That approach is not quite what I need, because the lint-history contrib script blindly applies the dos2unix conversion to every blob it heuristically considers to be text. If I have a file/blob that I know is binary, and have marked it as such in .gitattributes, the script won't use that metadata to skip the transformation on that blob.
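To make the desired behaviour concrete, here is a minimal Python sketch of "convert CRLF to LF unless .gitattributes marks the path as binary". The function names binary_patterns and renormalize_blob are hypothetical, and the attribute parsing is deliberately simplified: it only looks at a single .gitattributes text, treats `binary` and `-text` as "do not touch", and ignores rule precedence, nested .gitattributes files and macros. It is not what git add --renormalize or git-filter-repo actually do internally, and it says nothing about how to hook the function into a history rewrite.

```python
from fnmatch import fnmatch


def binary_patterns(gitattributes_text):
    """Collect patterns marked `binary` or `-text` (simplified parsing)."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        if "binary" in attrs or "-text" in attrs:
            patterns.append(pattern)
    return patterns


def renormalize_blob(path, data, patterns):
    """Return data with CRLF converted to LF, unless path is marked binary."""
    name = path.rsplit("/", 1)[-1]
    # Assumption: match against the full path or the base name only.
    if any(fnmatch(path, p) or fnmatch(name, p) for p in patterns):
        return data  # marked binary: leave the bytes untouched
    return data.replace(b"\r\n", b"\n")
```

With an attributes text containing `*.png binary`, a blob at img/logo.png would be returned unchanged, while a blob at src/a.txt would have its CRLF sequences replaced by LF.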

What I'm looking for is this kind of logic:

It's this 2nd step I don't know how to achieve with git-filter-repo. With git filter-branch, it would perhaps be something along the lines of git filter-branch --prune-empty --tree-filter 'git add --renormalize .' -- --all? :confused:

Can anyone point me in the right direction?

benblo commented 1 year ago

I'm interested in this as well, did you ever figure out a solution?

NickCrews commented 1 year ago

I'm interested in this too. Any help appreciated.

Mike4Online commented 10 months ago

I need this as well.

I recently migrated a Subversion repo with 28000 commits to Git via git svn clone run from a Windows computer. There was no .gitattributes file in the Subversion repo, and even if I had added one to Subversion I am not sure git svn clone would have honored it. I did have SVN properties for EOL and mime-type set properly in Subversion, including charset values for certain text files having UTF-16LE with BOM encoding.

After the migration to Git I managed to establish a .gitattributes file to account for EOL and text encodings of various file types, along with a list of explicit text encoding and EOL settings for particular files. But all of the text files referenced by my .gitattributes file were added to Git prior to the creation of .gitattributes. Many of these text files are stored internally by Git with CR-LF end-of-lines, rather than LF. And as a result I see 2000 files appearing in my working tree as 'modified' that I haven't touched: Git has simply altered the EOLs automatically. This tends to happen on Linux and macOS computers; however, I have seen it happen on Windows computers as well.
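For anyone wanting to see which files are affected, git ls-files --eol reports, per path, the line endings stored in the index (i/...), those in the working tree (w/...), and the applicable text/eol attributes (attr/...), followed by a tab and the path. The sketch below parses that output and lists paths stored with CRLF or mixed endings that are not marked -text. The function name crlf_in_index is hypothetical, and the sketch assumes plain, unquoted paths (git quotes unusual paths unless -z is used).

```python
def crlf_in_index(ls_files_eol_output):
    """Return paths whose index content has CRLF or mixed line endings.

    Expects the text printed by `git ls-files --eol`, one entry per line:
    i/<eol> w/<eol> attr/<attributes> <TAB> <path>
    """
    paths = []
    for line in ls_files_eol_output.splitlines():
        if "\t" not in line:
            continue  # not an entry line
        meta, path = line.split("\t", 1)
        fields = meta.split()
        if len(fields) < 3:
            continue
        index_eol = fields[0][len("i/"):]
        attrs = " ".join(fields[2:])[len("attr/"):]
        if attrs.startswith("-text"):
            continue  # explicitly marked as not text
        if index_eol in ("crlf", "mixed"):
            paths.append(path)
    return paths
```

The output could be captured with subprocess.run(["git", "ls-files", "--eol"], capture_output=True, text=True).stdout and passed to the function as-is.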

If I renormalize EOLs and commit them, then I lose blame history, i.e. git blame thinks my recent commit changed every line in the file, and shows no other historical info. If I do nothing with these files, then the next time a developer makes a change to one or more of these files, their commit will show new EOLs on every line -- they'll again lose blame history, and there will be no way to see or review the specific code changes which were just committed.

I assume this is a common issue, especially for those migrating from Subversion. Being able to run git-filter-repo to change the way text file EOLs are stored without changing history would greatly help with diffs and blames. Ideally, it would do something similar to git add --renormalize . -- re-applying the settings in .gitattributes to each text file to ensure both EOL and text encoding/BOM are correct, and then recommitting the changes without altering the commit history.

Much of the mess may be due to limitations of the git svn clone command. This command could be enhanced to honor Subversion EOL and mime-type properties as they relate to text files and text encodings, generating a .gitattributes file on-the-fly so files would be stored with proper EOLs and text encodings from the beginning. But that request is beyond the scope of this open issue.

As a temporary workaround, I can ignore EOL changes with git blame -w, and I can show diffs in Bitbucket Server while hiding whitespace changes. That is, I can do git add --renormalize . and commit the changes, and still get reasonable diffs and blames, but not in every scenario.