Closed stevewgr closed 3 months ago
pre-commit
script done:
#!/bin/bash
echo 'Running pre-commit hook...'
git reset '*-diff.rc' '*-diff.rc2'
git checkout -- '*-diff.rc' '*-diff.rc2'
git clean -df '*-diff.rc' '*-diff.rc2'
rc_files=$(find . -type f \( -name '*.rc' -o -name '*.rc2' \) ! -name '*-diff.rc' ! -name '*-diff.rc2')
for rc_file in $rc_files; do
extension="${rc_file##*.}"
diff_file="${rc_file%.*}-diff.${extension}"
echo '// ** DO NOT TOUCH THIS FILE! AUTO GENERATED FOR DIFFING PURPOSES. **' > "$diff_file"
iconv -f UTF-16LE -t UTF-8 "$rc_file" | awk 'NR==1{sub(/^\xEF\xBB\xBF/,"")}1' >> "$diff_file"
done
git add '*-diff.rc' '*-diff.rc2'
exit 0
I'll continue working tomorrow on integrating it into the config script.
Description
As noted in my comment here: https://github.com/ko4life-net/ko/pull/241#issuecomment-2241447558
it appears that the
.gitattributes
configuration forworking-tree-encoding
is not functioning as expected. This issue is being raised for those interested in addressing it, as well as for those who have previously reported similar problems, including:and also because git's bug reporting sucks (in hope that one of them see's this), reminiscent of outdated practices.
You might wonder why Git attempts to convert from UTF-8 to UTF-16 instead of the other way around, especially when the files are confirmed to be UTF-16LE with BOM. Well, this log message from git sucks, really... digging further into their source code: https://github.com/git/git/commit/107642fe2661236f48b912480e090799e339c512
It is what they call round-trip encoding, as detailed in their documentation on working-tree-encoding and the source code commit. This separated into two parts:
Now for git to render the diffs they need to be stored in the git index as UTF-8, otherwise they're considered as binary and therefore diffing won't work.
.gitattributes
is here for the rescue! oh wait, not actually... When we configure.gitattributes
file with*.rc text working-tree-encoding=UTF-16LE-BOM
, this helps git to know the original encoding of the file, so that internally it can use libiconv to properly transcode from the original encoding to UTF-8 and store it internally in the git index. But when we checkout the file via git, it will take the UTF-8 encoded from the git index and transcode it into the original UTF-16LE-BOM encoding to the git worktree. This explains the log message above in which stage it failed to transcode (encode_to_worktree
).Git's implementation of
working-tree-encoding
seems mostly correct, but issues arise during the checkout stage. Specifically, the tracked file becomes corrupted during the round-trip transcoding process. See my proof of concept (POC) below for details.Why is this an issue for us? Because if someone submits a pull request with changes to any
*.rc
files, we won’t be able to view the diffs directly. Instead, we would need to check the file locally using tools that can render the diff properly. Also we're stricted to use UTF-16 for*.rc
files since that's the official encoding Visual Studio uses in the Resource Editor to properly work with them.Proposal
FAILED: Using UTF-8 for
*.rc
files – Converting*.rc
files from UTF-16LE-BOM to UTF-8 without BOM only works with the VS Resource Editor or Resource Compiler if code page pragmas are defined for each language. I attempted this, and while it worked partially, it struggled with Korean and Chinese characters. The Resource Editor failed to load the file unless I manually removed the problematic characters. Microsoft has used this approach in some of their GitHub projects. If we use only English, no problem, but for the time being we still keep these languages there until all tools are properly translated.FAILED: Using
UTF-16BE-BOM
orUTF-16
without BOM – Both methods failed, as the VS Resource Editor could not load the files, and Git's working-tree-encoding issues persisted as described earlier.SUCCESS: Implement a
pre-commit
hook script – This script will run during commits to duplicate and transcode*.rc
files to UTF-8 as*-diff.rc
, making changes visible in each PR. Since the pre-commit hook is configured locally, it must be deployed through the repository's config script.To Reproduce
Use the following commands to reproduce this issue in Git:
Tasks
pre-commit
hook script.pre-commit
hook.