ko4life-net / ko

Open source development of the game Knight Online. This is a reverse-engineered old version of the game, aiming to replicate the nostalgic experience we all once had <3
MIT License
52 stars, 21 forks

Transcode and normalize encoding of all files across the project #241

stevewgr closed this pull request 1 month ago

stevewgr commented 1 month ago

Description

Resolves the following issue: https://github.com/ko4life-net/ko/issues/240

This PR introduces the following changes:

Workflow

# 1. Clone the repository.
git clone git@github.com:ko4life-net/ko.git

# 2. Run encoding detection to generate the CSV files.
python .\ko\script\fix-encoding.py --root-dir .\ko --detect-encoding

# 3. Review the guessed encodings in ko-encoding-manual.csv and correct them to match the actual source encoding.
# Also move any faulty entries you find in ko-encoding-invalid.csv into ko-encoding-manual.csv as needed.

# 4. Transcode with ko-encoding-manual.csv first and check the results yourself; if all looks good, transcode with the main ko-encoding.csv file.
python .\ko\script\fix-encoding.py --encoding-file .\ko-encoding-manual.csv
python .\ko\script\fix-encoding.py --encoding-file .\ko-encoding.csv

# 5. Final touch: convert all files to use Unix LF line endings instead of Windows CRLF.
python .\ko\script\fix-encoding.py --root-dir .\ko --crlf-to-lf

# 6. Run format.ps1
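The workflow above relies on fix-encoding.py, whose internals are not shown on this PR page. As a rough sketch of what the detect, transcode and CRLF-to-LF steps involve, the hypothetical Python below guesses each file's encoding (BOM check, then strict UTF-8, then CP949 as an assumed legacy Korean code page), renders the guesses as CSV, transcodes to UTF-8 and normalizes line endings. The function names, CSV columns, file-suffix filter, CP949 fallback and UTF-8 target are all assumptions for illustration, not the script's actual interface.

```python
import codecs
import csv
import io
from pathlib import Path


def guess_encoding(data: bytes) -> str:
    """Guess an encoding: BOMs first, then strict UTF-8, then CP949 (an assumed legacy code page)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # Python's utf-16 codec uses the BOM to pick the byte order
    for candidate in ("utf-8", "cp949"):
        try:
            data.decode(candidate)
            return candidate  # a clean decode is only a guess, hence the manual review step
        except UnicodeDecodeError:
            continue
    return "unknown"  # nothing decoded cleanly: a candidate for manual review


def detect_tree(root: Path, suffixes=(".cpp", ".h", ".rc")) -> list[tuple[str, str]]:
    """Walk root and return (relative path, guessed encoding) rows for files with matching suffixes."""
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            rows.append((path.relative_to(root).as_posix(), guess_encoding(path.read_bytes())))
    return rows


def encoding_csv(rows: list[tuple[str, str]]) -> str:
    """Render rows as CSV text with a hypothetical path,encoding header and LF line endings."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["path", "encoding"])
    writer.writerows(rows)
    return buf.getvalue()


def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode with the reviewed source encoding and re-encode as UTF-8 (the assumed target)."""
    return data.decode(source_encoding).encode("utf-8")


def crlf_to_lf(data: bytes, encoding: str = "utf-8") -> bytes:
    """Replace CRLF with LF on decoded text rather than raw bytes, so multi-byte encodings stay intact."""
    return data.decode(encoding).replace("\r\n", "\n").encode(encoding)
```

Doing the newline replacement on decoded text matters for UTF-16, where a CRLF is the byte sequence b"\r\x00\n\x00" and unrelated neighbouring characters can produce a stray b"\r\n" byte pair.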

Results

Log files

stevewgr commented 1 month ago

I investigated why the new .gitattributes configuration misbehaves: it takes a perfectly fine UTF-16-LE encoded RC file and re-encodes it from UTF-8 to UTF-16. This looks like unexpected behavior from gitattributes, so I removed these configs: https://github.com/ko4life-net/ko/pull/241/commits/317d79598a439dbac4c3eada2e59302c005320a2
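A plausible explanation, assuming the removed configs used git's working-tree-encoding attribute (the commit itself is not quoted here): that attribute expects the blob stored in the repository to be UTF-8 and re-encodes it to the declared encoding on checkout. A file that was already committed as UTF-16-LE therefore gets "re-encoded from UTF-8" a second time. The short Python simulation below, with a hypothetical helper name, shows what that double encoding does to ASCII-range UTF-16-LE content.

```python
def misread_as_utf8_then_reencode(stored: bytes, target: str = "utf-16-le") -> bytes:
    """Simulate a checkout that assumes the stored blob is UTF-8 and re-encodes it to `target`."""
    # ASCII-range UTF-16-LE happens to be valid UTF-8: each 0x00 byte decodes to U+0000.
    text = stored.decode("utf-8")
    return text.encode(target)


original = "IDD_DIALOG".encode("utf-16-le")  # a file already stored as UTF-16-LE (no BOM, for simplicity)
corrupted = misread_as_utf8_then_reencode(original)

print(len(original), len(corrupted))           # the corrupted copy is twice as large
print(repr(corrupted.decode("utf-16-le")))     # every original character is now followed by a NUL character
```

With a UTF-16-LE BOM (FF FE) at the start, the UTF-8 decode in this sketch would raise instead; how git itself reports that case is not covered here.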

I initially added them so that running git diff from the terminal would show the content, since git does not handle UTF-16 encoded files well and treats them as binary instead of text. However, UI-based diffing still works in VS Code and some other diff tools, so let's remove these configs from .gitattributes and merge this PR.
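Why git shows UTF-16 files as binary: as far as we know, its default heuristic (buffer_is_binary() in git's xdiff-interface.c) treats a file as binary when a NUL byte appears in its first 8000 bytes, and every ASCII character in UTF-16-LE carries a 0x00 byte. The Python below is a minimal approximation of that check, not git's code; attributes such as diff or text in .gitattributes can override the heuristic.

```python
FIRST_FEW_BYTES = 8000  # window size git's buffer_is_binary() inspects


def looks_binary(data: bytes) -> bool:
    """Approximate git's default text/binary check: any NUL byte near the start means binary."""
    return b"\x00" in data[:FIRST_FEW_BYTES]


for encoding in ("utf-8", "utf-16-le"):
    sample = "IDD_ABOUTBOX DIALOGEX 0, 0, 170, 62\r\n".encode(encoding)
    # prints "utf-8 text", then "utf-16-le binary"
    print(encoding, "binary" if looks_binary(sample) else "text")
```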

There are also workarounds for anyone who still wants to see diffs of UTF-16 encoded text files, by changing the diff tool git uses: https://stackoverflow.com/questions/777949/can-i-make-git-recognize-a-utf-16-file-as-text
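One such workaround is a textconv diff driver: git runs a command with the file as its only argument and diffs whatever that command prints. The linked answer uses iconv for this. The hypothetical script below (the name utf16_textconv.py and the little-endian default for BOM-less files are assumptions) is a Python variant of the same idea.

```python
"""Hypothetical textconv filter: print a UTF-16 file as UTF-8 so `git diff` can show it as text."""
import codecs
import sys


def to_utf8(data: bytes) -> bytes:
    """Decode UTF-16 (BOM-aware, assuming little-endian without a BOM) and re-encode as UTF-8."""
    if data.startswith(codecs.BOM_UTF16_LE):
        text = data[2:].decode("utf-16-le", errors="replace")
    elif data.startswith(codecs.BOM_UTF16_BE):
        text = data[2:].decode("utf-16-be", errors="replace")
    else:
        text = data.decode("utf-16-le", errors="replace")
    return text.encode("utf-8")


if __name__ == "__main__":
    # git invokes the textconv command with the file to convert as its only argument
    # and diffs whatever the command writes to stdout.
    if len(sys.argv) != 2:
        print("usage: utf16_textconv.py FILE", file=sys.stderr)
    else:
        with open(sys.argv[1], "rb") as f:
            sys.stdout.buffer.write(to_utf8(f.read()))
```

To try it, register the driver with `git config diff.utf16.textconv "python /path/to/utf16_textconv.py"` (an absolute path is safest) and add a line such as `*.rc diff=utf16` to .gitattributes, or to .git/info/attributes to keep it local. Unlike working-tree-encoding, textconv only changes how diffs are displayed and never rewrites the files themselves.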

I tested the build in multiple scenarios and they all appear to pass:

Thanks for the review @xGuTeK and @srmeier 🚀