Open source development of the game Knight Online. This is a reverse-engineered old version of the game, aiming to replicate the nostalgic experience we all once had <3
Tasks
To fix this, we can transcode text-based files to a more portable and universal encoding, such as UTF-8 or UTF-16, depending on what each file type expects.
[x] Transcode Microsoft resource files to UTF-16-LE
[x] Transcode all text-based files to UTF-8
[x] Convert all Windows CRLF line endings to Unix LF
[x] Implement a script that does all of this in an automated fashion to make it less prone to human error
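To make the tasks above a bit more concrete, here is a minimal sketch of what such a script could look like. It is not the project's actual script. The legacy code page (`cp949`), the `.rc` suffix for resource files, and the function names are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: legacy sources were saved in CP949, a common Korean Windows code page.
LEGACY_ENCODING = "cp949"
# Assumption: this suffix marks Microsoft resource files in the tree.
RESOURCE_SUFFIXES = {".rc"}


def decode_source(raw: bytes) -> str:
    """Decode bytes: honor a BOM first, then try UTF-8, then the legacy code page."""
    if raw.startswith(b"\xef\xbb\xbf"):
        return raw[3:].decode("utf-8")
    if raw.startswith(b"\xff\xfe"):
        return raw[2:].decode("utf-16-le")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(LEGACY_ENCODING)


def transcode(raw: bytes, suffix: str) -> bytes:
    """Return the content re-encoded for the given suffix, with LF line endings."""
    text = decode_source(raw).replace("\r\n", "\n")
    if suffix.lower() in RESOURCE_SUFFIXES:
        # UTF-16-LE with a BOM so tools can recognize the encoding.
        return b"\xff\xfe" + text.encode("utf-16-le")
    return text.encode("utf-8")


def transcode_file(path: Path) -> bool:
    """Rewrite one file in place; return True if its bytes changed."""
    raw = path.read_bytes()
    converted = transcode(raw, path.suffix)
    if converted != raw:
        path.write_bytes(converted)
        return True
    return False
```

Calling `transcode_file(Path("some/file.cpp"))` on every tracked text file would do the conversion in one pass. Running it a second time should change nothing, which makes it easy to check that a tree is already clean.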
Description
This issue has been a thorn in my side for far too long. Let's dive into a little story:
Picture this: You're knee-deep in code, making a ton of changes that you haven't staged yet. You're on a roll, but then, out of nowhere, your text editor decides to save the file with a different encoding. Just as you're about to showcase your brilliant work to the world, you review the git diff and see a mess of random changes that have nothing to do with your actual edits. That's when you realize your text editor has messed up the encoding, and you end up yelling at your screen: "Nooooooooooooo... fml."
Here's the deal: Text editors like Visual Studio or Visual Studio Code try to guess the file encoding when opening files. But guessing isn't foolproof. Some libraries do a better job than others, but not all editors use the same libraries. For instance, Visual Studio Code relies on the jschardet library.
When your text editor fails to guess the encoding correctly, saving the file can corrupt the bytes of the text. This is especially problematic for non-ASCII text, such as Korean or Chinese comments, because those bytes get decoded with the wrong encoding and then written back incorrectly.
In our project, we use the autoGuessEncoding feature: https://github.com/ko4life-net/ko/blob/2.4.0/.vscode/settings.json#L2. But not everyone uses the same editor or system locale, so their editors might behave differently on different machines.
This has become a significant hassle. Many pull requests end up broken due to incorrect encoding because the author didn't double-check their changes before pushing. Plus, it would be great to be able to read the text containing special characters, even if it's in a different language, since we can always translate it.
Let's put an end to this encoding nightmare and ensure our code shines as it should!
Screenshots
Files
Example of broken commit: https://github.com/ko4life-net/ko/pull/161/commits/cafb59bd0c18805d9fbc427c24c633836b011ded
To Reproduce
Open a file containing Korean characters with a different encoding, such as ASCII, and try saving it. You'll see that the characters are now corrupted.
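The same thing can be simulated without an editor. This is a rough sketch under assumptions: the file is stored as CP949, the editor opens it as ASCII and substitutes a replacement character for bytes it cannot decode, then saves as UTF-8. Real editors differ in the details, and `reopen_and_save` is a hypothetical helper.

```python
def reopen_and_save(on_disk: bytes, assumed: str, save_as: str) -> bytes:
    """Decode with the wrongly assumed encoding, then re-encode as if saving."""
    text = on_disk.decode(assumed, errors="replace")
    return text.encode(save_as, errors="replace")


# A source line with a Korean comment, stored in the assumed legacy encoding.
on_disk = "// 한글 주석\n".encode("cp949")

saved = reopen_and_save(on_disk, assumed="ascii", save_as="utf-8")

print(saved != on_disk)       # the bytes on disk have changed
print(saved.decode("utf-8"))  # the Korean text is replaced by U+FFFD characters
```

Once the replacement characters are written out, the original bytes are gone from the file, so the only way back is to restore it from git.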