Re-encode codebase to UTF-8

ko4life-net / ko

Open source development of the game Knight Online. This is a reversed engineered old version of the game aiming to replicate the nostalgic experience we all once had <3

https://github.com/ko4life-net/ko

MIT License

55 stars 22 forks source link

Re-encode codebase to UTF-8 #237

Closed twostars closed 4 months ago

twostars commented 4 months ago

Your checklist for this pull request

🚨Please review the guidelines for contributing to this repository.

[x] Make sure you are requesting to pull a topic/feature/bugfix branch (right side). Don't request your master!
[x] Make sure you are making a pull request against the canary branch (left side). Also you should start your branch off our canary.
[x] Check the commit's or even all commits' message styles matches our requested structure.
[x] Check your code additions will fail neither code linting checks nor unit test.

Description

Codebase reencoded to UTF-8. Resources left untouched (because they're expected to be whichever encoding they're set to in the resource file, so they shouldn't be touched), and builds untouched, so it'll just generate UTF-8 strings for output now.

This could be corrected with a build flag if so desired for consistency, but I assume there's no desire to do that here.

💔Thank you!

twostars commented 4 months ago

Manually reviewed and made adjustments for each area of the repo. Initially manually corrected the ZipArchive stuff (mixed Polish & Korean).

Reviewed N3Base/ - N3GlobalEffectMng.h/.cpp and N3UIList.h needed to be manually corrected Reviewed game/ - UICharacterCreate.h & UIRepairTooltipDlg.h neededto be manually corrected Reviewed server/ - Ebenezer/IOCPort.h, LoginServer/IOCPort.h, and another polish one I missed - LoginServer/CentralDir.cpp Reviewed tool/ - N3CE/stdafx.h, N3ME/DlgBar.cpp, N3ME/DlgEditWarp.cpp, N3ME/stdafx.h needed to be manually corrected

No other changes. Notably it did re-encode common\, but no changes there, so nothing to review.

twostars commented 4 months ago

As a note: oddly the diff page shows a few of the files (e.g. BitmapFile.h, BitmapFile.cpp) as mangled, but they're fine. You can check the files directly; they're UTF-8 encoded and even show up otherwise correctly on their page (though that's true for CP949 as well).

stevewgr commented 4 months ago

Hi @twostars, thanks for the PR. I'm commenting to explain why we closed this one and to provide some insights as an appreciation for your time and efforts. I apologize for repeating what was said on Discord, but since you deleted your messages, let me recap:

The PR is incomplete, as not all files were updated to the correct encoding.
Your process is undocumented, making it difficult for me as the maintainer to accept or review such a crucial change. This PR includes 577 changed files with 15,774 additions and 15,768 deletions, which could introduce issues if missed during a manual review. No tools were mentioned for the conversion process, which is important given the inconsistencies among various tools.
You mentioned doing the conversion manually, which is prone to human error. Additionally, you forcefully mass re-encoded all files to a specific encoding and then corrected them manually, which isn't the best approach in my opinion.
There were some incorrect assumptions on your part, insisting on something that isn't necessarily accurate.
Lack of communication: Ideally, we would have discussed this in a GitHub issue before submitting such a large Pull Request. Especially since you were aware from Discord that I was working on it. Your proactive effort is appreciated, but communication is key for team collaboration and could have saved us both time.

Resources left untouched (because they're expected to be whichever encoding they're set to in the resource file, so they shouldn't be touched), and builds untouched, so it'll just generate UTF-8 strings for output now.

This is incorrect. Since Visual Studio 2010, resource files are UTF-16 Little Endian (LE) encoded with Byte Order Mark (BOM). This change was made by Microsoft to allow editors to load multiple Unicode characters from different character sets/code pages/encoding properly. Note that the current resource (*.rc) files in this project were created before 2002. Hence, some are UTF-16 Big Endian (BE) encoded and some are detected as Big5, which in this context is CP950 (Windows 950) Microsoft encoding, an extended version of Big5 encoding with more character sets for backward compatibility. Please refer to Wikipedia before making assumptions: https://en.wikipedia.org/wiki/Code_page_950

Manually reviewed and made adjustments for each area of the repo. Initially manually corrected the ZipArchive stuff (mixed Polish & Korean).

Reviewed N3Base/ - N3GlobalEffectMng.h/.cpp and N3UIList.h needed to be manually corrected Reviewed game/ - UICharacterCreate.h & UIRepairTooltipDlg.h neededto be manually corrected Reviewed server/ - Ebenezer/IOCPort.h, LoginServer/IOCPort.h, and another polish one I missed - LoginServer/CentralDir.cpp Reviewed tool/ - N3CE/stdafx.h, N3ME/DlgBar.cpp, N3ME/DlgEditWarp.cpp, N3ME/stdafx.h needed to be manually corrected

No other changes. Notably it did re-encode common, but no changes there, so nothing to review.

Not sure why you had issues with them. Perhaps it's related to the mass re-encoding approach I mentioned earlier. I only had to correct src\server\LoginServer\ZipArchive.cpp, which contained Polish and Korean characters in the same file. It was as simple as running my script to update encoding, opening a previous version of that file with CP1250 encoding, and manually copying the Polish characters to the re-encoded file.

As a note: oddly the diff page shows a few of the files (e.g. BitmapFile.h, BitmapFile.cpp) as mangled, but they're fine. You can check the files directly; they're UTF-8 encoded and even show up otherwise correctly on their page (though that's true for CP949 as well).

I didn't encounter this issue in my PR: https://github.com/ko4life-net/ko/pull/241. This could be related to your encoding approach, but it doesn't matter at this point.

Thank you again for your effort and understanding.

twostars commented 4 months ago

"I'm commenting to explain why we closed this one and to provide some insights as an appreciation for your time and efforts." I closed it. Github even shows you this. Still lying and manipulating things I see.

My entire point with the .rc files is that you shouldn't just blindly re-encode them with a script when they contain multiple encodings. It was literally made directly in response to your script approach.

Obviously me mentioning ZipArchive.cpp as a stand-out case because it contains mixed encodings is why I had to manually touch it. Like you just said you did. I don't know why you're so obnoxious about it.

The fact that you have to keep twisting everything I say to try and sound intellectual tells me enough about your character that I want nothing to do with you or any of your so called community that you treat so very well.

UTengine commented 4 months ago

On a side note why are we still depending on mfc/afx Korean res includes if it can be substituted with winres?

stevewgr commented 4 months ago

On a side note why are we still depending on mfc/afx Korean res includes if it can be substituted with winres?

Isn't winres for .NET applications? I didn't know they could be substituted with rc files and would be curious to see a proof of concept with one of the tools in the project if you know how.

twostars commented 4 months ago

On a side note why are we still depending on mfc/afx Korean res includes if it can be substituted with winres?

You should make this a separate issue.

UTengine commented 4 months ago

I had set it up once long time ago because the vs didn't support mfc or something, so i replaced it with include windows.h and winres instead. I don't have it anymore Something along these lines https://community.intel.com/t5/Intel-Fortran-Compiler/AFXRES-H-missing/m-p/1345249

UTengine commented 4 months ago

On a side note why are we still depending on mfc/afx Korean res includes if it can be substituted with winres?

You should make this a separate issue.

stevewgr commented 4 months ago

My entire point with the .rc files is that you shouldn't just blindly re-encode them with a script when they contain multiple encodings. It was literally made directly in response to your script approach.

"shouldn't", who said so and based on what? again with the wrong assumptions. The script doesn't blindly re-encodes them, it handles them differently.

stevewgr commented 4 months ago

Obviously me mentioning ZipArchive.cpp as a stand-out case because it contains mixed encodings is why I had to manually touch it. Like you just said you did. I don't know why you're so obnoxious about it.

As far as I can see from your previous comment, it was not just ZipArchive.cpp for you, rather it sounds like multiple files you had to fix their encoding manually. Am I missing something? 🤷 :

Manually reviewed and made adjustments for each area of the repo. Initially manually corrected the ZipArchive stuff (mixed Polish & Korean). Reviewed N3Base/ - N3GlobalEffectMng.h/.cpp and N3UIList.h needed to be manually corrected Reviewed game/ - UICharacterCreate.h & UIRepairTooltipDlg.h neededto be manually corrected Reviewed server/ - Ebenezer/IOCPort.h, LoginServer/IOCPort.h, and another polish one I missed - LoginServer/CentralDir.cpp Reviewed tool/ - N3CE/stdafx.h, N3ME/DlgBar.cpp, N3ME/DlgEditWarp.cpp, N3ME/stdafx.h needed to be manually corrected No other changes. Notably it did re-encode common, but no changes there, so nothing to review.

Not sure why you had issues with them. Perhaps it's related to the mass re-encoding approach I mentioned earlier. I only had to correct src\server\LoginServer\ZipArchive.cpp, which contained Polish and Korean characters in the same file. It was as simple as running my script to update encoding, opening a previous version of that file with CP1250 encoding, and manually copying the Polish characters to the re-encoded file.

twostars commented 4 months ago

My entire point with the .rc files is that you shouldn't just blindly re-encode them with a script when they contain multiple encodings. It was literally made directly in response to your script approach.

"shouldn't", who said so and based on what? again with the wrong assumptions. The script doesn't blindly re-encodes them, it handles them differently.

You're making the wrong argument is the thing. I don't disagree with you but you're going off about unrelated things. A warning based on the sample that the .rc script shouldn't be blindly re-encoded by a script isn't what you're saying at all. It's just telling you to be aware. If you were planning on addressing that with the script, then sure, say that, don't twist my words.

Because what you actually said was I was wrong and making assumptions, and that the file is only CP950. Which it's not, and I showed you in 5 screenshots that it's not. If you had blindly re-encoded assuming 1 single encoding, you'd have problems. So it was a warning to be careful, that you twisted, and twisted, and cried about trying to sound superior about some nonsense argument that was totally unrelated.

While you're lying, manipulating and trying to make yourself sound superior for some nonsense reason over text encodings, of all things, you're just wasting your time when you could be working on actual useful changes for the repo, not just the niceties to avoid PR issues which shouldn't have taken you almost a week to get done in the first place.

twostars commented 4 months ago

I'll reply to it here, but:

Kolin made it clear on discord that you're doing this for business to get leads, so if you don't want to contribute to the project, then why bother.

That is straight up a lie. Doing "this"; this thing that is trying to WARN you about potential issues when using a script only for you to cry about me being wrong about everything and not ever bothering to clarify, while I go to the trouble of clarifying everything I'm trying to say.... for business? Leads? From who? For what? You can ask the few people (e.g. Sword) that bothered to ask me to work for their server; I've rejected them all because I have too many ongoing projects. I don't need the business.

Are you even hearing yourself? And you're trying to make me out to be crazy?

nikos32 commented 4 months ago

My entire point with the .rc files is that you shouldn't just blindly re-encode them with a script when they contain multiple encodings. It was literally made directly in response to your script approach.

"shouldn't", who said so and based on what? again with the wrong assumptions. The script doesn't blindly re-encodes them, it handles them differently.

twostars spent about 2 hours in a single day to get the PR done, while you spent more than 3 days without a clear record of the exact time invested, despite an encoding issue existing for almost 2 years too. Time is precious, and it seems you're making a big deal out of nothing, failing to see how efficient twostars's PR was in saving time. I suggest you calm down and focus on your project instead of manipulating things and wasting both of your time, especially since your project is years behind with no significant improvements whatsoever compared to the leaked sources.

UTengine commented 4 months ago

My entire point with the .rc files is that you shouldn't just blindly re-encode them with a script when they contain multiple encodings. It was literally made directly in response to your script approach.

"shouldn't", who said so and based on what? again with the wrong assumptions. The script doesn't blindly re-encodes them, it handles them differently.

twostars spent about 2 hours in a single day to get the PR done, while you spent more than 3 days without a clear record of the exact time invested, despite an encoding issue existing for almost 2 years too. Time is precious, and it seems you're making a big deal out of nothing, failing to see how efficient twostars's PR was in saving time. I suggest you calm down and focus on your project instead of manipulating things and wasting both of your time, especially since your project is years behind with no significant improvements whatsoever compared to the leaked sources.

images (2)

xGuTeK commented 4 months ago

My entire point with the .rc files is that you shouldn't just blindly re-encode them with a script when they contain multiple encodings. It was literally made directly in response to your script approach.

"shouldn't", who said so and based on what? again with the wrong assumptions. The script doesn't blindly re-encodes them, it handles them differently.

twostars spent about 2 hours in a single day to get the PR done, while you spent more than 3 days without a clear record of the exact time invested, despite an encoding issue existing for almost 2 years too. Time is precious, and it seems you're making a big deal out of nothing, failing to see how efficient twostars's PR was in saving time. I suggest you calm down and focus on your project instead of manipulating things and wasting both of your time, especially since your project is years behind with no significant improvements whatsoever compared to the leaked sources.

Nikos, we dedicate as much time to this as we have available sometimes it might be 5 minutes, sometimes an hour, and sometimes even more during the week. We treat it as a hobby, and we're not in a hurry with the work. Twostars had different intentions. At first, I thought he wanted to help, but then he started the usual drama that's been in this community for years. Later, he seemed to want to help again, but then started drama once more. The fact that many people wanted something from him in the past or leaked his stuff doesn't concern me. I didn't leak his things or claim them as my own. I'm also not interested in how much he contributed to this community before. I didn't use his things, and I also tried to help people because I've been in this community since 2005. Maybe the way people treated him changed him, and now he can't collaborate or communicate with people normally. We don't want that kind of attitude in this project, even though he has a lot of knowledge about KO and programming. That's why we banned him from the repository.

stevewgr commented 4 months ago

My entire point with the .rc files is that you shouldn't just blindly re-encode them with a script when they contain multiple encodings. It was literally made directly in response to your script approach.

"shouldn't", who said so and based on what? again with the wrong assumptions. The script doesn't blindly re-encodes them, it handles them differently.

twostars spent about 2 hours in a single day to get the PR done, while you spent more than 3 days without a clear record of the exact time invested, despite an encoding issue existing for almost 2 years too. Time is precious, and it seems you're making a big deal out of nothing, failing to see how efficient twostars's PR was in saving time. I suggest you calm down and focus on your project instead of manipulating things and wasting both of your time, especially since your project is years behind with no significant improvements whatsoever compared to the leaked sources.

@nikos32, who are you? I don't recognize you, and I've never seen you on Discord or contributing before. Oh wait, now I remember -- you own a KO private server, right? @twostars mentioned you to me before. You're his client, paying him to patch up your server so you can profit from it. That sums it up for me, and I don't feel the need to explain how I manage my time to you. I hope @twostars will give you a discount on your next purchase with the efforts you put in commenting here.

You're making the wrong argument is the thing. I don't disagree with you but you're going off about unrelated things. A warning based on the sample that the .rc script shouldn't be blindly re-encoded by a script isn't what you're saying at all. It's just telling you to be aware. If you were planning on addressing that with the script, then sure, say that, don't twist my words.

I did say that, from the beginning and if you don't know how to read, that's your own problem.

Because what you actually said was I was wrong and making assumptions, and that the file is only CP950. Which it's not, and I showed you in 5 screenshots that it's not. If you had blindly re-encoded assuming 1 single encoding, you'd have problems. So it was a warning to be careful, that you twisted, and twisted, and cried about trying to sound superior about some nonsense argument that was totally unrelated.

I didn't ask for your help or need your warnings. I understand how encoding and resource files work, as is evident from my PR. So, thanks, but no thanks. Claiming to be "just wanting to help" while clearly seeking power, attention, and financial gain from KO is disingenuous.

While you're lying, manipulating and trying to make yourself sound superior for some nonsense reason over text encodings, of all things, you're just wasting your time when you could be working on actual useful changes for the repo, not just the niceties to avoid PR issues which shouldn't have taken you almost a week to get done in the first place.

And now again you trying to tell me how to manage my time or the repo. Useless comment.