int19h / WarBender

WarBender: a Mount & Blade: Warband save game editor
MIT License

Problem reading non-English M&B save files #9

Open Nex0817 opened 4 years ago

Nex0817 commented 4 years ago

Hi int19h.

I was very excited when I found this neat save editor, but I was soon frustrated: it only works on save files from the English version of M&B. (I use the Korean version of M&B.)

When I searched Google for this issue, I thought it might be an encoding problem (UTF-8 BOM or whatever).

And I found someone's comment: "Yeah it is EASY. to solve that, just use constructor of stream." With the provided source code, a little programming skill, and a great amount of groundless confidence, I tried to fix it but failed.

After trying for 5 hours, I couldn't even understand how this program calls LineReader() :(

Hoping you can solve this problem, I have attached my save file (Warband Native): https://drive.google.com/file/d/14eioKtPZXcW0x52MYZDWR8BXxz2jS4hz/view?usp=sharing

Thanks for reading my issue.

P.S if(you.state == busy || !you.is_interest(this_issue) || this_issue.difficulty == extreme || you.is_hate(asian)) you can ignore this issue, and I'm sorry for bothering you. else I hope 1.0.5 VERSION coming soon!

int19h commented 4 years ago

Which file are you getting an error on? Is it the save itself, or one of the .txt files from the module?

The save itself is loaded using the OS locale - this is what you have selected under Time & Language -> Language -> Administrative language settings -> Language for non-Unicode programs.

For .txt files, it uses UTF-8, which is probably wrong - it didn't even occur to me that those might contain anything non-ASCII, to be honest. LineReader is only used for those.

Nex0817 commented 4 years ago

Thank you! I'll try it.

I got addicted to this game and forgot about my request. I'm very sorry for seeing your kind reply so late.

int19h commented 4 years ago

It's still a valid bug, so let's keep it open, especially as other users might also run into it. I don't have much spare time to deal with this right now, but I'll get to it eventually.

Matihood1 commented 2 years ago

Hello. I know I'm coming here about 1.5 years after the last comment, but the issue still hasn't really been fixed. I'd happily do it myself, but I have no idea where to even look in the source code or what to change.

Matihood1 commented 2 years ago

Are you sure the problem lies in reading the .txt files? From what I've seen, it's only the names of the characters and parties (or even just the main character's name) that get messed up, and those should be stored in the save file itself, judging by the fact that they remain the same after changing the game's language. The .txt files should, indeed, only contain UTF-8 characters (at least for the Native module, which is what both I and the creator of this issue are using).

Regarding the .txt files, would it really not be possible to just change the encoding by changing the way reader instances are created? For example, for PartyDefinitions:

https://github.com/int19h/WarBender/blob/3d87c20db6b4be92ac478ee6e73f5e3adb7ed912/WarBender/Modules/Module.cs#L71-L74

Currently, you're using File.OpenText(String) which, according to the Microsoft docs, just calls the StreamReader(String) constructor. Wouldn't it be possible to just use the StreamReader constructor that takes the encoding:

using (var reader = new StreamReader(Path.Combine(BasePath, "parties.txt"), Encoding.Unicode))

I still don't think this is the source of the problem since it's the names stored in the savegame itself that are displayed incorrectly.

Edit: Or maybe the problem is not that the wrong encoding is used when reading the save file, but that the wrong encoding is used in the GUI part of the program. After all, after making an edit to a save file, all special characters are saved correctly, even though they are not displayed as such in the editor.

int19h commented 2 years ago

Sorry, I wasn't paying attention and got it mixed up with the other open issue! Yes, you're right, this one is strictly about the encoding. It's not a GUI issue - once the strings are read, they're stored as UTF-16 in memory (same as any other .NET string), and the GUI is also Unicode throughout.

Strings from the save itself use the OS locale / codepage ("Language for non-Unicode applications", which Warband is):

https://github.com/int19h/WarBender/blob/3d87c20db6b4be92ac478ee6e73f5e3adb7ed912/WarBender/ValueSerializer.cs#L122-L134

So the reason it round-trips successfully is that the mapping between bytes and chars is 1:1 in this case (unlike UTF-8). It decodes incorrectly - but re-encoding that incorrect result gives you the original bytes back.
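To illustrate the round-trip point, here is a minimal sketch rather than WarBender's actual code. It assumes the OS codepage is a single-byte one and uses ISO-8859-1 (codepage 28591) as a stand-in for it; the RoundTripDemo class, the RoundTrips helper and the sample bytes (the word "한글" as encoded in codepage 949) are made up for the example.

```csharp
using System;
using System.Linq;
using System.Text;

static class RoundTripDemo
{
    // Hypothetical helper: true if decoding and then re-encoding with the
    // same encoding reproduces the original bytes exactly.
    public static bool RoundTrips(byte[] raw, Encoding encoding)
    {
        string decoded = encoding.GetString(raw);
        return encoding.GetBytes(decoded).SequenceEqual(raw);
    }

    public static void Main()
    {
        // "한글" as encoded in codepage 949 (assumed sample data).
        byte[] raw = { 0xC7, 0xD1, 0xB1, 0xDB };

        // ISO-8859-1 maps every byte to exactly one char, so the text comes
        // out wrong ("ÇÑ±Û") but no information is lost.
        Encoding singleByte = Encoding.GetEncoding(28591);
        Console.WriteLine(singleByte.GetString(raw));
        Console.WriteLine(RoundTrips(raw, singleByte));    // True

        // UTF-8 replaces invalid byte sequences with U+FFFD while decoding,
        // so the original bytes cannot be recovered.
        Console.WriteLine(RoundTrips(raw, Encoding.UTF8)); // False
    }
}
```

This is why a save edited on a machine with the "wrong" single-byte locale still loads fine in the game: the names look garbled in the editor, but the bytes written back are unchanged.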

Anyway, the challenge here is to figure out the correct codepage, since the save itself doesn't contain this info (as far as I know). The non-Unicode locale is usually a decent proxy for this, but it could be made a setting in the app itself. But if you're dealing with saves from different game versions that might have different encodings, that's not ideal either. It might be best to add an encoding drop-down to the open file dialog when loading a save, like e.g. Notepad does for text files - but I haven't looked into how complicated that is.
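As a rough sketch of what decoding with a user-selected codepage could look like (this is not how ValueSerializer is written today): the SaveStringDecoder class and its Decode method are hypothetical, and the codepage number is assumed to come from a setting or a drop-down. The RegisterProvider call is only needed on .NET Core / .NET 5+, where legacy codepages such as 949 are not available by default; on .NET Framework they are built in and that line can be dropped.

```csharp
using System;
using System.Text;

static class SaveStringDecoder
{
    // Hypothetical helper: decode raw string bytes from a save using a
    // caller-chosen codepage instead of the OS non-Unicode locale.
    public static string Decode(byte[] raw, int codePage)
    {
        return Encoding.GetEncoding(codePage).GetString(raw);
    }

    public static void Main()
    {
        // Makes legacy Windows codepages available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // "한글" as encoded in codepage 949 (assumed sample data).
        byte[] raw = { 0xC7, 0xD1, 0xB1, 0xDB };

        Console.WriteLine(Decode(raw, 949));  // Korean codepage: the intended text
        Console.WriteLine(Decode(raw, 1252)); // Western European codepage: garbled text
    }
}
```

The same codepage would have to be used when writing strings back, so that edited names are encoded the way the game expects.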