py-mine / mcstatus

A Python library for checking the status of Minecraft servers
https://mcstatus.readthedocs.io
Apache License 2.0
480 stars 37 forks source link

Decode ISO 8859-1 into UTF-8 #187

Open py-mine-bot opened 2 years ago

py-mine-bot commented 2 years ago

kevinkjt2000 Authored by kevinkjt2000


Give examples for decoding ISO 8859-1 into UTF-8 (otherwise, people are surprised with garbage characters

Why are we not doing this automatically? Is there some benefit in keeping it in ISO 8859-1? I feel like this will only cause more issues to appear over time as people won't know what's going on since it's non-trivial to figure out. Even when documented, it would just clutter the docs when we could easily do this on the library side.

Originally posted by @ItsDrike in https://github.com/Dinnerbone/mcstatus/issues/136#issuecomment-1005979529

PerchunPak commented 2 years ago

This will be a breaking change

>>> motd = "여보세요".encode("utf-8").decode("iso-8859-1")
'ì\x97¬ë³´ì\x84¸ì\x9a\x94'
>>> motd = motd.encode("iso-8859-1").decode("utf-8")
'여보세요'
>>> motd.encode("iso-8859-1").decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)

Two times encode and decode will result in an error.

PerchunPak commented 2 years ago

Nevermind, this breaking change already made #192.

PerchunPak commented 2 years ago

Can't actually find new working workaround after changes which was introduced in #192. Never mind, the old workaround still works.

PerchunPak commented 1 year ago

Reproduced! Write some non ISO-8859-1 symbols in server.properties and save it with UTF-8 encoding. Then start server with enabled query, and you have random symbols in mcstatus received answer.

So it's only possible with query and all fields there (like players or map).

P.S. I tried Paper and Vanilla cores. And Bedrock Vanilla (unaffected).

PerchunPak commented 1 year ago

We can add fix_encoding method to all strings in QueryResponse, so no breaking change would be.

class StringWithFixEncoding(str):
    def fix_encoding(self) -> str:
        return self.encode("iso-8859-1").decode("utf-8")
ItsDrike commented 1 year ago

How to minecraft clients handle this? Do they actually perform reencoding too? If this is just the case of some servers using UTF8 when minecraft motd doesn't support this, I don't think we should be supporting it either. A note about this in FAQ section of the wiki is more than sufficient in that case. If minecraft clients do actually perform this kind of reencoding, only then I'd consider changing the behavior of mcstatus here

PerchunPak commented 1 year ago

How to minecraft clients handle this? Do they actually perform reencoding too? If this is just the case of some servers using UTF8 when minecraft motd doesn't support this, I don't think we should be supporting it either. A note about this in FAQ section of the wiki is more than sufficient in that case. If minecraft clients do actually perform this kind of reencoding, only then I'd consider changing the behavior of mcstatus here

Minecraft clients do not use query protocol at all (ref).

But if you meant Minecraft servers - then yes, they see that server.properties is in UTF-8 encoding, and just resave it with ISO one, so UTF symbols will be transformed to UTF escape sequences.

PerchunPak commented 1 year ago

Can it be fixed by #504? As the PR mentions this question in FAQ here.