shmup / miniboa

An asynchronous, single-threaded, poll-based Telnet server
Apache License 2.0
60 stars 12 forks source link

Crash on some inputs [Python 3.5.2] #16

Open HallowedPoint opened 6 years ago

HallowedPoint commented 6 years ago

Hello,

I've been working on a game using miniboa as the base, and have discovered it gets crashy when it encounters inputs that (assumedly?) cp1252 does not know how to deal with:

Traceback (most recent call last):
  File "urmud.py", line 86, in <module>
    game.mainLoop(gameServer)
  File "/home/x/Documents/Development/python/urmud/working/game.py", line 341, in mainLoop
    pollServer(gameServer)
  File "/home/x/Documents/Development/python/urmud/working/game.py", line 319, in pollServer
    server.poll()
  File "/home/x/Documents/Development/python/urmud/working/miniboa/async.py", line 188, in poll
    self.clients[sock_fileno].socket_recv()
  File "/home/x/Documents/Development/python/urmud/working/miniboa/telnet.py", line 292, in socket_recv
    data = str(self.sock.recv(2048), "cp1252")
  File "/usr/lib/python3.5/encodings/cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1: character maps to <undefined>

So far I've discovered this works with ⁒ (u2052) and ⁄ (u2044), but considering the sheer size of unicode I'm just assuming there are more.

I'm actually using this to learn Python and I'm not really sure what the best way to handle this would be, so I figured I'd point it in the direction of the professionals.

And I did test it with the example chat_demo.py to be sure it wasn't something I broke. I was quite surprised to discover that it was not. :)

shmup commented 6 years ago

Ha, thanks for pointing this out. I'm going to try looking into this and thinking about it.

shmup commented 6 years ago

@gergelypolonkai was curious if you had any input here.

It seems some unicode characters do not map to cp1252. I'm unsure of the best way to go about this. Do I just catch the exception, but then what? Just ignore the input?

Some input here on which bytes will error https://stackoverflow.com/a/26330256

Like, I can convert from utf-8 and if fixes it but I'm unsure if there was a reason, like, for purity sake it was using cp1252? I don't grok encoding probably. :)

https://github.com/shmup/miniboa/commit/09e90ff39d6c11b8166ab2599097a68897ca1d4a

shmup commented 6 years ago

No that is bad, I'm not doing the above, heh.

gergelypolonkai commented 6 years ago

Well, that is a strange problem and I don’t think there is a good solution to it.

The linked SO answer lists the character codes that have no actual characters assigned (like the 0x81 mentioned by @HallowedPoint). When assuming the input is CP1252, you might filter such bytes out before processing it. However, that solves such problems for this specific code page; I don’t know if there are others with similar “missing” code points.

<rant>Also, it’s strange to se CP[anything] nowadays. Is there really someone out there who is not using UTF-8 nowadays? </rant>

shmup commented 6 years ago

Fixed with https://github.com/shmup/miniboa/commit/09e90ff39d6c11b8166ab2599097a68897ca1d4a but we'll see if this gets backlash

@gergelypolonkai yeah heh re: your rant, I just said screw it and will see if anyone rants back at my choice. I doubt it'll happen

shmup commented 6 years ago

I was just thinking, maybe I should restore the previous encoding, cp1252, (I still don't know why), and let the user override the encoding when instantiating a TelnetServer instance?

gergelypolonkai commented 6 years ago

Despite you restore it or not, I really like the idea of allowing charset selection in the constructor. It makes the whole thing much more flexible.

My the way, is there a reason to use cp1252? Or is it because itʼs “legacy”?

shmup commented 6 years ago

@gergelypolonkai yeah that's exactly what I was thinking, based on a change you did in one of your PR's. That's exactly what I should do. I'll uh, default to utf-8 and let you set it to cp1252 in the constructor. And yes, I believe the idea was in the spirit of some romantic masturbatory old unix thing.

Really that's just a guess though :man_shrugging:

shmup commented 6 years ago

Well I think my change exposed a problem, re: https://github.com/shmup/miniboa/issues/22

I'm being a bad software maintainer and rushing things. I'll certainly end up reverting, I think.

HallowedPoint commented 5 years ago

Hi,

I have started poking at my project again recently, and while doing so it occured to me that I never shared my solution to this that I did a couple weeks after my initial report.

I simply added "errors=ignore" to the str() call. :P

data = str(self.sock.recv(2048), "cp1252", errors='ignore')

This seems to just straight up drop anything it doesn't like, but considering this is user input for a MUD, I am not in the least concerned about that:

wiz ⁒ (u2052) and ⁄ (u2044) [wiz/Testdude]: (u2052) and (u2044)

Now, I am only like 1% more experienced at Python than I was when I wrote the first message, so I have absolutely zero idea if this has any potential for mayhem, and I am sure there is a more proper solution... but it ain't crashing anymore, so it seems like progress to me. :P

shmup commented 5 years ago

I'm sorry I haven't really paid attention to this much more. I should. Maybe someone else will, heh

shmup commented 1 year ago

Ok, I went ahead and took @gergelypolonkai suggestion of specifying the encoding in the TelnetServer constructor.

You can now TelnetServer(encoding='utf-8') to resolve this issue. I have a test capturing it, and hope this is a good resolution!

Give it a shot if you're still messing with this library, @HallowedPoint