Telnet removes diacritical characters

jimstorch / miniboa

Automatically exported from code.google.com/p/miniboa

Apache License 2.0

Telnet removes diacritical characters #3

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

In the function
def _recv_byte(self, byte):
in the telnet module, there is this condition:
## Filter out non-printing characters
if (byte >= ' ' and byte <= '~') or byte == '\n':

Unfortunately, this removes not only non-printing characters but also
characters specific to other languages.
For Polish there are three possible encodings: ISO-8859-2, Windows-1250
and UTF-8.
So I decided to switch off this check, but I am not sure whether or how
that could harm the MUD.
Another way would be to list the codes for all diacritical characters, but
if the MUD supports several languages, that could be difficult.

Original issue reported on code.google.com by mich.mierzwa@gmail.com on 22 Dec 2009 at 12:28
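
To illustrate the report above, here is a minimal Python 3 sketch of what the quoted condition does to Polish text. The function name ascii_printable_filter is made up for this example and is not part of miniboa. The original code compared one-character Python 2 strings, while this sketch compares integer byte values, which should be equivalent for this purpose.

```python
def ascii_printable_filter(data: bytes) -> bytes:
    """Keep only bytes in ' '..'~' plus newline, mirroring the quoted condition."""
    return bytes(b for b in data if 0x20 <= b <= 0x7E or b == 0x0A)


word = "Łóżko\n"
for encoding in ("utf-8", "iso-8859-2", "cp1250"):
    raw = word.encode(encoding)
    # Every byte of Ł, ó and ż is >= 0x80 in all three encodings,
    # so only the plain ASCII letters and the newline survive.
    print(encoding, raw, "->", ascii_printable_filter(raw))
```

In all three encodings only "ko" and the newline remain, which matches the symptom described in this issue.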

GoogleCodeExporter commented 9 years ago

I knew that would be an issue for some languages and, to be honest, I'm
not sure how UTF-8 and UTF-16 (which are multi-byte encodings) will
interact with the Telnet protocol, which was developed back in the early
1980s and was designed to transfer 7-bit ASCII.

I'll look into it.

http://www.faqs.org/rfcs/rfc854.html

Original comment by jimstorch@gmail.com on 22 Dec 2009 at 1:54
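
One concrete point of interaction is the byte 0xFF, which Telnet (RFC 854) reserves as IAC, "Interpret As Command". A data byte of 0xFF has to be sent doubled. The sketch below is only an illustration under that assumption. The helper escape_iac is a hypothetical name, and whether miniboa escapes outgoing data this way is not stated in this thread.

```python
IAC = 0xFF  # Telnet "Interpret As Command" byte (RFC 854)


def escape_iac(payload: bytes) -> bytes:
    """Double every 0xFF so the receiver treats it as data, not as a command."""
    return payload.replace(b"\xff", b"\xff\xff")


utf8_payload = "Łóżko".encode("utf-8")
latin1_payload = "ÿ".encode("latin-1")  # U+00FF becomes the single byte 0xFF

# Valid UTF-8 never contains the byte 0xFF, so it cannot collide with IAC.
print(IAC in utf8_payload)
# A single-byte encoding can produce 0xFF, which then needs doubling.
print(escape_iac(latin1_payload))
```

So UTF-8 text passes through untouched, while single-byte encodings may occasionally need the escaping step.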

GoogleCodeExporter commented 9 years ago

If you just do
if True or (byte >= ' ' and byte <= '~') or byte == '\n':
and let everything through, then everything works fine. I can interact
with the telnet client in Unicode and in the older encodings, in both
directions. Well... at least under Windows. Under Linux I am having
trouble right now with .encode and .decode.

Original comment by mich.mierzwa@gmail.com on 23 Dec 2009 at 6:44
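
A possible middle ground between the original filter and letting everything through is to drop only ASCII control bytes and keep every byte from 0x80 upward. This is a sketch, not what miniboa does. The names keep_byte and relaxed_filter are hypothetical, and it assumes that Telnet command sequences have already been handled before this check runs.

```python
def keep_byte(b: int) -> bool:
    """Hypothetical relaxed check: drop ASCII control bytes, keep the rest."""
    if b == 0x0A:               # newline is always kept
        return True
    if b < 0x20 or b == 0x7F:   # C0 control characters and DEL
        return False
    return True                 # printable ASCII and every byte >= 0x80


def relaxed_filter(data: bytes) -> bytes:
    """Apply keep_byte to a whole chunk of received data."""
    return bytes(b for b in data if keep_byte(b))


noisy = b"\x07\x1b" + "Łóżko\n".encode("utf-8") + b"\x7f"
print(relaxed_filter(noisy).decode("utf-8"), end="")
```

Because the check never looks at the meaning of high bytes, it works the same way for UTF-8, ISO-8859-2 and Windows-1250 input.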

GoogleCodeExporter commented 9 years ago

OK, now it works under Linux too.

First of all, you obviously cannot filter out characters like this:
if (byte >= ' ' and byte <= '~') or byte == '\n':

Second, do not try to send the unicode string u"Łóżko". You have to send
"Łóżko" instead, even though the two look exactly the same.
The difference can be seen when you do
a = u"Łóżko"
b = "Łóżko"
for c in a: print c
for c in b: print c
or just
print len(a)
print len(b)
In the second case (b), the string has a length of 8, not 5 (with a UTF-8
source file), because Python treats it as a sequence of bytes. And that is
what we want: telnet does not know (and does not have to know) what is
being transmitted, and the client on the user's side already knows
(because it was told) how to decode this stream.

I hope this helps and saves time for anyone who wishes to introduce other
languages to a MUD.

Original comment by mich.mierzwa@gmail.com on 23 Dec 2009 at 9:57
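
The snippet in the comment above is Python 2. For readers on Python 3, a rough equivalent is sketched below, assuming UTF-8 is the encoding agreed with the client. In Python 3 every str is a Unicode string (like u"..." in Python 2), and the byte sequence has to be produced explicitly with .encode.

```python
text = "Łóżko"               # Unicode string: 5 characters
wire = text.encode("utf-8")  # byte sequence to hand to the socket

print(len(text))  # number of characters
print(len(wire))  # number of bytes: Ł, ó and ż take two bytes each in UTF-8

# The receiving side reverses the step with the same encoding.
assert wire.decode("utf-8") == text
```

The character count is 5 and the byte count is 8, which is the same difference the Python 2 example shows between a and b.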

GoogleCodeExporter commented 9 years ago

This is very helpful, thank you.

Original comment by jimstorch@gmail.com on 23 Dec 2009 at 2:28

GoogleCodeExporter commented 9 years ago

I've disabled the printable character checking completely.

Original comment by jimstorch@gmail.com on 29 Dec 2009 at 3:40

GoogleCodeExporter commented 9 years ago

Original comment by jimstorch@gmail.com on 2 Feb 2010 at 1:09

Changed state: Fixed