selectel / pyte

Simple VTXXX-compatible linux terminal emulator
http://pyte.readthedocs.org/
GNU Lesser General Public License v3.0

How do you specify a different encoding? #118

Open eight04 opened 6 years ago

eight04 commented 6 years ago

I want to feed some Big5-UAO encoded data. Since there is no encoding parameter (or something like that), I tried using ByteStream:

stream = ByteStream(screen)
stream.select_other_charset("@")
stream.feed(bytes_object)

However, after checking the source code, it seems that this setup is equivalent to:

stream = Stream(screen)
stream.feed(bytes_object.decode("latin-1"))

This approach doesn't work because the bytes of a Big5-UAO encoded string may contain control characters such as \x9d, so match_text fails to match the entire string: https://github.com/selectel/pyte/blob/676610b43954b644c05823371df6daf87caafdad/pyte/streams.py#L132-L135

I generated a list of Unicode characters whose Big5-UAO encoding contains control characters: https://gist.github.com/eight04/3de731b7300a6b5036e082f801e2e3e9
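To make the latin-1 problem concrete, here is a minimal sketch using only the standard library. The two-byte sequence is a hypothetical stand-in that I did not check against the real Big5-UAO table; the point is only that a byte like 0x9D survives latin-1 decoding as U+009D, which the C1 control set defines as Operating System Command.

```python
import unicodedata

# Hypothetical two-byte sequence whose first byte is 0x9D; it was not
# checked against the real Big5-UAO table.
raw = b"\x9d\x40"

# latin-1 maps every byte to the code point with the same number,
# so 0x9D becomes U+009D rather than half of a CJK character.
text = raw.decode("latin-1")

first = text[0]
print(hex(ord(first)))              # code point of the first byte
print(unicodedata.category(first))  # "Cc" marks a control character
```

A parser that honors C1 controls would presumably treat that first character as the start of a control sequence instead of as text.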

How about decoding the bytes into a Unicode string with Big5-UAO before passing it to stream.feed?

We can't. In our use case, we need a special feature called "雙色字" (two-color characters), which colors a double-width character with two different colors.

As a result, we can't decode the bytes before the escape codes are parsed.
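A rough sketch of what such data can look like, assuming the color change is an SGR escape sequence placed between the lead byte and the trail byte of one character. It uses Python's built-in big5 codec as a stand-in, because Big5-UAO is not in the standard library; `strip_sgr` is a hypothetical helper, not part of pyte.

```python
import re

# Matches SGR (color) escape sequences such as ESC [ 31 m.
SGR = re.compile(rb"\x1b\[[0-9;]*m")

# "中" is a two-byte character in Big5; split it into its two bytes.
lead, trail = "中".encode("big5")

# Red for the first half, green for the second half, then reset.
data = (
    b"\x1b[31m" + bytes([lead]) +
    b"\x1b[32m" + bytes([trail]) +
    b"\x1b[0m"
)

def strip_sgr(raw: bytes) -> bytes:
    """Remove SGR sequences so the text bytes become contiguous again."""
    return SGR.sub(b"", raw)

# Only after the escape sequences are taken out do the two bytes
# sit next to each other and decode as one character.
plain = strip_sgr(data).decode("big5")
print(plain)
```

Decoding `data` directly would put an escape byte where the codec expects the trail byte, which is why the escape codes have to be handled first.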


Maybe we could add a flag to disable C1 controls in the Stream.feed parser?

eight04 commented 6 years ago

I found another problem: the byte sequence may contain unprintable characters.

wcwidth considers these characters unprintable: https://github.com/jquast/wcwidth/blob/c71459ea91af86f3bbcdac2c8ed5e7773da2d848/wcwidth/wcwidth.py#L175-L176

When pyte receives an unprintable character, it doesn't draw it on the buffer: https://github.com/selectel/pyte/blob/676610b43954b644c05823371df6daf87caafdad/pyte/screens.py#L522-L523

As a result, the following characters would never be drawn: https://gist.github.com/eight04/dd7511c289d83932d18d17e21734bab3
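To get a feel for how many byte values are affected, here is a small standard-library sketch. It does not call wcwidth; it assumes that Unicode general category "Cc" is a reasonable proxy for what gets treated as unprintable, and `control_bytes` is a hypothetical helper.

```python
import unicodedata

def control_bytes(start=0x80, stop=0x100):
    """Return byte values in [start, stop) that are control characters
    once decoded as latin-1 (chr(b) is the same mapping)."""
    return [b for b in range(start, stop)
            if unicodedata.category(chr(b)) == "Cc"]

high_controls = control_bytes()
print(f"{len(high_controls)} byte values, "
      f"0x{high_controls[0]:02X} to 0x{high_controls[-1]:02X}")
```

Any multi-byte character whose encoding includes one of these byte values would be affected under that assumption.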


We need a flag that puts unprintable bytes into the buffer.