Unicode TTY with /dev/vcsu

chi-lambda commented 4 years ago

Apparently there's a vcsu device in Linux that works like vcs but returns UTF-32. I haven't found much documentation, though. It seems to make Unicode actually work. Since we no longer need to specify an encoding, I've also removed the encoding parameter.
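A minimal sketch of turning a `/dev/vcsuN` read into screen rows. It assumes that each cell is one 32-bit code point in the host's byte order (the hexdump later in this thread shows little-endian values on the Pi) and that, like vcs, the device contains no line breaks, so the caller must supply the column count. `decode_vcsu` and `read_vcsu_rows` are hypothetical helpers, not PaperTTY's actual code.

```python
import sys

# vcsu stores one 32-bit code point per screen cell. Assumption: host byte
# order (little-endian on the Raspberry Pi, as the hexdump in this thread shows).
VCSU_CODEC = 'utf-32-le' if sys.byteorder == 'little' else 'utf-32-be'


def decode_vcsu(raw, cols):
    """Split a raw vcsu buffer into screen rows of `cols` characters."""
    # errors='replace' maps invalid cells to U+FFFD instead of raising.
    text = raw.decode(VCSU_CODEC, errors='replace')
    # Like vcs, the device has no line breaks: cut every `cols` characters.
    return [text[i:i + cols] for i in range(0, len(text), cols)]


def read_vcsu_rows(tty, cols):
    """Read /dev/vcsuN (usually requires root) and return its rows."""
    with open('/dev/vcsu%d' % tty, 'rb') as dev:
        return decode_vcsu(dev.read(), cols)
```

The column count has to come from elsewhere, for example the vcsa header. Note that `errors='replace'` turns an out-of-range cell such as four 0x20 bytes into U+FFFD, which is one way replacement characters can end up in the rendered text.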

joukos commented 4 years ago

Yes, this was mentioned somewhere at the bottom of the README as a nice thing to have, but at the time of writing it wasn't in the mainline kernel. I guess now it is?

chi-lambda commented 4 years ago

Oh yeah, now that you mention it! It's definitely present in Raspbian Buster (this is on a pretty fresh install), which I guess is the only version we have to reasonably support?

joukos commented 4 years ago

So to recap, this vcsu device doesn't have attributes, but perhaps eventually there will be a vcsua (quote from https://github.com/torvalds/linux/blob/master/drivers/tty/vt/vc_screen.c):

/dev/vcsuaN: same idea as /dev/vcsaN for unicode (not yet implemented).

The attributes aren't currently used anyway, but they might be nice at some point if we get grayscale support. As it is, though, having Unicode is much more important, and it's very cool that it can finally be used, so thanks a lot again for this pull request.

I'm just a tad against breaking running systems, though (my old installation doesn't have vcsu, at least), and would prefer to have the code use vcsu if possible and fall back to vcsa if it's not found. Things would of course be a bit simpler without such a fallback, but there may be valid reasons why someone can't easily update the kernel and still wants to run a later version of PaperTTY, for example just to get the speed optimizations for some very custom build. Fortunately, this part of the code is short and easy to adjust to support both (while keeping in mind a possible future need for richer vcsa handling), so I think the trade-off between complexity and compatibility is reasonable.

What do you think? Also, how extensively have you tested it yourself, i.e., does everything seem to Just Work (like running irssi inside tmux, and other more complex cases)? I wish I had some time to pull the RPi out of the closet and try out all the new stuff... :)

chi-lambda commented 4 years ago

I think a fallback should be fairly easy to do: I'd just need to have vcsudev return both the device and the character length (1 or 4) and re-add the encoding parameter, which will be ignored if vcsu is present. Incidentally, the default value of "utf-8" never seems to be correct; vcs appears to use just the lower byte of a UTF-16 (or UCS-2?) value. I think latin-1 (ISO-8859-1) would be a sensible default, as it's identical to the first 256 Unicode code points.
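The encoding handling described above can be sketched as a decoder that takes the raw buffer together with the character width (4 for vcsu, 1 for vcs). `decode_cells` is a hypothetical name, and the host-byte-order assumption for vcsu is the same one the Pi hexdump suggests. Latin-1 makes a safe default for the 1-byte case because it maps each byte value 0 to 255 onto the code point with the same number, so decoding can never fail, whereas UTF-8 rejects lone bytes above 0x7F.

```python
import sys


def decode_cells(raw, char_width, encoding='latin-1'):
    """Decode a screen buffer read from vcsu (width 4) or vcs (width 1)."""
    if char_width == 4:
        # vcsu: UTF-32 in host byte order; `encoding` is ignored here.
        codec = 'utf-32-le' if sys.byteorder == 'little' else 'utf-32-be'
        return raw.decode(codec, errors='replace')
    if char_width == 1:
        # vcs: one byte per cell. Latin-1 maps every byte value 0..255 to the
        # code point with the same number, so it can never fail to decode.
        return raw.decode(encoding, errors='replace')
    raise ValueError('unexpected character width: %r' % (char_width,))
```

For example, the byte 0xE9 decodes to "é" with latin-1 but only to U+FFFD with a UTF-8 default.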

chi-lambda commented 4 years ago

I've added some code to fall back to vcs if vcsu isn't available. I don't have a sufficiently old system available, but at least it doesn't appear to have broken the Unicode mode, so that's nice. I've tested with top, htop, wordgrinder and Midnight Commander, which all look fine, though it's a bit hard to tell what's selected. 😉

joukos commented 4 years ago

Hmm. Which settings are you using? I finally got my old ZeroW out to try this and dist-upgraded it to get vcsu (I didn't upgrade any dependencies; I wonder if I should try that too for Pillow), but I get Unicode replacement characters where spaces should be. I tried tmux first, but just running htop via openvt has the same problem (tabs seem to work, though, since there's also some whitespace between the columns).

I'm running sudo /home/pi/.virtualenvs/papertty/bin/python3 ./papertty.py --driver epd2in13 terminal --autofit --font "/usr/share/fonts/truetype/freefont/FreeMono.ttf" (because the default tom-thumb.pil actually crashes with this) and the contents of buff in the main loop is similar to:

'(papertty)�pi@rasputin:~ $������������������������\n��������������������������������������������������\n��������������������������������������������������\n��������������������������������������������������\n��������������������������������������������������\n��������������������������������������������������\n��������������������������������������������������\n��������������������������������������������������\n��������������������������������������������������\n��������������������������������������������������\n��������������������������������������������������\n��������������������������������������������������\n��������������������������������������������������\n��������������������������������������������������\n��������������������������������������������������\n��������������������������������������������������\n[0] 0:bash*             "rasputin" 23:40 20-Jan-20\n'

We should make sure the defaults at least somewhat work, so some more care with font selection is probably needed when incorporating this Unicode support. Unless I set some other font, it crashes with errors like UnicodeEncodeError: 'latin-1' codec can't encode character '\ufffd' in position 10: ordinal not in range(256) (error handling in general should be made a lot better anyway; right now there's next to no exception catching going on). It seems to be PIL that raises it:

(papertty) pi@rasputin:~/new-version/PaperTTY $ sudo /home/pi/.virtualenvs/papertty/bin/python3 ./papertty.py --driver epd2in13 terminal --autofit
Automatic resize of TTY to 20 rows, 62 columns
Started displaying /dev/vcsa1, minimum update interval 0.1 s, exit with Ctrl-C
Traceback (most recent call last):
  File "./papertty.py", line 523, in <module>
    cli()
  File "/home/pi/.virtualenvs/papertty/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/pi/.virtualenvs/papertty/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/pi/.virtualenvs/papertty/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/pi/.virtualenvs/papertty/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/pi/.virtualenvs/papertty/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/pi/.virtualenvs/papertty/lib/python3.5/site-packages/click/decorators.py", line 27, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "./papertty.py", line 508, in terminal
    **textargs)
  File "./papertty.py", line 266, in showtext
    draw.text((0, 0), text, font=self.font, fill=fill, spacing=spacing)
  File "/home/pi/.virtualenvs/papertty/lib/python3.5/site-packages/PIL/ImageDraw.py", line 212, in text
    *args, **kwargs)
  File "/home/pi/.virtualenvs/papertty/lib/python3.5/site-packages/PIL/ImageDraw.py", line 236, in multiline_text
    line_width, line_height = self.textsize(line, font)
  File "/home/pi/.virtualenvs/papertty/lib/python3.5/site-packages/PIL/ImageDraw.py", line 263, in textsize
    return font.getsize(text, direction, features)
  File "/home/pi/.virtualenvs/papertty/lib/python3.5/site-packages/PIL/ImageFont.py", line 112, in getsize
    return self.font.getsize(text)
UnicodeEncodeError: 'latin-1' codec can't encode character '\ufffd' in position 10: ordinal not in range(256)

The Pillow in the virtualenv is version 5.2.0 (and the Pi is running Stretch, 9.11, with kernel 4.19.66+). I'll look at this some more tomorrow if I have time, but apparently you don't have a similar problem?

chi-lambda commented 4 years ago

Those replacement characters are certainly strange. Can you attach the contents of your /dev/vcsu1?

PIL fonts are not Unicode-ready. Even when created from a Unicode font with pilfont.py, they only contain the latin-1 characters. Should we fall back to vcs if no TrueType font was loaded?

joukos commented 4 years ago

I'll try to take another look in the evening. vcsa is needed anyway for the terminal size attributes, so maybe the most straightforward way is to just skip the character attribute bytes as before if we want the 1-byte content.
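A sketch of what "skip the attribute bytes" means for a vcsa snapshot. It assumes the layout from the kernel's vc_screen.c: a 4-byte header (rows, columns, cursor column, cursor row) followed by two bytes per cell, and a little-endian host, where the character byte precedes the attribute byte. `parse_vcsa` is a hypothetical helper.

```python
def parse_vcsa(raw):
    """Return (rows, cols, (cursor_x, cursor_y), content bytes) from vcsa."""
    # 4-byte header: number of rows, number of columns, cursor x, cursor y.
    rows, cols, cur_x, cur_y = raw[:4]
    cells = raw[4:]
    # Each cell is 2 bytes; assuming a little-endian host the character byte
    # comes first, so taking every other byte drops the attribute bytes.
    content = cells[0::2]
    return rows, cols, (cur_x, cur_y), content
```

The same header is what provides the terminal size when the characters themselves come from vcs or vcsu.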

But yeah, there could be a bit of smartness there to avoid Unicode if it's obvious that the chosen font can't handle it (plus a warning message to inform the user). I'm not sure if there's a smarter way to handle this without making it unnecessarily complex.

chi-lambda commented 4 years ago

The current way is to use vcsa and skip the attributes to get the content. 🙂 But we can't do that for vcsu, as the Linux people haven't gotten around to implementing vcsua yet. The performance impact of opening vcs even though vcsa is already open is negligible, IMHO. Using vcsa would make the code more complex for little gain.

Checking whether we have a TrueType font is pretty easily done, as seen here. Then fall back to vcs.

joukos commented 4 years ago

I'm currently looking at this (and have also implemented the TrueType check, although by adding a force_vcs=False to the vcsudev argument list and doing the isinstance check in the main code, which sets it accordingly).
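A sketch of how such a device choice might look: a `force_vcs` flag, a (device, character width) return value and a warning explaining why Unicode is off. The `tty` argument, the injectable `exists` function and `font_supports_unicode` are hypothetical names for illustration; PaperTTY's real `vcsudev` signature may differ. `ImageFont.FreeTypeFont` is the class Pillow uses for fonts loaded with `ImageFont.truetype`, while `.pil` bitmap fonts load as plain `ImageFont.ImageFont`.

```python
import os
import warnings


def vcsudev(tty, force_vcs=False, exists=os.path.exists):
    """Return (device path, character width) for virtual console `tty`."""
    unicode_dev = '/dev/vcsu%d' % tty
    if force_vcs:
        warnings.warn('Font is not TrueType; Unicode disabled, using vcs')
    elif exists(unicode_dev):
        return unicode_dev, 4
    else:
        warnings.warn('%s not found (kernel too old?); falling back to vcs'
                      % unicode_dev)
    return '/dev/vcs%d' % tty, 1


def font_supports_unicode(font):
    """PIL bitmap (.pil) fonts only cover latin-1; TrueType fonts are needed."""
    from PIL import ImageFont  # imported here so vcsudev() needs no Pillow
    return isinstance(font, ImageFont.FreeTypeFont)

# Hypothetical use in the main code:
# device, width = vcsudev(1, force_vcs=not font_supports_unicode(font))
```

Keeping the font check in the caller, as described above, means `vcsudev` only has to know about devices.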

I was a bit puzzled at first, since Unicode now seemed to work with a TrueType font, but then I tried to start tmux and realized that that's what seems to cause the problem. For some reason (I'm not entirely sure yet whether this is due to tmux or the vcsu code, but I'm assuming tmux), the spaces that fill the "empty" area of the screen are encoded as four 0x20 bytes, whereas spaces elsewhere appear as 0x20 0x00 0x00 0x00. Here's a hexdump of a prompt from /dev/vcsu1, where the top row shows me starting irssi (and returning from it to do something else):

00000000  70 00 00 00 69 00 00 00  40 00 00 00 72 00 00 00  |p...i...@...r...|
00000010  61 00 00 00 73 00 00 00  70 00 00 00 75 00 00 00  |a...s...p...u...|
00000020  74 00 00 00 69 00 00 00  6e 00 00 00 3a 00 00 00  |t...i...n...:...|
00000030  7e 00 00 00 2f 00 00 00  6e 00 00 00 65 00 00 00  |~.../...n...e...|
00000040  77 00 00 00 2d 00 00 00  76 00 00 00 65 00 00 00  |w...-...v...e...|
00000050  72 00 00 00 73 00 00 00  69 00 00 00 6f 00 00 00  |r...s...i...o...|
00000060  6e 00 00 00 2f 00 00 00  50 00 00 00 54 00 00 00  |n.../...P...T...|
00000070  59 00 00 00 20 00 00 00  24 00 00 00 20 00 00 00  |Y... ...$... ...|
00000080  69 00 00 00 72 00 00 00  73 00 00 00 73 00 00 00  |i...r...s...s...|
00000090  69 00 00 00 20 20 20 20  20 20 20 20 20 20 20 20  |i...            |
000000a0  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
000000d0  20 20 20 20 20 20 20 20  70 00 00 00 69 00 00 00  |        p...i...|
000000e0  40 00 00 00 72 00 00 00  61 00 00 00 73 00 00 00  |@...r...a...s...|
000000f0  70 00 00 00 75 00 00 00  74 00 00 00 69 00 00 00  |p...u...t...i...|
00000100  6e 00 00 00 3a 00 00 00  7e 00 00 00 2f 00 00 00  |n...:...~.../...|
00000110  6e 00 00 00 65 00 00 00  77 00 00 00 2d 00 00 00  |n...e...w...-...|
00000120  76 00 00 00 65 00 00 00  72 00 00 00 73 00 00 00  |v...e...r...s...|
00000130  69 00 00 00 6f 00 00 00  6e 00 00 00 2f 00 00 00  |i...o...n.../...|
00000140  50 00 00 00 61 00 00 00  70 00 00 00 65 00 00 00  |P...a...p...e...|
00000150  72 00 00 00 54 00 00 00  54 00 00 00 59 00 00 00  |r...T...T...Y...|
00000160  20 00 00 00 24 00 00 00  20 00 00 00 73 00 00 00  | ...$... ...s...|
...

So that seems to be what causes the funny replacement characters. I fixed it by simply adding:

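...
buff = vcsu.read()
if character_width == 4:
    # Blank cells sometimes read back as four 0x20 bytes instead of the
    # UTF-32LE space 20 00 00 00; rewrite them so they decode as U+0020.
    buff = buff.replace(b'\x20\x20\x20\x20', b'\x20\x00\x00\x00')
...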

With that, the problem appears to be fixed. However, I'm still not 100% sure what to blame here; it's as if tmux is writing incorrect bytes there for whatever reason. This may need some more testing.
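A cell-aligned variant of the same fix, sketched under the same little-endian assumption. The plain `bytes.replace` is already safe as long as every other cell holds a valid code point: the top byte of a valid UTF-32LE cell is always 0x00, so any misaligned window of four 0x20 bytes would have to start inside an all-0x20 cell, whose aligned match the left-to-right scan finds first. Splitting into 4-byte cells simply makes that alignment explicit. `normalize_blank_cells` is a hypothetical name.

```python
def normalize_blank_cells(buff):
    """Replace cells of four 0x20 bytes with a proper UTF-32LE space."""
    # Walk the buffer in 4-byte cells so a match can never straddle two cells.
    cells = (buff[i:i + 4] for i in range(0, len(buff), 4))
    return b''.join(b'\x20\x00\x00\x00' if cell == b'\x20\x20\x20\x20' else cell
                    for cell in cells)
```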

On a positive note: tmux + irssi seems to work fine and handles accents etc., so it's now much more usable for IRC 😉

joukos commented 4 years ago

One minor thing that came to mind: I suppose if we do want the character attributes, they could simply be read from vcsa instead of the unimplemented vcsua.
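A sketch of that idea, assuming the vcsu buffer has already been decoded to text and the vcsa snapshot uses the 4-byte header plus (character, attribute) byte pairs of a little-endian host. `cells_with_attributes` is a hypothetical helper. Because the two devices are read separately, the snapshots can disagree if the screen changes in between, so a size mismatch is reported rather than silently zipped.

```python
def cells_with_attributes(vcsu_text, vcsa_raw):
    """Pair each Unicode character from vcsu with its attribute byte from vcsa."""
    # Skip the 4-byte vcsa header (rows, cols, cursor x, cursor y); on a
    # little-endian host the attribute is the second byte of each 2-byte cell.
    attrs = vcsa_raw[4:][1::2]
    if len(attrs) != len(vcsu_text):
        # Separate reads are not atomic: the screen or its size may have
        # changed in between, so let the caller retry.
        raise ValueError('vcsu and vcsa snapshots differ in size')
    return list(zip(vcsu_text, attrs))
```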

chi-lambda commented 4 years ago

Yes, that was my plan. 🙂

joukos commented 4 years ago

This is starting to look pretty good, I think. Did you have something more to add at this time?

I think some help texts could be added, that is, making the program clearly state when Unicode won't work because vcsu is missing or the font is unsuitable, but that can also be done separately after this is merged.

chi-lambda commented 4 years ago

Great idea! I did just that.

joukos commented 4 years ago

I think I'll go ahead and merge this now.

It's a killer feature, thanks for the great work!

joukos / PaperTTY

Unicode TTY with /dev/vcsu #45