Closed ghost closed 1 year ago
I doubt this is rc's fault. It's almost entirely ignorant of character encodings, but since it largely slings around uninterpreted bytestrings it gets away with it. (I'm a bit surprised that ? globs seem to work correctly in the presence of multibyte characters, as that's one place where I would expect to need minimal UTF-8 support.)
First thing to check is that your locale settings are correct. What is $LANG?
Yeah, seems like my locale.conf wasn't being read by rc, so I had to add LANG to .rcrc manually.
Another problem:
Ah yes, thank you!
In this case, the command name is being deliberately scrambled by protect() in which.c. It wants to avoid non-printing characters, but uses the ASCII-only isprint().
I reckon I can fix that to handle UTF-8 fairly easily (without need to drag in libicu, for instance).
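To illustrate the point, here is a minimal sketch, not the actual code from which.c. The function names scramble_ascii() and scramble_keep_high() are hypothetical. The first replaces every byte that isprint() rejects with '?', which in the default "C" locale includes each byte of a multibyte UTF-8 sequence. The second shows one possible relaxation that passes bytes >= 0x80 through untouched; it makes no attempt to validate that those bytes form well-formed UTF-8.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the scrambling described above: any byte
 * that isprint() rejects becomes '?'. In the "C" locale this covers
 * every byte of a multibyte UTF-8 sequence. Caller frees the result. */
char *scramble_ascii(const char *in) {
	size_t n = strlen(in);
	char *out = malloc(n + 1);
	if (out == NULL)
		return NULL;
	for (size_t i = 0; i < n; i++) {
		unsigned char c = (unsigned char)in[i];
		out[i] = isprint(c) ? (char)c : '?';
	}
	out[n] = '\0';
	return out;
}

/* One possible relaxation (an assumption, not the fix that was made):
 * leave bytes >= 0x80 alone so UTF-8 sequences survive, and still
 * replace ASCII control characters. No UTF-8 validation is done. */
char *scramble_keep_high(const char *in) {
	size_t n = strlen(in);
	char *out = malloc(n + 1);
	if (out == NULL)
		return NULL;
	for (size_t i = 0; i < n; i++) {
		unsigned char c = (unsigned char)in[i];
		out[i] = (c >= 0x80 || isprint(c)) ? (char)c : '?';
	}
	out[n] = '\0';
	return out;
}
```

With the input "caf\xc3\xa9" (that is, "café" encoded as UTF-8), the first function turns the two trailing bytes into question marks, while the second returns the string unchanged.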
I then worry that we're being UTF-8-centric and what about the -16 and -32 encodings? I simply don't have enough experience to tackle those sensibly. If anyone does, do send a Pull Request!
I'm not sure how this works, but if possible I would like to avoid supporting anything beyond UTF-8, especially since you don't have enough experience with the other encodings. Most of the time the simpler solution is more robust.
Just saw this thread. I wonder why we bother with protect() at all? Removing it is the simplest solution, and it's in line with rc punting on all UTF-8 issues (for now).
I think I wrote protect() when I was much younger. If so, it was in response to some hostile environment or other (might well have been a Windows 3.1 terminal emulation + telnet) and It Seemed Like A Good Idea At The Time.
Should we get rid of protect(), @rakitzis?
It's not useful as it stands. Please remove it.
Looking into this, I saw that env -i rc behaves the same as env -i sh when built with EDIT=null on my system. I only got the behaviour from the original comment when building with EDIT=readline.
It might be worth it to look into other interactive programs that use readline (e.g. python) to see how they behave and how they get it right.
Python gets it right. This is what they're doing: https://peps.python.org/pep-0538/
OK does this boil down to something simple that can be done for rc?
On Linux it basically boils down to overwriting LC_CTYPE to C.UTF-8 if it is C or POSIX at startup. Unfortunately on other systems the value needs to be slightly different.
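As a rough sketch of what such a startup step could look like on Linux: the function names is_c_locale() and coerce_c_locale() are hypothetical, and this ignores the finer points of PEP 538 (for example, its handling of LC_ALL and the fallback locale names used on other systems). It only uses setlocale() and the POSIX setenv().

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the locale name is one of the two "plain C" spellings. */
int is_c_locale(const char *name) {
	return name != NULL &&
	       (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}

/* Hypothetical startup hook: if the environment selects the C/POSIX
 * locale for LC_CTYPE, try to switch to C.UTF-8 instead.
 * Returns 1 if the locale was coerced, 0 if it was left alone. */
int coerce_c_locale(void) {
	/* Adopt whatever the environment asks for and see what we got. */
	const char *cur = setlocale(LC_CTYPE, "");
	if (!is_c_locale(cur))
		return 0;
	/* "C.UTF-8" is the Linux spelling; it may not exist elsewhere. */
	if (setlocale(LC_CTYPE, "C.UTF-8") == NULL)
		return 0;
	/* Export the choice so child processes and libraries see it too. */
	if (setenv("LC_CTYPE", "C.UTF-8", 1) != 0)
		return 0;
	return 1;
}
```

If C.UTF-8 is not installed, the function leaves the locale as it was, so behaviour on such systems would be unchanged.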
I'll vote for doing nothing, because we can argue that an LC_CTYPE of C or POSIX asks for this behaviour.
I'm closing this, because rc is encoding-agnostic and works correctly with a properly configured locale.
Seems like this rc is really bad at everything other than ASCII characters in interactive mode; it enters some weird state when I start typing in another language. Is it even fixable? rc from Plan 9 works correctly in a tty, but in other terminals it thinks that non-ASCII characters are double-width, so it clears them incorrectly with backspace.