Closed intramurz closed 2 years ago
Unfortunately disabling print-escape-multibyte
leads to display issues with bidi characters. As an example look at the variable bidi-directional-control-chars
which messes up the display when print-escape-multibyte
is nil. Maybe other problematic characters affect the display too. However I wonder why print-escape-control-characters=t
does not result in escaped bidi characters. This is either a display bug or bidi characters are not classified as control characters. Probably one should ask on the emacs bug tracker about this issue. Maybe we find an alternative approach where print-escape-multibyte
is always set to nil and we apply our own additional escaping on top, which only affects the characters which are really problematic.
I think it will not be an easy task. Maybe users who don't use bidirectional text can turn escaping off for now as a workaround? Certainly, it is necessary to warn about possible side effects.
No, I am not in favor of adding such workarounds. Disabling the escaping is not even a workaround, it is ignoring the issue. I prefer a proper fix.
I propose the following approach:
report-emacs-bug
. Bidi control characters should be escaped, and maybe there are even more characters which are missed by the current escaping logic. Ideally this issue should be fixed upstream since I've seen display issues in other packages too despite print-escape-control-characters=t
(for example in the helpful package).print-escape-control-characters
is non nil. Hopefully the workaround can be removed in the long term. If upstream disagrees with the bug report and argues that the escaping is correct as is, then we have to keep our own escaping forever. I hope not!You are right, I have not considered all aspects of the problem.
I also pinged @iyefrat from Doom Emacs who is interested in bidi and internationalisation support in Emacs, in particular multi-byte bidi languages like Hebrew and Arabic. Hopefully we can figure out a better fix which solves the underlying issues.
I followed the discussion above with keen interest. Thank you Daniel for citing my opinion as an argument. Eli is correct that Emacs does not have a consistent approach to print functionality (but this is up to its developers). However, print
/ prin1
functions are different from the rest. A correct Lisp implementation assumes that any reasonably simple object (such are strings) can be print
ed and then read
back into the system (and even eval
uated). That is, these functions are used for reversible and easy-human-readable serialization, and their output can be used for more than just display. Obviously, what Eli is proposing does not quite meet this criterion.
In the meantime, I read the source of Emacs (not the prettiest code that I have seen in my life :). What you propose is quite feasible, I can outline how this can be done and for this you do not need to rewrite the entire Emacs (again, this is up to the developers). I think it would be possible to extend the internal c_iscntrl
function to include Unicode control and formatting characters (for now, it just checks the POSIX control characters listed with the dreaded macro). I don’t know how reliable the standard C functions iswprint
and iswcntrl
are in this matter, otherwise I would prefer to use them and not reinvent the wheel. And it would be nice to correct this place in the internal print_object
so that these characters would be hex escaped, not octal. For example, like here.
Alternatively, it is possible to make a separate sibling check function, separate if
branch and a separate variable that controls all of this. But this will complicate things even more.
Thank you for your efforts. Howbeit, marginalia
(and consult
too :) are very useful even as they are.
Thanks! If you are interested feel free to chime into this discussion there. I will not push this further.
No thanks. I'm afraid I don't have enough time for this.
When annotating variables, values containing multibyte characters are escaped using backslash sequences. It's portable but ugly and not very useful. It seems to me that it's better to display these values as they are, in systems that support multibyte encodings (that is, all modern ones).
Thank you.