minad / marginalia

:scroll: marginalia.el - Marginalia in the minibuffer
GNU General Public License v3.0
782 stars 27 forks source link

Add option to disable escaping of multibyte characters. #118

Closed intramurz closed 2 years ago

intramurz commented 2 years ago

When annotating variables, values containing multibyte characters are escaped using backslash sequences. It's portable but ugly and not very useful. It seems to me that it's better to display these values as they are, in systems that support multibyte encodings (that is, all modern ones).

Thank you.

minad commented 2 years ago

Unfortunately disabling print-escape-multibyte leads to display issues with bidi characters. As an example look at the variable bidi-directional-control-chars which messes up the display when print-escape-multibyte is nil. Maybe other problematic characters affect the display too. However I wonder why print-escape-control-characters=t does not result in escaped bidi characters. This is either a display bug or bidi characters are not classified as control characters. Probably one should ask on the emacs bug tracker about this issue. Maybe we find an alternative approach where print-escape-multibyte is always set to nil and we apply our own additional escaping on top, which only affects the characters which are really problematic.

intramurz commented 2 years ago

I think it will not be an easy task. Maybe users who don't use bidirectional text can turn escaping off for now as a workaround? Certainly, it is necessary to warn about possible side effects.

minad commented 2 years ago

No, I am not in favor of adding such workarounds. Disabling the escaping is not even a workaround, it is ignoring the issue. I prefer a proper fix.

I propose the following approach:

  1. Report the bug to emacs devel via report-emacs-bug. Bidi control characters should be escaped, and maybe there are even more characters which are missed by the current escaping logic. Ideally this issue should be fixed upstream since I've seen display issues in other packages too despite print-escape-control-characters=t (for example in the helpful package).
  2. Add our own escaping here which selectively escapes the problematic control characters which are not escaped even if print-escape-control-characters is non nil. Hopefully the workaround can be removed in the long term. If upstream disagrees with the bug report and argues that the escaping is correct as is, then we have to keep our own escaping forever. I hope not!
intramurz commented 2 years ago

You are right, I have not considered all aspects of the problem.

minad commented 2 years ago

I also pinged @iyefrat from Doom Emacs who is interested in bidi and internationalisation support in Emacs, in particular multi-byte bidi languages like Hebrew and Arabic. Hopefully we can figure out a better fix which solves the underlying issues.

minad commented 2 years ago

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=52459

intramurz commented 2 years ago

I followed the discussion above with keen interest. Thank you Daniel for citing my opinion as an argument. Eli is correct that Emacs does not have a consistent approach to print functionality (but this is up to its developers). However, print/ prin1 functions are different from the rest. A correct Lisp implementation assumes that any reasonably simple object (such are strings) can be printed and then read back into the system (and even evaluated). That is, these functions are used for reversible and easy-human-readable serialization, and their output can be used for more than just display. Obviously, what Eli is proposing does not quite meet this criterion.

In the meantime, I read the source of Emacs (not the prettiest code that I have seen in my life :). What you propose is quite feasible, I can outline how this can be done and for this you do not need to rewrite the entire Emacs (again, this is up to the developers). I think it would be possible to extend the internal c_iscntrl function to include Unicode control and formatting characters (for now, it just checks the POSIX control characters listed with the dreaded macro). I don’t know how reliable the standard C functions iswprint and iswcntrl are in this matter, otherwise I would prefer to use them and not reinvent the wheel. And it would be nice to correct this place in the internal print_object so that these characters would be hex escaped, not octal. For example, like here.

Alternatively, it is possible to make a separate sibling check function, separate if branch and a separate variable that controls all of this. But this will complicate things even more.

Thank you for your efforts. Howbeit, marginalia (and consult too :) are very useful even as they are.

minad commented 2 years ago

Thanks! If you are interested feel free to chime into this discussion there. I will not push this further.

intramurz commented 2 years ago

No thanks. I'm afraid I don't have enough time for this.