emacs-async shoots itself in the foot by using utf-8-auto #165

jwiegley / emacs-async

Simple library for asynchronous processing in Emacs

GNU General Public License v3.0

Open Eli-Zaretskii opened 1 year ago

Eli-Zaretskii commented 1 year ago

I'm told that async.el uses the utf-8-auto coding-system to encode stuff, assuming that the "auto" part means this coding-system handles the end-of-line (EOL) format automagically.

This is a mistake. Please read the doc string of utf-8-auto, and you will see that the "auto" part is about the BOM, not about the EOL format. Moreover, on encoding, utf-8-auto always produces a BOM, something that many Lisp (and non-Lisp) programs don't expect at all. (Due to a bug, utf-8-auto was not producing a BOM on encoding until now, but Emacs 29 fixes that bug.)

So my suggestion is to replace utf-8-auto with utf-8. The latter actually does decode the EOL as you'd expect, and is what you usually want.
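
The distinction between BOM handling and EOL handling can be checked from a scratch buffer. The following is a minimal sketch, not code from async.el; it uses only built-in functions, and result comments are given only where the value is the same across Emacs versions.

```elisp
;; Sketch: inspect what the "auto" in utf-8-auto refers to.

;; The :bom property describes how a coding system treats the BOM.
(coding-system-get 'utf-8-auto :bom)

;; EOL handling is a separate property.  A vector of subsidiary
;; coding systems means "detect the EOL format when decoding";
;; 0, 1 or 2 means fixed unix, dos or mac line endings.
(coding-system-eol-type 'utf-8)
(coding-system-eol-type 'utf-8-unix)              ; => 0

;; Plain utf-8 does not write a BOM on encoding.
(encode-coding-string "a" 'utf-8)                 ; => "a"

;; utf-8-with-signature always writes one (bytes EF BB BF).
(encode-coding-string "a" 'utf-8-with-signature)  ; => "\357\273\277a"
```

Evaluating `(encode-coding-string "a" 'utf-8-auto)` in the same way shows whether a given Emacs build already has the Emacs 29 fix described above.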

thierryvolpiatto commented 1 year ago

Eli-Zaretskii writes:

So my suggestion is to replace utf-8-auto with utf-8. The latter actually does decode the EOL as you'd expect, and is what you usually want.

There is a comment in async.el from Stefan suggesting to use utf-8-unix:

  ;; FIXME: Why use `utf-8-auto' instead of `utf-8-unix'?  This is
  ;; a communication channel over which we have complete control,
  ;; so we get to choose exactly which encoding and EOL we use, isn't it?

So what should be used here, utf-8 or utf-8-unix? John?

Thanks, Eli, for looking into this.

-- Thierry

Eli-Zaretskii commented 1 year ago

If this is used to communicate between two instances of async.el, then I recommend utf-8-emacs-unix. That is the encoding used by Emacs internally, and it can represent any character that Emacs is capable of processing.
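
As an illustration of that recommendation, here is a minimal sketch of using utf-8-emacs-unix on both ends of a channel. The helpers `my-roundtrip` and `my-start-child` are hypothetical names invented for this example and are not how async.el itself is written; `make-process` and its `:coding` keyword are standard Emacs.

```elisp
;; Sketch only: `my-roundtrip' and `my-start-child' are hypothetical
;; helpers for illustration, not part of async.el.

(defun my-roundtrip (text)
  "Encode TEXT as utf-8-emacs-unix, then decode it again."
  (decode-coding-string
   (encode-coding-string text 'utf-8-emacs-unix)
   'utf-8-emacs-unix))

;; Text survives unchanged: no BOM is added and LF stays LF.
(equal (my-roundtrip "naïve λ\nsecond line")
       "naïve λ\nsecond line")   ; => t

(defun my-start-child (command)
  "Start COMMAND (a list of strings) with a fixed pipe encoding."
  (make-process :name "my-child"
                :command command
                :connection-type 'pipe
                ;; Same coding system for reading and writing.
                :coding 'utf-8-emacs-unix))
```

Fixing both the encoding and the EOL type means neither side has to guess, which only works because both ends of the pipe are Emacs instances.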

thierryvolpiatto commented 1 year ago

Ok, thanks, so I will use utf-8-emacs-unix as you recommend. @jwiegley let me know what you think about this.

jwiegley commented 1 year ago

I agree with @Eli-Zaretskii.

meedstrom commented 6 months ago

@Eli-Zaretskii Just hijacking this thread, but am I correct in understanding that utf-8-auto now detects EOL as well as BOM?

Going by describe-coding-system on Emacs 29.1:

U -- utf-8-auto

UTF-8 (auto-detect signature (BOM))
Type: utf-8 (UTF-8: Emacs internal multibyte form)
EOL type: Automatic selection from:
    [utf-8-auto-unix utf-8-auto-dos utf-8-auto-mac]

I hear you that it will insert BOM on write, so it would be a pretty bad coding system for write. But if you only use it to read e.g. files from both Linux and Mac workstations (some of which somehow have a BOM), but not write anything, it sounds okay.

Eli-Zaretskii commented 6 months ago

am I correct in understanding that utf-8-auto now detects EOL as well as BOM?

Yes. It always detected EOL, btw. The fix in Emacs 29 was to correct the handling of BOM.