jwiegley / emacs-async

Simple library for asynchronous processing in Emacs
GNU General Public License v3.0

Encoding system is hardcoded to Unix #93

Closed astahlman closed 6 years ago

astahlman commented 6 years ago

Hi there,

I'm the author of ob-async, which depends on emacs-async (awesome package, by the way; thanks!).

We have an open issue (https://github.com/astahlman/ob-async/issues/18) in which a Windows user reports seeing two newlines in the output of org-babel results blocks where they expected to see only one. I believe I've traced it back to the way emacs-async handles encoding of EOL markers.

From what I can tell, emacs-async hardcodes the coding system to utf-8-unix, which assumes the EOL marker is the Unix newline \n. As a result, the carriage return \r in a Windows \r\n line ending is not consumed during decoding and survives in the decoded text as a literal ^M.

Is there any particular reason the coding system is hardcoded to the Unix variant? If not, I think all references to utf-8-unix could be seamlessly replaced with utf-8-auto [1] [2], which appears to gracefully handle carriage-returns:

#+BEGIN_SRC emacs-lisp :results output
  (defun test-decoding (coding-system)
    (let ((sexp (decode-coding-string (base64-decode-string
                                       (base64-encode-string "line 1\r\nline 2")) coding-system))
          (coding-system-for-write coding-system))
      (print (format "With coding system `%s`: {{{%s}}}" coding-system (pp-to-string sexp)))))

  (test-decoding 'utf-8-unix) ;; leaves \r in the string as a literal carriage return (^M)
  (test-decoding 'utf-8-dos)  ;; treats \r\n as a single newline
  (test-decoding 'utf-8-auto) ;; treats \r\n as a single newline
#+END_SRC

#+RESULTS:
: 
: "With coding system `utf-8-unix`: {{{\"line 1
\\nline 2\"}}}"
: 
: "With coding system `utf-8-dos`: {{{\"line 1\\nline 2\"}}}"
: 
: "With coding system `utf-8-auto`: {{{\"line 1\\nline 2\"}}}"
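
The same behaviour can be checked without reading printed output by comparing the decoded strings directly. This is only a minimal sketch: the helper name `my/decode-sample` is hypothetical and is not part of emacs-async or ob-async, while `decode-coding-string` and `string=` are standard Emacs functions.

```emacs-lisp
;; Hypothetical helper for illustration only; not part of emacs-async.
(defun my/decode-sample (coding-system)
  "Decode a sample string with a Windows line ending using CODING-SYSTEM."
  (decode-coding-string "line 1\r\nline 2" coding-system))

;; With the -unix variant the carriage return stays in the decoded string.
(string= (my/decode-sample 'utf-8-unix) "line 1\r\nline 2")

;; With the -dos variant the \r\n pair is decoded to a single newline.
(string= (my/decode-sample 'utf-8-dos) "line 1\nline 2")
```

Both comparisons should evaluate to `t`, which matches the printed results above: only the `-unix` variant keeps the stray carriage return.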

[1] Output of M-x describe-coding-system utf-8-auto

U -- utf-8-auto

UTF-8 (auto-detect signature (BOM))
Type: utf-8 (UTF-8: Emacs internal multibyte form)
EOL type: Automatic selection from:
    [utf-8-auto-unix utf-8-auto-dos utf-8-auto-mac]
This coding system encodes the following charsets:
  unicode

[2] https://stackoverflow.com/questions/17862846/whats-the-difference-among-various-types-of-utf-8-in-emacs

jwiegley commented 6 years ago

No, I don't think it needs to be hard-coded.

astahlman commented 6 years ago

Closing now that #94 is merged - thanks for the quick response.