Behavior about embark-decode-url

uqix commented 2 months ago

Running embark-encode-url on http://a.b.c/你好 gives result1: http://a.b.c/%E4%BD%A0%E5%A5%BD

Running embark-decode-url on result1 gives result2: http://a.b.c/\344\275\240\345\245\275, i.e. the raw UTF-8 bytes rather than the original characters.

As the docstring of url-unhex-string mentions:

The resulting string in general requires decoding using an appropriate coding-system.

It'd be better to also run decode-coding-string on result2:

(defun embark-decode-url (start end)
  "Decode the URI-encoded region between START and END in current buffer."
  (interactive "r")
  (let ((decoded (url-unhex-string (buffer-substring-no-properties start end))))
    (delete-region start end)
    (insert (decode-coding-string decoded 'utf-8))))
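
To check the round trip outside a buffer, here is a minimal sketch. The helper name my/percent-decode-utf-8 is hypothetical and not part of Embark, and it assumes the percent-encoded bytes are UTF-8, which is what url-hexify-string produces for multibyte input.

```elisp
(require 'url-util)

(defun my/percent-decode-utf-8 (string)
  "Percent-decode STRING, assuming the encoded bytes are UTF-8.
Hypothetical helper for illustration only; not part of Embark."
  ;; `url-unhex-string' only turns each %XX escape into a raw byte,
  ;; so the bytes still have to be decoded explicitly.
  (decode-coding-string (url-unhex-string string) 'utf-8))

;; Encode the non-ASCII part of the URL, then decode it again.
(let ((encoded (url-hexify-string "你好")))
  (list encoded (my/percent-decode-utf-8 encoded)))
```

Evaluating the last form should return the percent-encoded string followed by the original text, so the decode step undoes the encode step.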

oantolin commented 1 month ago

Sorry for the long delay, I was "away from GitHub" for a period of time. 😬

Is it safe to always decode utf-8? Does url-unhex-string always use that encoding? Or should the encoding to use be looked up somehow?

uqix commented 1 month ago

UTF-8 is a sensible default. I'm not sure anyone would need to choose another encoding.

oantolin commented 1 month ago

It looks like there is no way to know what encoding the URL was originally using, so probably UTF-8 is the only reasonable guess. I'll make it use UTF-8 and we'll see if anyone has any complaints. :)

oantolin / embark

Behavior about embark-decode-url #725