vermiculus / sx.el

Stack Exchange for Emacs
http://stackapps.com/q/3950

&nbsp; is displayed verbatim, rather than as a space #287

Closed: phil-s closed this issue 9 years ago

phil-s commented 9 years ago

e.g.: http://stackoverflow.com/q/4987760

I'm not sure whether there are other character entities which need to be looked at?

I tried grepping the Emacs Lisp source for "nbsp" to see whether there's a centralised function for handling such things, but didn't spot a canonical one; Gnus does provide html2text.el and html2text-substitute, which might be a good fit?

Offhand I'm not sure which character entities StackExchange supports (or not). I do note http://meta.stackexchange.com/questions/1777/what-html-tags-are-allowed-on-stack-exchange-sites , but that doesn't mention character entities specifically.

vermiculus commented 9 years ago

Can you give a question where this problem occurs? Wow, I can't read today. According to the source, it shouldn't be displaying at all (though it should still be a Unicode non-breaking space).

vermiculus commented 9 years ago

I've confirmed that sx-encoding-decode-entities isn't being called anywhere from sx-question-mode--insert-markdown – @Malabarba, where would be the right place to stick that call?

I'm thinking

(defun sx-question-mode--insert-markdown (text)
  (let ((beg (point)))
    (insert
     (with-temp-buffer
       (insert text) ; <--- here
       ...))))

It's odd, though – while this works, I feel like sx-encoding-clean-content should be taking care of this.


A bit of debugging has revealed that the question body has contents like this:

C-x&lt;/kbd&gt;&amp;nbsp;&lt;kbd&gt;-&lt;/kbd&gt; (`shrink-window-if-larger-than-buffer`)

Specifically, &amp;nbsp; isn't being recognized. This could potentially be handled in sx-encoding-decode-entities – I just need to twiddle around with it :) Thanks for catching this!

I'm a little worried though that it will mess with HTML code blocks. Alas, it does: http://stackoverflow.com/q/1571648/1443496.

Malabarba commented 9 years ago

I had run into this before as well.
Our code is working as intended; the problem is that a few questions/answers actually have things like &nbsp; in the markdown. So Stack Exchange sends us &amp;nbsp; in the xml, which we correctly translate to &nbsp; in the markdown.

The reason this works on the site is that Stack Exchange processes these HTML entities when rendering the markdown. So I guess that's what we need to do. This should be just a matter of calling sx-encoding-decode-entities again somewhere inside sx-question-print.
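
To illustrate the double encoding described above, here is a minimal sketch of why one decoding pass leaves a literal &nbsp; behind and a second pass removes it. The names demo-entity-alist and demo-decode-entities are hypothetical stand-ins, not part of sx.el, and the entity table is a small illustrative subset rather than whatever sx-encoding-decode-entities really uses.

```elisp
;; Hypothetical stand-in for `sx-encoding-decode-entities'.
;; The table is an illustrative subset, not the package's real table.
(defvar demo-entity-alist
  '(("amp" . "&") ("lt" . "<") ("gt" . ">") ("nbsp" . "\u00a0"))
  "Illustrative subset of named HTML entities and their replacements.")

(defun demo-decode-entities (string)
  "Return STRING with one level of known named entities decoded.
Entities missing from `demo-entity-alist' are left untouched."
  (replace-regexp-in-string
   "&\\([a-z]+\\);"
   (lambda (match)
     ;; The match data refers to MATCH itself while this function runs.
     (let ((name (match-string 1 match)))
       (or (cdr (assoc name demo-entity-alist)) match)))
   string t t))

;; The first pass undoes the transport encoding only, leaving the
;; entity the author typed: "a&nbsp;b".
(demo-decode-entities "a&amp;nbsp;b")

;; A second pass decodes that entity as well, giving "a", U+00A0, "b".
(demo-decode-entities (demo-decode-entities "a&amp;nbsp;b"))
```

Applying the second pass to the whole body would also rewrite entities that appear inside code blocks, which is the concern raised earlier in this thread, so a real fix would presumably need to leave those regions alone.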

vermiculus commented 9 years ago

IMO, this is an issue internal to markdown-mode, so I'd raise it with the maintainer of that package. This has been discussed in chat.