yogthos / markdown-clj

Markdown parser in Clojure
Eclipse Public License 1.0

First `<hr/>` tag mangled using `escape-html` #112

Closed by whitecoop 7 years ago

whitecoop commented 7 years ago

Using the `escape-html` function from the README as a replacement transformer, the first horizontal rule after a list comes out HTML-escaped:

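;; Aliases assumed from the calls below: md = markdown.core, mdtrans = markdown.transformers
(require '[markdown.core :as md]
         '[markdown.transformers :as mdtrans])

;; Transformer contract: takes [text state] and returns [text state].
(defn escape-html [text state]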
  (let [sanitized-text (clojure.string/escape text
                         {\& "&amp;"
                          \< "&lt;"
                          \> "&gt;"
                          \" "&quot;"
                          \' "&#39;"})]
    [sanitized-text state]))

(md/md->html "* item\n\n***"
  :replacement-transformers (cons escape-html mdtrans/transformer-vector))
;; => "<ul><li>item</li></ul>&lt;hr/&gt;"

(md/md->html "* item\n\n***\n\n***"
  :replacement-transformers (cons escape-html mdtrans/transformer-vector))
;; => "<ul><li>item</li></ul>&lt;hr/&gt;<hr/>"

vs

(md/md->html "* item\n\n***")
;; => "<ul><li>item</li></ul><hr/>"

At first I thought this was caused by consing `escape-html` onto the front of `transformer-vector`, but the second horizontal rule isn't mangled.

yogthos commented 7 years ago

The parser is stateful, and the transformers depend on that state, so their order matters. If `escape-html` runs before the list transformer, the parser can end up in a bad state.
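
To illustrate why order matters when transformers share state, here is a toy sketch. It is not markdown-clj's actual implementation: `mark-list`, `shout-outside-lists`, and `run-transformers` are hypothetical names, and the only thing borrowed from the library is the `[text state]` in, `[text state]` out shape of a transformer.

```clojure
(require '[clojure.string :as str])

;; Toy transformer: wraps a "* " line in <li> and records that in state.
(defn mark-list [text state]
  (if (str/starts-with? text "* ")
    [(str "<li>" (subs text 2) "</li>") (assoc state :in-list true)]
    [text state]))

;; Toy transformer: upper-cases the text unless an earlier
;; transformer has already flagged the line as a list item.
(defn shout-outside-lists [text state]
  [(if (:in-list state) text (str/upper-case text)) state])

;; Threads text and state through each transformer in order
;; and returns the final text.
(defn run-transformers [transformers text]
  (first (reduce (fn [[t s] f] (f t s)) [text {}] transformers)))

(run-transformers [mark-list shout-outside-lists] "* item")
;; => "<li>item</li>"

(run-transformers [shout-outside-lists mark-list] "* item")
;; => "<li>ITEM</li>"
```

The same two functions give different output depending on which one sees the state first, which is the kind of dependency that makes prepending a transformer risky.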

whitecoop commented 7 years ago

Is there a way to guarantee that escape-html runs before anything else?

yogthos commented 7 years ago

You could just preprocess the text before passing it to md->html:

;; Takes only the text: it runs before parsing, so there is no parser state to thread.
(defn escape-html [text]
  (clojure.string/escape text
    {\& "&amp;"
     \< "&lt;"
     \> "&gt;"
     \" "&quot;"
     \' "&#39;"}))

(-> "* item\n\n***" escape-html md/md->html)

I should probably replace the example in the README, though, since `escape-html` doesn't work as a transformer.
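
A minimal sketch of the preprocessing approach, assuming a single-argument `escape-html` and a hypothetical helper `render-escaped` that takes the render function as a parameter, so the block runs without markdown-clj on the classpath. The call to `md/md->html` appears only in a comment and assumes `markdown.core` is aliased as `md`.

```clojure
(require '[clojure.string :as str])

;; Escapes HTML-significant characters in raw input.
(defn escape-html [text]
  (str/escape text
    {\& "&amp;"
     \< "&lt;"
     \> "&gt;"
     \" "&quot;"
     \' "&#39;"}))

;; Hypothetical helper: escape first, then hand the result to any renderer.
(defn render-escaped [render text]
  (render (escape-html text)))

;; Raw HTML in the input is neutralized before the parser sees it.
(escape-html "<hr/>")
;; => "&lt;hr/&gt;"

;; The list and rule markers contain no escaped characters, so they pass through unchanged.
(escape-html "* item\n\n***")
;; => "* item\n\n***"

;; With markdown-clj loaded (assumed alias):
;; (render-escaped md/md->html "* item\n\n***")
```

One caveat with this approach: `>` is escaped too, so a line starting with `>` reaches the parser as `&gt;` and would presumably no longer be treated as a blockquote.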

whitecoop commented 7 years ago

👍 ok, thanks