kiranshila / cybermonday

Markdown as Clojure Data
Eclipse Public License 1.0

Illegal state exception: parser IR on some HTML #82

Open · jakeisnt opened this issue 1 year ago

jakeisnt commented 1 year ago

Hey! Thanks for this package. It's made my life easy.

I'm getting an IllegalStateException from within the library.

Message

Execution error (IllegalStateException) at cybermonday.ir/fold-inline-html$fn (ir.cljc:55).
Can't pop empty vector

I am unsure how to proceed with troubleshooting or what might be causing this inconsistent state.

Code's here: https://github.com/kiranshila/cybermonday/blob/8b94aef1d8b3eb9b5ca465370b80521beb8c35eb/src/cybermonday/ir.cljc#L55

Reproducing

  1. Read https://github.com/jakeisnt/wiki/blob/2319342e442da8580189c639b9dd64382ef49d61/pages/smess.md into a string.
  2. Call (parse-md) on that string.

The offending text in the file is <span class="spurious-link"target="matrix.org">*matrix.org*</span>; other inline HTML in Markdown may trigger this as well.

Here's my entire context if you're interested: https://github.com/jakeisnt/site/blob/9aa28a968f0f1cd5e8ca8567852cb7d0479c96ca/src/main.clj

kiranshila commented 1 year ago

Hey! Yeah, all the code around the inline HTML handling is very touchy. A quick test points to the newline in the middle of the tag as the culprit.

  [:markdown/soft-line-break {}]
  "could be hacked together with systems like "
  [:markdown/html {} "<span class=\"spurious-link\"\n  target=\"matrix.org\">"]
  [:em {} "matrix.org"]
  [:markdown/html {} "</span>"]

Then calling (parse-tag "<span class=\"spurious-link\"\n target=\"matrix.org\">") yields

Execution error (NullPointerException) at java.util.regex.Matcher/getTextLength (Matcher.java:1769).
Cannot invoke "java.lang.CharSequence.length()" because "this.text" is null

This does not happen if I remove the newline:

cybermonday.ir> (parse-tag "<span class=\"spurious-link\"  target=\"matrix.org\">")
;; => [:span {:class "spurious-link", :target "matrix.org"}]

So html-attr-to-map and parse-tag need to be fixed to handle a newline inside a tag.
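A minimal sketch of a newline-tolerant tag parser follows. It assumes the failure comes from a pattern built on `.`, which in java.util.regex does not match line terminators unless the `(?s)` flag is set: the match returns nil, and handing that nil to another regex call would produce exactly the NullPointerException in Matcher shown above. That is a guess, not a reading of ir.cljc. The names `parse-open-tag` and `attrs->map` are hypothetical and are not cybermonday's actual parse-tag or html-attr-to-map.

```clojure
(ns example.parse-tag
  (:require [clojure.string :as str]))

;; HYPOTHETICAL sketch, not cybermonday's implementation.
;; Opening tag: a name, then any run of non-'>' characters as attributes.
;; Unlike `.`, the class [^>] also matches \n, so tags split across
;; lines still match.
(def ^:private open-tag-re #"<([A-Za-z][\w-]*)([^>]*)>")

;; name="value" pairs. \s matches \n, and no whitespace is required
;; between pairs, so a="1"b="2" also parses.
(def ^:private attr-re #"([\w:-]+)\s*=\s*\"([^\"]*)\"")

(defn attrs->map
  "Hypothetical helper: attribute string -> keyword map."
  [s]
  (into {} (for [[_ k v] (re-seq attr-re s)]
             [(keyword k) v])))

(defn parse-open-tag
  "Hypothetical stand-in for parse-tag. Returns [:tag attrs] for an
  opening tag, or nil (instead of throwing) for anything else."
  [s]
  (when-let [[_ tag attrs] (re-matches open-tag-re (str/trim s))]
    [(keyword (str/lower-case tag)) (attrs->map attrs)]))

;; The suspected root cause: `.` stops at the newline, so this is nil.
(comment
  (re-matches #"<(\w+)(.*)>" "<a\nb>")) ; => nil
```

With this sketch, `(parse-open-tag "<span class=\"spurious-link\"\n  target=\"matrix.org\">")` returns `[:span {:class "spurious-link", :target "matrix.org"}]`, the same result as the newline-free call above, and a closing tag such as `"</span>"` returns nil. Using `[^>]*` rather than `(?s).*` keeps the tag match from running past the first `>`; the trade-off is that an attribute value containing a literal `>` is not handled.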

jakeisnt commented 1 year ago

I appreciate the pointer! Ended up resolving the issue (as the name suggests, the spurious links weren't supposed to be there) but I'll take a crack at this this weekend : )