ocaml / omd

extensible Markdown library and tool in "pure OCaml"
ISC License

Html_block parsing too eager #259

Closed reynir closed 2 years ago

reynir commented 2 years ago

It seems that parsing of Html_block is too eager in some cases. In the example below, the sub-list is consumed by the preceding <div>:

# Omd.of_string "* <div>My ref</div>\n  * [See my ref](#myref) for more information";;
- : Omd.doc =
[Omd.List ([], Omd.Bullet '*', Omd.Tight,
  [[Omd.Html_block ([],
     "<div>My ref</div>\n* [See my ref](#myref) for more information\n")]])]

Also worth noting and perhaps creating a separate issue is that inline html seems to be handled differently:

# Omd.of_string "# <div>My ref</div>";;
- : Omd.doc =
[Omd.Heading ([], 1,
  Omd.Concat ([],
   [Omd.Html ([], "<div>"); Omd.Text ([], "My ref"); Omd.Html ([], "</div>")]))]

shonfeder commented 2 years ago

Oops. That’s not good! Thank you for the report. One issue is fine here :)

I’ve been working on improving the block parser so it’s easier to reason about and extend/fix, but perhaps this can be remedied by a quicker fix. I’ll take a look at some point this week.

shonfeder commented 2 years ago

I've done a bit of looking. IIUC, this behavior is consistent with the CommonMark spec; see in particular Example 178. A more perspicuous view into this apparently expected behavior comes from putting your example into the "try it" widget and then viewing the AST: see this link
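
For anyone following along, the spec rule behind this can be sketched in a few lines of plain OCaml. This is a simplified model and not Omd's parser: `block_tags`, `starts_html_block` and `split_html_block` are hypothetical names, the tag list is only a small subset of the spec's, and the list item's indentation is assumed to be stripped already. It illustrates that an HTML block opened by a tag like <div> keeps taking lines until the first blank line, which is why the sub-list ends up inside the block:

```ocaml
(* Simplified model of CommonMark's HTML block rule for block-level tags.
   NOT Omd's implementation; every name below is hypothetical. *)

(* A small subset of the block-level tag names listed by the spec. *)
let block_tags = [ "div"; "p"; "table"; "ul"; "ol"; "li"; "section" ]

let is_blank line = String.trim line = ""

(* True if [line] starts with "<tag" or "</tag" followed by a space, a tab,
   ">", "/>" or the end of the line. Indentation is assumed to be stripped. *)
let starts_html_block line =
  let n = String.length line in
  let has_prefix p i =
    let k = String.length p in
    i + k <= n && String.lowercase_ascii (String.sub line i k) = p
  in
  let after_lt =
    if has_prefix "</" 0 then Some 2
    else if has_prefix "<" 0 then Some 1
    else None
  in
  match after_lt with
  | None -> false
  | Some i ->
    List.exists
      (fun tag ->
         let j = i + String.length tag in
         has_prefix tag i
         && (j = n || List.mem line.[j] [ ' '; '\t'; '>' ] || has_prefix "/>" j))
      block_tags

(* Split a container's lines into the lines captured by a leading HTML block
   and the lines left over. The block stops before the first blank line. *)
let split_html_block lines =
  match lines with
  | first :: _ when starts_html_block first ->
    let rec go acc = function
      | l :: rest when not (is_blank l) -> go (l :: acc) rest
      | rest -> (List.rev acc, rest)
    in
    go [] lines
  | _ -> ([], lines)

let () =
  let bullet = "* [See my ref](#myref) for more information" in
  (* Content of the list item from the report, indentation stripped. *)
  let eager = [ "<div>My ref</div>"; bullet ] in
  (* Same content with a blank line after the <div> line. *)
  let separated = [ "<div>My ref</div>"; ""; bullet ] in
  let show name lines =
    let html, rest = split_html_block lines in
    Printf.printf "%s: %d line(s) in HTML block, %d line(s) left over\n"
      name (List.length html) (List.length rest)
  in
  show "without blank line" eager;
  show "with blank line" separated
```

Under this rule, a blank line after the <div> line ends the HTML block, so the bullet is left for the ordinary block parser. That is probably the practical workaround, though it is worth confirming in the Omd toplevel.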

I confess, this reinforces my impression of markdown as a kind of zany and irrational markup language! But, such is our world!

As to the inline example, we seem to also be consistent with the reference implementation as per this example (again, see the AST).

So I think we are faced with the choice of deviating from the spec or keeping this surprising behavior. I propose we at least keep this issue open as a sign that the question is unsettled. For my part, once the parsers are a bit more tractable, I'd not be opposed to deviating from the spec, if it meant more sensible behavior, provided we could ensure it was also consistent.

reynir commented 2 years ago

Thank you for looking into this. I agree re markdown as a language. I find it full of surprises. I'm inclined to close this issue. I think deviating from a language already full of surprise is just going to make things more confusing.

shonfeder commented 2 years ago

> I think deviating from a language already full of surprise is just going to make things more confusing

That's a very good point.

Maybe once this library is in tip top shape, "we" can turn our sights towards an improved lightweight markup language ;)

Thanks again for the report!