Open derek-zhou opened 2 years ago
Yeah, this is a bug :/ It won't be easy to fix because of https://github.com/philss/floki/issues/37, but at least we are halfway there: https://github.com/philss/floki/projects/2
Do you mean that the mochiweb parser is too fragile to fix, and a brand new parser is on the way?
@derek-zhou It's not that it is too fragile, but I think the HTML parsing state machine is too damn complicated to fix when the parser never followed the specs :sweat_smile:
I plan to finish the built-in parser one day. In the meantime, I suggest you give the html5ever parser a try (https://github.com/philss/floki#using-html5ever-as-the-html-parser). It now comes with precompiled NIFs, so you don't need Rust to use it anymore.
I am not afraid of a bit of Rust toolchain. However, I need to do some ad-hoc XML parsing in the same application, and I am afraid the html5ever parser could be too strict about things.
@derek-zhou I see. You can use both if you need to. Just pass the parser as an option to parse_document.
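As a rough sketch of that suggestion, the script below picks a parser per call through the html_parser option of Floki.parse_document/2. The input strings are made up for illustration and do not come from this thread, and the script assumes Elixir 1.12+ (for Mix.install/1) with unpinned floki and html5ever packages, so treat it as a starting point rather than a tested recipe.

```elixir
# Assumption: Elixir 1.12+ so that Mix.install/1 is available; versions are left unpinned.
Mix.install([:floki, :html5ever])

# Hypothetical inputs, not taken from the thread.
xml_ish = "<feed><entry id=\"1\">first</entry></feed>"
html = "<!doctype html><title>t</title><p>hello"

# Lenient built-in (mochiweb-based) parser for the ad-hoc XML-like input.
{:ok, xml_tree} =
  Floki.parse_document(xml_ish, html_parser: Floki.HTMLParser.Mochiweb)

# html5ever for real HTML, selected for this call only.
{:ok, html_tree} =
  Floki.parse_document(html, html_parser: Floki.HTMLParser.Html5ever)

# Print what each parser produced for a simple selector.
IO.inspect(Floki.find(xml_tree, "entry"), label: "mochiweb")
IO.inspect(Floki.find(html_tree, "p"), label: "html5ever")
```

Choosing the parser per call keeps the application-wide default untouched, so the XML-like inputs can stay on the lenient parser while HTML goes through html5ever.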
Description
According to the HTML5 spec, the closing </p> tag is optional, i.e. markup that omits it is equivalent to markup that includes it.
However, Floki with the builtin parser does not handle this correctly.
To Reproduce
It looks like Floki fills in the missing </p> at the end of the document.
Expected behavior
A <p> tag shall not contain another <p>.
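The original snippets from this report did not survive, so here is a minimal, hypothetical check of the expectation above. The input string is an assumption, as is the use of the descendant selector "p p" to detect nesting. It relies only on Floki.parse_document/1 and Floki.find/2, and it assumes Elixir 1.12+ for Mix.install/1.

```elixir
# Assumption: Elixir 1.12+ so that Mix.install/1 is available; version is left unpinned.
Mix.install([:floki])

# Hypothetical input: two paragraphs with their closing </p> tags omitted.
html = "<p>One<p>Two"

# Uses the default (built-in) parser unless configured otherwise.
{:ok, tree} = Floki.parse_document(html)

# "p p" matches any <p> that is a descendant of another <p>.
nested = Floki.find(tree, "p p")

IO.inspect(tree, label: "parsed tree")
IO.inspect(nested, label: "nested <p> elements (expected: [])")
```

If the parser followed the spec, the second <p> would implicitly close the first and the nested list would be empty; a non-empty list would match the behavior described in this report.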