Closed stevecheckoway closed 6 years ago
@craigbarnes This has some fixes to quirks-mode detection that I probably should have split out. The public/system identifiers should always be compared case-insensitively to the strings in the list. It's just that some of them need only be a prefix (those that aren't exact).
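A rough sketch of the comparison described above, not the code from the patch. The function name `is_quirks_public_id` and the two tuples are hypothetical, and the identifiers are only a small subset of the lists in the HTML spec. It also assumes ASCII-only identifiers, so `str.lower()` stands in for ASCII case-insensitive matching.

```python
# Small illustrative subset of the quirks-mode public identifiers in the
# HTML spec; the real lists are much longer.
EXACT_PUBLIC_IDS = (
    "-//W3O//DTD W3 HTML Strict 3.0//EN//",
    "-/W3C/DTD HTML 4.0 Transitional/EN",
    "HTML",
)
PREFIX_PUBLIC_IDS = (
    "-//IETF//DTD HTML 2.0//",
    "-//W3C//DTD HTML 3.2 Final//",
    "-//W3C//DTD HTML 4.0 Transitional//",
)


def is_quirks_public_id(public_id: str) -> bool:
    """Return True if public_id matches one of the listed identifiers.

    Every comparison is case-insensitive. Entries in EXACT_PUBLIC_IDS must
    match the whole identifier; entries in PREFIX_PUBLIC_IDS need only match
    its beginning.
    """
    lowered = public_id.lower()
    if any(lowered == exact.lower() for exact in EXACT_PUBLIC_IDS):
        return True
    return any(lowered.startswith(prefix.lower()) for prefix in PREFIX_PUBLIC_IDS)
```

With this subset, `is_quirks_public_id("html")` is true through the exact list, while `"HTML5"` matches neither list because `"HTML"` is not treated as a prefix.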
@stevecheckoway Thanks for the heads up. I still need to merge a few of your other patches before this one I think.
It's not enough to parse a fragment based on a known tag and namespace. The five pieces of information required include the `encoding` attribute (when the element is a MathML `annotation-xml` element).

The `encoding` attribute of an `annotation-xml` context element determines if the content should be parsed as HTML or as foreign elements. See https://html.spec.whatwg.org/multipage/parsing.html#html-integration-point

Broken DOCTYPE declarations can put the document in quirks mode in addition to specific public and system identifiers. libxml2 has no way to record the force-quirks flag (https://html.spec.whatwg.org/multipage/parsing.html#force-quirks-flag). Fortunately, the quirks mode plays very little role in parsing.

Finally, if the fragment context is a form element (or has one as an ancestor), then `<form>` and `</form>` tags (among other things) are parse errors and the tags are ignored.
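To make the extra context concrete, here is a minimal sketch of how a caller might bundle it. Everything named here (`FragmentContext`, its fields, and both helper functions) is hypothetical and is not the API of this patch or of libxml2. The field set is my reading of the description above, and quirks mode is reduced to a boolean. The two accepted `encoding` values are the ones the HTML spec lists for `annotation-xml` HTML integration points.

```python
from dataclasses import dataclass
from typing import Optional

# Encoding values that make a MathML annotation-xml element an HTML
# integration point (compared case-insensitively).
HTML_INTEGRATION_ENCODINGS = ("text/html", "application/xhtml+xml")


@dataclass
class FragmentContext:
    """Hypothetical bundle of what a fragment parser needs to know."""
    tag: str                        # context element's tag name
    namespace: str                  # "html", "svg" or "mathml"
    encoding: Optional[str] = None  # only meaningful for annotation-xml
    quirks_mode: bool = False       # simplified: the owner document's mode
    in_form: bool = False           # context is a form or has a form ancestor


def annotation_xml_parses_as_html(ctx: FragmentContext) -> bool:
    """True if an annotation-xml context's content should be parsed as HTML.

    Covers only the annotation-xml case; other integration points are not
    modelled here.
    """
    if ctx.namespace != "mathml" or ctx.tag != "annotation-xml":
        return False
    if ctx.encoding is None:
        return False
    return ctx.encoding.lower() in HTML_INTEGRATION_ENCODINGS


def form_tag_ignored_due_to_context(ctx: FragmentContext, tag: str) -> bool:
    """True if a <form> or </form> tag is dropped because of the context.

    Models only the effect named in the description above.
    """
    return ctx.in_form and tag == "form"
```

For example, `FragmentContext("annotation-xml", "mathml", encoding="Text/HTML")` would have its content parsed as HTML, while the same element with no `encoding` attribute would be parsed as foreign content.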