Closed christianhujer closed 4 years ago
Ha! This one is interesting. There is a problem with tag parity there as i
is not a void element. But definitely something to review on the ul|ol/li
rule. Thanks @christianhujer
Of course <i/>
is valid element. And that it's not a void
element is irrelevant, as that distinction is only important for the HTML syntax of HTML documents, not the XML syntax for HTML documents. As the file suffix and the use of the xmlns
attribute indicate, this file is an HTML document in XML syntax. <i/>
is empty italics in EmptyElemTag syntax. And both, Nu Valiator and xmllint
agree on the well-formedness and validity of above sample document. The use case is typically in combination with something like Font Awesome, like <i class="fa fa-facebook"/>
.
But even if we assume HTML syntax, the error is wrong because still the <i>
is a child of <li>
and not <ul>
.
🤔 Thanks for sharing that. We initially were not considering the XML syntax for HTML documents, and to be honest I may not have much experience with that as we have been focusing on the HTML side of it. Maybe we should be explicit about this and focus on that small portion of the specification, ensure it works well to then expand to the larger spec that includes XML for HTML.
May I get your advise on documents/specs that we could read to understand deeply the spec? Appreciate your help and guidance with all this.
The XML Syntax is actually much simpler than the HTML Syntax. When it comes to the syntax itself, you can take a shortcut: Simply use a non-validating XML parser and report whatever errors it throws. In XML-language, parsing an XML document implies a check for well-formedness, which is XML-speak for syntax-check.
Then use the same semantics as those for HTML for semantical validation.
That means the checks would need to be grouped into syntactical checks and semantical checks. This is probably a good idea anyway.
To have nice error messages, you would need an XML parser that provides meta-information for nodes such as line and column. I don't know if such an XML parser exists, but it should be easy to create.
The relevant specifications are these, sorted by importance for linting HTML (most important first):
Don't let this list overwhelm you. It's easy. And half of the documents are relevant even if you ignore XML (for example, everything related to RDFa and SVG remains relevant for linting HTML even if the XML Syntax variant of HTML is ignored).
The way how XML works makes stuff easy: 95% of linting can be categorized as:
I haven't looked into its source code, but I bet that the Nu Validator might give a lot of inspiration about how to do this on various levels from architecture to implementation.
Of course
<i/>
is valid element. And that it's not avoid
element is irrelevant, as that distinction is only important for the HTML syntax of HTML documents, not the XML syntax for HTML documents. As the file suffix and the use of thexmlns
attribute indicate, this file is an HTML document in XML syntax.<i/>
is empty italics in EmptyElemTag syntax. And both, Nu Valiator andxmllint
agree on the well-formedness and validity of above sample document. The use case is typically in combination with something like Font Awesome, like<i class="fa fa-facebook"/>
.But even if we assume HTML syntax, the error is wrong because still the
<i>
is a child of<li>
and not<ul>
.
@christianhujer Thanks for bringing this case to discussion. So this is a case of a misnested tag. Misnested tag is considered an error handling and strange case when parsing a HTML file. I have to clarify that Milo in this moment does not support XML parsing or processing. XHTML files are processed as HTML files and follow all process for parsing and tokenization expressed here
What is interesting about Nu validator is that "The parser is designed to work as a drop-in replacement for the XML parser in applications that already support XHTML 1.x content with an XML parser and use SAX, DOM or XOM to interface with the parser". In some ways this can inspire Milo to a broader scope in a future.
It's not a misnested tag from the perspective of the file. The file is correct. It's only a misnested tag from the perspective of Milo because Milo misinterprets the file as HTML in HTML Syntax instead of HTML in XML Syntax.
Still there is an issue from the perspective of the HTML Syntax, the error message is misleading. The error message should be that a close tag </i>
was expected, but a close tag </li>
was received.
I agree that this reviewers should not fire on this case. There is another reviewer in charge of unpaired tags which should point out the <i/>
issue.
The HTML in XML Syntax topic is one that will need more thought. Being Milo a side project and something we want to find useful initially I trend to think we will keep doing the assumption of HTML in HTML Syntax. One thing that we could do is being more explicit about this topic in the docs (README for now).
@christianhujer I really appreciate that you gave us a light with regards this topic of HTML in XML Syntax.
If I had time, I would love to help you with the implementation of the XML Parser. Alas, I'm quite occupied these days. But Milo will certainly be a long-term project, I can see that from what's there and from the changes. So, I'm sure one day it will get an XML Parser.
@larrymjordan I'm covering this one while removing the go query dependency.
Given the following well-formed and valid XHTML 5.2 file:
Milo reports a false positive: