whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

Is "Contexts in which this element can be used:" and "Content model:" always supposed to be precise? #4556

Open domenic opened 5 years ago

domenic commented 5 years ago

@mtrootyy has been helpfully opening a bunch of issues regarding the fact that the non-normative "Contexts in which this element can be used:" and "Content model:" text that introduces each element is not as precise as it could be. E.g., it doesn't always state exclusions in full detail, instead requiring you to look in other locations in the spec to find those exclusions.

We should determine whether it's a goal for the spec to be precise in this way, or if the non-normative summaries can be just summaries. I think @zcorpan or @sideshowbarker are the best ones to understand and investigate these issues, and welcome their perspectives.

I'll close the individual issues and link them to this thread, and @mtrootyy (or anyone else) can post comments here for other cases they find.

zcorpan commented 5 years ago

Content models are normative:

An HTML element must have contents that match the requirements described in the element's content model.

https://html.spec.whatwg.org/multipage/dom.html#content-models

Content model A normative description of what content must be included as children and descendants of the element.

https://html.spec.whatwg.org/multipage/dom.html#concept-element-content-model

The "Contexts in which this element can be used" is non-normative:

Contexts in which this element can be used A non-normative description of where the element can be used. This information is redundant with the content models of elements that allow this one as a child, and is provided only as a convenience.

https://html.spec.whatwg.org/multipage/dom.html#concept-element-contexts

Content models should be precise, so that they can be used as a basis for a conformance checker.

The "Contexts in which this element can be used" should be helpful and not confusing. Possibly being precise helps with that.

zcorpan commented 5 years ago

That said, these can refer to other requirements with "see prose" if it gets complex, as is done for e.g. ruby, time.

sideshowbarker commented 5 years ago

Short answer from my POV maintaining HTML-checker code:

I agree it should always be a goal to make both the "Content model" and "Contexts in which this element can be used" sections as precise and complete as possible. That said, I think there are cases where trying to state all exclusions in full detail, instead of requiring readers to look in other locations in the spec to find those exclusions, might actually conflict with the usability of the HTML checker.


To explain, here’s a longer answer:

Content models are normative:

Right — the content models are what I use when implementing reporting of errors in the HTML checker.

The "Contexts in which this element can be used" is non-normative:

Right — when implementing the HTML checker code, I don’t look at any of the "Contexts in which this element can be used" sections. They’re repeating information that is already captured as normative requirements in the "Content model" sections.

Content models should be precise, so that they can be used as a basis for a conformance checker.

Yes. The content models sections should be unambiguous.

That said, these can refer to other requirements with "see prose" if it gets complex, as is done for e.g. ruby, time.

Right, and for my purposes that works fine, because in those cases the "Content model" sections are still complete but just state some of the requirements by reference.

But as far as the needs for the HTML checker go, something worth noting is that the checker backend scrapes all the "Content model" and "Contexts in which this element can be used" sections and in certain cases, emits them in its error messages.

For example, consider the following:

https://validator.w3.org/nu/?showsource=yes&doc=data%3Atext%2Fhtml%3Bcharset%3Dutf-8%2C%253C%2521DOCTYPE%2520html%253E%250D%250A%253Chtml%2520lang%253D%2522%2522%253E%250D%250A%253Ctitle%253ETest%253C%252Ftitle%253E%250D%250A%253Cruby%253E%253C%252Fruby%253E#textarea

Error: Element ruby is missing a required instance of one or more of the following child elements: rp, rt, rtc.

From line 4, column 7; to line 4, column 13

le><ruby></ruby>

Content model for element ruby:
See prose.

Notice the "Content model for element ruby" section there. It is a (scraped) copy of the relevant section of the spec. And notice that while it’s not particularly useful on its own in this case, because it just says to look at the prose of the spec, the ruby part is a hyperlink that users can follow to read the relevant section of the spec for the details.

But the reason the "Content model" section for ruby in the spec is done that way is that the requirements in this case actually need a lot of words to define. And if we were to take all those words and attempt to move them into the "Content model" section itself, I think it would end up causing an excessive amount of text to be emitted in the HTML checker error message, more text than I think most users would want in the error message.

So I think it’s actually better usability for users if they’re directed (by way of the ruby hyperlink in the message) to the actual spec text.
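To make the trade-off concrete, here is a minimal Python sketch of a checker attaching a scraped summary to an error message. It is not the HTML checker's actual code. The `CONTENT_MODELS` table, the `Summary` type, and `format_error` are hypothetical names, and the URL fragments are assumed for illustration. Only the two summary strings come from the checker output quoted in this thread.

```python
from dataclasses import dataclass
from typing import Optional

SPEC_BASE = "https://html.spec.whatwg.org/multipage/"


@dataclass(frozen=True)
class Summary:
    text: str    # summary text as it would be scraped from the spec
    anchor: str  # assumed page and fragment for the element's section


# Hypothetical stand-in for the scraped data; a real checker would
# build this from the spec source rather than hard-code it.
CONTENT_MODELS = {
    "ruby": Summary("See prose.",
                    "text-level-semantics.html#the-ruby-element"),
    "ul": Summary("Zero or more li and script-supporting elements.",
                  "grouping-content.html#the-ul-element"),
}


def format_error(message: str, element: str) -> str:
    """Append the element's short summary and a spec link to an error."""
    lines = [f"Error: {message}"]
    summary: Optional[Summary] = CONTENT_MODELS.get(element)
    if summary is not None:
        lines.append(f"Content model for element {element} "
                     f"({SPEC_BASE}{summary.anchor}):")
        lines.append(summary.text)
    return "\n".join(lines)


print(format_error(
    "Element ruby is missing a required instance of one or more of "
    "the following child elements: rp, rt, rtc.",
    "ruby",
))
```

With this shape, a short summary such as "See prose." keeps the message compact, and the link carries the reader to the full requirements.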


For the sake of completeness in showing how things work for the HTML checker, another example:

https://validator.w3.org/nu/?showsource=yes&doc=data%3Atext%2Fhtml%3Bcharset%3Dutf-8%2C%253C%2521DOCTYPE%2520html%253E%250D%250A%253Chtml%2520lang%253D%2522%2522%253E%250D%250A%253Ctitle%253ETest%253C%252Ftitle%253E%250D%250A%253Cul%253E%250D%250A%253Cruby%253E%253C%252Fruby%253E%250D%250A%253C%252Ful%253E#textarea

Error: Element ruby not allowed as child of element ul in this context. (Suppressing further errors from this subtree.)

From line 5, column 1; to line 5, column 6

tle><ul><ruby></ruby

Contexts in which element ruby may be used:
Where phrasing content is expected.
Content model for element ul:
Zero or more li and script-supporting elements.

Notice that in this case, the Contexts in which element ruby may be used section is emitted.

So from that example you can see how the "Contexts in which this element can be used" sections, even though they’re not normative, can still be helpful for users of the HTML checker, at point of use.
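As a rough illustration of why a precise content model is directly usable by a checker, the sketch below encodes the ul content model quoted above ("Zero or more li and script-supporting elements") as a child-element check. It is a simplification, not how the HTML checker is implemented. It uses Python's standard `html.parser`, which is not an HTML tree builder, and it looks only at element children, ignoring text. `UlChildChecker` and `check_ul_children` are hypothetical names.

```python
from html.parser import HTMLParser

# Script-supporting elements as defined by the HTML Standard.
SCRIPT_SUPPORTING = {"script", "template"}

# Void elements never get an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class UlChildChecker(HTMLParser):
    """Flags element children of ul other than li, script, template."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements, innermost last
        self.errors = []

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else None
        if parent == "ul" and tag != "li" and tag not in SCRIPT_SUPPORTING:
            self.errors.append(
                f"Element {tag} not allowed as child of element ul "
                "in this context."
            )
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def check_ul_children(html: str) -> list:
    checker = UlChildChecker()
    checker.feed(html)
    checker.close()
    return checker.errors


for error in check_ul_children("<ul>\n<ruby></ruby>\n</ul>"):
    print(error)
```

A real checker would pair each such error with the scraped "Contexts in which this element can be used" text for the offending child, as in the message shown above.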