w3c / wcag

Web Content Accessibility Guidelines
https://w3c.github.io/wcag/guidelines/22/

What does nested according to the specification mean in SC 4.1.1 #978

Open mraccess77 opened 4 years ago

mraccess77 commented 4 years ago

Refer to the following thread. Does the SC's "nested according to the specification" mean nested according to the syntax of opening and closing tags, or nested in terms of the specification saying certain tags can't be within certain tags?

https://lists.w3.org/Archives/Public/w3c-wai-ig/2019OctDec/0113.html

JAWS-test commented 4 years ago

I have read the WCAG documents on this subject and could not find a clear answer to this question. I therefore propose that

I suspect that the WCAG "parsing only" bookmarklet is often used for testing. It interprets SC 4.1.1 as covering not only correct nesting, but also whether the child elements are correct according to the HTML specification.

Unfortunately I can't find any information about whether the DOM or the source code should be checked:

The question is also to what extent AT uses the DOM or the source code, or whether it uses the accessibility APIs of the operating systems.

alastc commented 4 years ago

I've always interpreted that as meeting the spec of the language used, i.e. if you're writing HTML5, according to that spec. (I'd like to avoid tangents about multiple HTML specs, please!)

Therefore the nesting should be according to the rules of that spec, so I agree with the parsing bookmarklet's approach.

DOM vs source is tricky, as you have to use the DOM to work out what the source code (including the effect of scripts) actually is, and then work out whether that matches the spec. Another good reason to deprecate 4.1.1: if you are going by the DOM, then you should look at impact, not spec.
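As a rough illustration of why, the following browser-console sketch (assuming a same-origin page) compares the markup the server sent with a serialization of the live DOM after parsing fix-ups and scripts have run:

// Re-request the raw source, then serialize the current DOM for comparison.
fetch(location.href)
  .then((response) => response.text())
  .then((rawSource) => {
    const domSnapshot = "<!DOCTYPE html>\n" + document.documentElement.outerHTML;
    console.log("raw source length:", rawSource.length);
    console.log("serialized DOM length:", domSnapshot.length);
    // Differences reflect parser repairs plus script/CSS-driven changes;
    // a re-request may not even return identical source for dynamic pages.
  });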

My understanding is that AT generally uses the accessibility API of the system, but there are some odd cases which can use direct access (e.g. Dragon, possibly?).

dd8 commented 4 years ago

If there are still ATs using direct access to the source, then they're very unlikely to apply JavaScript or CSS changes to the source. This may impact assumptions about JavaScript/CSS elsewhere in WCAG.

The reason I think direct access ATs are unlikely to apply JavaScript/CSS is that it's a lot of effort - at least an order of magnitude harder than pulling the information from the browser DOM or accessibility API. To apply JavaScript you need:

Even if you pull in an existing implementation like Chrome's V8 JavaScript engine you still have a lot of integration work to do on the first 3 items above.

To apply CSS you need:

To apply JS/CSS you basically have to build most of a browser except the rendering portion.

detlevhfischer commented 4 years ago

Even after using the "Parsing only" or TPG's Validate Page bookmarklet on the W3C Nu validator results, there seem to be a number of things flagged as errors that probably have little or no impact on accessibility, such as

Then there are other cases where I am less confident that they are harmless, such as a div as a child of ul. Apart from custom attributes, which have been discussed in #1078 and seem to be OK, I'd be curious what folks see as exemptions that do not violate the letter of 4.1.1?

patrickhlauke commented 3 years ago

x-ref https://github.com/w3c/wcag/issues/770

cstrobbe commented 2 years ago

This issue was triggered by a mailing list contribution by me but I hadn't seen it until it was referenced in a BIK-BITV issue (in German) today. My insistence on nesting based on syntax instead of nesting based on content models is based on the concept of well-formedness that informed discussions about the formulation of the success criterion in the years 2005-2008. Below are a few pointers to those discussions.

  1. WCAG WG meeting minutes, 23 June 2005: resolution to remove all SC under Guideline 4.1 and to replace them with an editorial note that they require discussion and comments. Quote from the discussion leading up to that resolution: "Acknowledgement that well-formedness doesn't apply to SGML and a proposal at http://lists.w3.org/Archives/Public/w3c-wai-gl/2005AprJun/0841.html" (I had come to the WG with a background in using and teaching XML.)
  2. WCAG WG meeting minutes, 10 November 2005: after a discussion on SGML content models, the working group accepts the following wording for SC 4.1.1: "Delivery units can be parsed unambiguously." ("Delivery units" would eventually be replaced with "Web pages".) The following definition of parsing was accepted along with it: "Parsing transforms markup or other code into a data structure, usually a tree, which is suitable for later processing and which captures the implied hierarchy of the input. Parsing unambiguously means that there is only one data structure that can result." Parsing into a correct tree requires correct syntax, not correct content models. XML well-formedness would have achieved essentially the same result, but only for XML-based formats. Well-formedness is based on syntax and can be checked by non-validating parsers, i.e. without reference to content models (DTD, XML Schema etc.).
  3. WCAG WG meeting minutes, 17 November 2005: resolutions related to a draft of How to meet SC 4.1.1. Titles (placeholders) for proposed techniques: "Ensuring that unique ids are specified AND that opening and closing tags of all elements can be parsed unambiguously" (for HTML-based content) and "Ensuring that the delivery unit is well-formed AND that unique ids are specified" (for XML-based content). Again, nothing about correct nesting according to a spec's content models.
  4. Understanding Success Criterion 4.1.1: Parsing (for WCAG 2.0) also discusses how well-formedness informed the wording of the success criterion: "Note: The concept of "well formed" is close to what is required here. However, exact parsing requirements vary amongst markup languages, and most non XML-based languages do not explicitly define requirements for well formedness. Therefore, it was necessary to be more explicit in the success criterion in order to be generally applicable to markup languages. Because the term "well formed" is only defined in XML, and (because end tags are sometimes optional) valid HTML does not require well formed code, the term is not used in this success criterion."

In other words, a correct understanding of the SC requires understanding the distinction between XML's concepts of well-formedness and validity. The parsing SC is based on the concept of well-formedness. Unfortunately, in non-XML-based languages, there are no tools to check syntax independently from validity (i.e. content models). This is why techniques for SC 4.1.1 rely on validation.
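One way to see that distinction in practice is with a non-validating XML parse, which checks syntax (well-formedness) but not content models. A minimal browser-console sketch; browsers differ slightly in how they report XML parse failures, but all of them embed a parsererror element:

// Well-formedness check only: no DTD/schema, so content models are ignored.
const isWellFormed = (markup) =>
  new DOMParser().parseFromString(markup, "application/xml")
    .querySelector("parsererror") === null;

console.log(isWellFormed("<ul><div>x</div></ul>")); // true: well-formed, even though div is not allowed in ul
console.log(isWellFormed("<p><b>x</p></b>"));       // false: tags overlap, so the markup is not well-formed

Only the second case is the kind of problem the SC, as originally intended, was meant to catch; the first is a content model (validity) issue.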

Note: This is a shortened version of Notes on the History of Success Criterion 4.1.1, which I wrote up on a personal website.

dd8 commented 2 years ago

One consideration here is the HTML 5 parser adoption agency algorithm: https://html.spec.whatwg.org/multipage/parsing.html#adoption-agency-algorithm

This algorithm runs in two situations:

a) when tags are mis-nested and the document is not well-formed in the XML sense: https://html.spec.whatwg.org/multipage/parsing.html#misnested-tags:-b-i-/b-/i
b) when elements are well-formed in the XML sense, but are used where they're not allowed: https://html.spec.whatwg.org/multipage/parsing.html#unexpected-markup-in-tables

For example, an img cannot appear as a direct child of table:

<table>
  <img src="test.png">
  <tr>
    <td>Cell</td>
  </tr>
</table>

So the HTML parser moves the img outside the table and produces the following DOM:

<img src="test.png">
<table>
  <tr>
    <td>Cell</td>
  </tr>
</table>

The parsing algorithm also discards some elements that are well-formed in the XML sense, but used with a forbidden ancestor: https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inbody

For example, this markup is well-formed in the XML sense:

<form>
  <input name="one"/>
  <form>
    <input name="two"/>
  </form>
</form>

but produces this DOM if parsed as HTML (1) because form cannot be nested inside form:

<form>
   <input name="one">  
   <input name="two">
</form>

(1) Documents are parsed as HTML if they're served with MIME type text/html. Documents are parsed as XML when served with MIME type application/xhtml+xml. The HTML is not transformed this way if the document is parsed as XML: in that case the document is loaded directly into the DOM by an XML parser and none of the HTML parsing algorithm is used. This is very much an edge case, since fewer than 0.05% of pages are served as application/xhtml+xml. https://commoncrawl.github.io/cc-crawl-statistics/plots/mimetypes
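Both fix-ups above are easy to confirm from a browser console, since DOMParser with "text/html" runs the full HTML parsing algorithm. A minimal sketch (note the parser also inserts the implied tbody):

const parseAsHtml = (markup) =>
  new DOMParser().parseFromString(markup, "text/html").body.innerHTML;

// The img is hoisted out of the table:
console.log(parseAsHtml('<table><img src="test.png"><tr><td>Cell</td></tr></table>'));
// -> <img src="test.png"><table><tbody><tr><td>Cell</td></tr></tbody></table>

// The nested form start tag is ignored; its input is kept in the outer form:
console.log(parseAsHtml('<form><input name="one"><form><input name="two"></form></form>'));
// -> <form><input name="one"><input name="two"></form>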

mraccess77 commented 1 year ago

Do I understand correctly that the syntax issues we are discussing that would technically fail WCAG 2.0/2.1 SC 4.1.1 are the misnested ones, where tags are closed and opened in the wrong order, such as the example linked above: <p>1<b>2<i>3</b>4</i>5</p>?
I understand that we are placing a note in the WCAG 2.0 and 2.1 understanding documents saying the SC is automatically met - but that is not a normative note.

If one were to use the W3C Nu validator, would they be looking for errors listed as "violates nesting rules"? I want to make sure there is clear guidance on which nesting items reported by the validator can be ignored because they relate to the content model, and which ones are syntactical, in a way that anyone can differentiate.
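For reference, a browser repairs the misnested example quoted above by splitting the i element (via the adoption agency algorithm). A minimal browser-console sketch:

const doc = new DOMParser().parseFromString("<p>1<b>2<i>3</b>4</i>5</p>", "text/html");
console.log(doc.body.innerHTML);
// -> <p>1<b>2<i>3</i></b><i>4</i>5</p>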

cstrobbe commented 1 year ago

The HTML Validator reports many syntax issues that don't violate SC 4.1.1 (i.e. in the originally intended meaning), and since there are various types of syntax issues, these are described differently by the validator. I filter out the irrelevant ones using a bookmarklet based on Steve Faulkner's WCAG Parsing Bookmarklet.

Without a bookmarklet, you really need to understand both SC 4.1.1 and the validator's errors and warnings very well in order to know what violates the SC and what doesn't. The following are examples of failures:

dd8 commented 1 year ago

If it's helpful I can go through all the error states in the VNU parser used by the HTML Validator and produce a list of these, with corresponding validator messages. This list won't include content model errors, because those aren't produced by the parser. I had a quick look at the code and can see around 108 parser error states.

Once that's done, someone can go through them and decide which ones map to 4.1.1.

PS I'm quite familiar with the internals of the VNU parser (because I did a port of it from Java to C++) and have over 25 years professional experience of writing HTML parsers.

mraccess77 commented 1 year ago

Hi @dd8, I wouldn't want to ask you to do that, given browser support and the direction of the note - but what you and others have already provided is helpful for understanding the scope of what may be out there for the limited situations where it is important.

dd8 commented 1 year ago

Fair enough - if anyone needs to know which messages are reported by the parser you can find them in:

https://github.com/validator/htmlparser/blob/master/src/nu/validator/htmlparser/impl/ErrorReportingTokenizer.java
https://github.com/validator/htmlparser/blob/master/src/nu/validator/htmlparser/impl/TreeBuilder.java

The error reporting functions all have names with an err prefix like errNoSpaceBetweenAttributes, errDuplicateAttribute or errSlashNotFollowedByGt and contain the message strings reported by the validator.

Edit: there are a small number of content model errors reported by the parser (e.g. malformed tables and nested headings) because these are fixed by the parser. See https://github.com/w3c/wcag/issues/978#issuecomment-1207165424 for details