mraccess77 opened 4 years ago
I have read the WCAG documents on this subject and could not find a clear answer to this question. I therefore propose that
I suspect that the "WCAG parsing only" bookmarklet is often used for testing. It interprets SC 4.1.1 as requiring not only correct nesting but also, according to the HTML specification, correct child elements.
Unfortunately, I can't find any information about whether the DOM or the source code should be checked.
A related question is to what extent assistive technologies (AT) use the DOM or the source code, or whether they rely on the operating system's accessibility APIs.
I've always interpreted it as meeting the spec of the language used, i.e. if you're writing HTML5, meeting that spec. (I'd like to avoid tangents about multiple HTML specs, please!)
Therefore the nesting should follow the rules of that spec, so I agree with the parsing bookmarklet's approach.
DOM vs. source is tricky, as you have to use the DOM to work out what the source code (including scripts) actually produces, and then work out whether that matches the spec. That is another good reason to deprecate 4.1.1: if you are going by the DOM, then you should look at impact, not spec.
My understanding is that AT generally uses the system's accessibility API, but there are some odd cases that may use direct access (possibly Dragon?).
If there are still ATs that access the source directly, they're very unlikely to apply JavaScript or CSS changes to it. This may affect assumptions about JavaScript/CSS elsewhere in WCAG.
The reason I think direct access ATs are unlikely to apply JavaScript/CSS is that it's a lot of effort - at least an order of magnitude harder than pulling the information from the browser DOM or accessibility API. To apply JavaScript you need:
- `<script>` elements, `script src` files

Even if you pull in an existing implementation like Chrome's V8 JavaScript engine, you still have a lot of integration work to do on the first 3 items above.
To apply CSS you need:

- `<style>` and `<link>` elements, `link rel=stylesheet` files

To apply JS/CSS you basically have to build most of a browser, except the rendering portion.
Even after using the "Parsing only" or TPG's Validate Page bookmarklet on the W3C Nu validator results, a number of things seem to be flagged as errors that probably have little or no impact on accessibility, such as:

- a `name` attribute on a `div`
- a `title` attribute on an `svg`
- a `div` used within `strong` or other inline elements

Then there are other cases where I am less confident that they are harmless, such as a `div` as a child of a `ul`.
Apart from custom attributes, which have been discussed in #1078 and seem to be OK, I'd be curious to know what folks see as exemptions that do not violate the letter of 4.1.1.
This issue was triggered by a mailing list post of mine, but I hadn't seen it until it was referenced in a BIK-BITV issue (in German) today. My insistence on nesting based on syntax, rather than nesting based on content models, is based on the concept of well-formedness that informed discussions about the wording of the success criterion in 2005-2008. Below are a few pointers to those discussions.
In other words, a correct understanding of the SC requires understanding the distinction between XML's concepts of well-formedness and validity. The parsing SC is based on the concept of well-formedness. Unfortunately, in non-XML-based languages, there are no tools to check syntax independently from validity (i.e. content models). This is why techniques for SC 4.1.1 rely on validation.
Note: This is a shortened version of Notes on the History of Success Criterion 4.1.1, which I wrote up on a personal website.
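To make the well-formedness/validity distinction concrete, here is a minimal Python sketch (an illustration, not something from the original discussion). It assumes the markup is written in XML syntax, so void elements must be self-closed (`<img/>`); Python's standard `xml.etree.ElementTree` parser then checks well-formedness only and knows nothing about HTML content models.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if `markup` parses as well-formed XML.

    Well-formedness covers syntax only (every start tag closed, tags closed
    in the reverse order they were opened). It says nothing about content
    models such as "a ul may only contain li elements".
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Misnested tags: not well-formed.
print(is_well_formed("<p>1<b>2<i>3</b>4</i>5</p>"))  # False
# A div as a child of a ul: invalid under HTML's content model, yet well-formed.
print(is_well_formed("<ul><div>item</div></ul>"))  # True
```

An HTML-style `<img src="a.png">` without the closing slash is also reported as not well-formed here, which is why this XML approach does not carry over to documents written in HTML syntax.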
One consideration here is how the HTML5 parser recovers from errors, e.g. with the adoption agency algorithm: https://html.spec.whatwg.org/multipage/parsing.html#adoption-agency-algorithm
This error recovery runs in two situations:
a) when tags are misnested and the document is not well-formed in the XML sense (handled by the adoption agency algorithm): https://html.spec.whatwg.org/multipage/parsing.html#misnested-tags:-b-i-/b-/i
b) when elements are well-formed in the XML sense, but are used where they're not allowed (in tables, handled by foster parenting): https://html.spec.whatwg.org/multipage/parsing.html#unexpected-markup-in-tables
For example, the img cannot appear as a direct child of table:
```html
<table>
  <img src="test.png">
  <tr>
    <td>Cell</td>
  </tr>
</table>
```
So the parser moves the img out of the table and in front of it (the spec calls this foster parenting), producing the following DOM:
```html
<img src="test.png">
<table>
  <tbody>
    <tr>
      <td>Cell</td>
    </tr>
  </tbody>
</table>
```
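A rough way to spot markup that the parser would move out of a table is to track the currently open element while tokenizing, and flag start tags that appear directly inside table structure. The Python sketch below uses the standard library's `html.parser.HTMLParser`, which only tokenizes and does not build a DOM. The `TABLE_ALLOWED` set is a simplification assumed for illustration: the spec's "in table" insertion mode has extra cases (for example `<input type="hidden">`, `<form>` and non-whitespace text) that are not modeled, and the sketch assumes end tags are written out explicitly.

```python
from html.parser import HTMLParser

# Elements whose direct content should only be table structure.
TABLE_CONTEXTS = {"table", "thead", "tbody", "tfoot", "tr"}
# Simplified (assumed) set of tags that are not foster-parented there.
TABLE_ALLOWED = {"caption", "colgroup", "col", "thead", "tbody", "tfoot",
                 "tr", "td", "th", "script", "template", "style"}
# Void elements never get an end tag, so they are never pushed.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TableMisplacementFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []    # currently open non-void elements
        self.flagged = []  # tags likely to be moved in front of the table

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else None
        if parent in TABLE_CONTEXTS and tag not in TABLE_ALLOWED:
            self.flagged.append(tag)
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open element, if any.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


finder = TableMisplacementFinder()
finder.feed('<table><img src="test.png"><tr><td>Cell</td></tr></table>')
finder.close()
print(finder.flagged)  # ['img']
```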
The parsing algorithm also discards some elements that are well-formed in the XML sense, but used with a forbidden ancestor: https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inbody
For example, this markup is well-formed in the XML sense:
```html
<form>
  <input name="one"/>
  <form>
    <input name="two"/>
  </form>
</form>
```
but produces this DOM if parsed as HTML (1) because form cannot be nested inside form:
```html
<form>
  <input name="one">
  <input name="two">
</form>
```
(1) Documents are parsed as HTML if they're served with MIME type `text/html`. Documents are parsed as XML when served with MIME type `application/xhtml+xml`. The HTML is not transformed this way if the document is parsed as XML: in that case the document is loaded directly into the DOM by an XML parser and none of the HTML parsing algorithm is used. This is very much an edge case, since fewer than 0.05% of pages are served as `application/xhtml+xml`: https://commoncrawl.github.io/cc-crawl-statistics/plots/mimetypes
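The nested-form case can be caught by counting open `form` elements while tokenizing the source. Below is a minimal Python sketch using the standard `html.parser.HTMLParser`; the class name is hypothetical, and because it reads the source rather than the DOM, it is only meaningful for documents served as `text/html`, where the inner start tag would be dropped.

```python
from html.parser import HTMLParser


class NestedFormFinder(HTMLParser):
    """Record <form> start tags that appear while another form is open.

    When a document is parsed as HTML, such start tags are ignored, so the
    inner form never reaches the DOM.
    """

    def __init__(self):
        super().__init__()
        self.open_forms = 0
        self.nested = []  # (line, offset) of each nested <form> start tag

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            if self.open_forms:
                self.nested.append(self.getpos())
            self.open_forms += 1

    def handle_endtag(self, tag):
        if tag == "form" and self.open_forms:
            self.open_forms -= 1


source = """<form>
<input name="one"/>
<form>
<input name="two"/>
</form>
</form>"""

finder = NestedFormFinder()
finder.feed(source)
finder.close()
print(len(finder.nested))  # 1
```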
Do I understand correctly that the syntax issues we are discussing, which would technically fail WCAG 2.0/2.1 SC 4.1.1, are the misnested ones, where tags are closed and opened in the wrong order, as in the example linked above: `<p>1<b>2<i>3</b>4</i>5</p>`?
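For a purely syntactic check of open/close order in HTML syntax, a small stack-based pass over the tokens is enough to report misnesting like `<p>1<b>2<i>3</b>4</i>5</p>`. This Python sketch is hypothetical (not part of the W3C tooling) and uses the standard `html.parser.HTMLParser`. It assumes every non-void element is explicitly closed, so end tags that HTML legitimately allows you to omit (for example `</li>` or `</p>`) would show up as false positives.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MisnestingFinder(HTMLParser):
    """Report end tags that do not close the most recently opened element.

    Content models are ignored entirely: only the order of start and end
    tags matters, in the spirit of XML well-formedness.
    """

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open non-void elements
        self.problems = []  # human-readable descriptions

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # <x/> is opened and closed in one token, so nesting is unaffected

    def handle_endtag(self, tag):
        if tag not in self.stack:
            self.problems.append(f"stray </{tag}>")
        elif self.stack[-1] != tag:
            self.problems.append(
                f"</{tag}> closes <{tag}> while <{self.stack[-1]}> is still open")
            # Drop the innermost matching element and keep going.
            index = len(self.stack) - 1 - self.stack[::-1].index(tag)
            del self.stack[index]
        else:
            self.stack.pop()


finder = MisnestingFinder()
finder.feed("<p>1<b>2<i>3</b>4</i>5</p>")
finder.close()
print(finder.problems)  # ['</b> closes <b> while <i> is still open']
```

Running it on `<ul><div>item</div></ul>` reports nothing, since a `div` inside a `ul` is a content-model problem rather than a nesting-order one.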
I understand that we are placing a note in the WCAG 2.0 and 2.1 Understanding documents saying the SC is automatically met, but that note is not normative.
If one were to use the W3C Nu validator, would one look for errors reported as "violates nesting rules"? I want to make sure there is clear guidance, which anyone can apply, on which nesting errors reported by the validator relate to the content model (and can be ignored) and which ones are syntactic.
The HTML Validator reports many syntax issues that don't violate SC 4.1.1 (i.e. in the originally intended meaning), and since there are various types of syntax issues, these are described differently by the validator. I filter out the irrelevant ones using a bookmarklet based on Steve Faulkner's WCAG Parsing Bookmarklet.
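The bookmarklet approach boils down to filtering the validator's messages by their wording. Here is a minimal Python sketch of the same idea applied to the Nu validator's JSON output (a top-level `messages` array whose entries carry `type` and `message` fields). The patterns in `PARSING_PATTERNS` are illustrative assumptions, not the list used by Steve Faulkner's bookmarklet, and would need checking against the actual validator messages.

```python
import json
import re

# Illustrative (assumed) patterns for messages about syntax rather than
# content models; NOT the list used by any existing bookmarklet.
PARSING_PATTERNS = [
    r"violates nesting rules",
    r"^Stray (start|end) tag",
    r"^Unclosed element",
    r"^Duplicate attribute",
    r"^Duplicate ID",
]


def parsing_errors(validator_json: str) -> list[str]:
    """Return error messages whose text matches one of PARSING_PATTERNS."""
    data = json.loads(validator_json)
    kept = []
    for msg in data.get("messages", []):
        if msg.get("type") != "error":
            continue  # skip info messages (including warnings)
        text = msg.get("message", "")
        if any(re.search(pattern, text) for pattern in PARSING_PATTERNS):
            kept.append(text)
    return kept


# Hand-written sample in the shape of the validator's JSON output.
sample = json.dumps({"messages": [
    {"type": "error", "message": "End tag “b” violates nesting rules."},
    {"type": "error",
     "message": "Attribute “name” not allowed on element “div” at this point."},
]})
print(parsing_errors(sample))  # ['End tag “b” violates nesting rules.']
```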
Without a bookmarklet, you really need to understand both SC 4.1.1 and the validator's errors and warnings very well in order to know what violates the SC and what doesn't. The following are examples of failures:
If it's helpful I can go through all the error states in the VNU parser used by the HTML Validator and produce a list of these, with corresponding validator messages. This list won't include content model errors, because those aren't produced by the parser. I had a quick look at the code and can see around 108 parser error states.
Once that's done someone can go through them and decide which ones map to 4.1.1
PS I'm quite familiar with the internals of the VNU parser (because I did a port of it from Java to C++) and have over 25 years professional experience of writing HTML parsers.
Hi @dd8, I wouldn't want to ask you to do that, given browser support and the direction of the note, but what you and others have already provided is helpful for understanding the scope of what may be out there for the limited situations where it is important.
Fair enough - if anyone needs to know which messages are reported by the parser you can find them in:
https://github.com/validator/htmlparser/blob/master/src/nu/validator/htmlparser/impl/ErrorReportingTokenizer.java
https://github.com/validator/htmlparser/blob/master/src/nu/validator/htmlparser/impl/TreeBuilder.java
The error reporting functions all have names with an `err` prefix, like `errNoSpaceBetweenAttributes`, `errDuplicateAttribute` or `errSlashNotFollowedByGt`, and contain the message strings reported by the validator.
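To get that list without reading the files by hand, one could scan a local copy of `ErrorReportingTokenizer.java` or `TreeBuilder.java` for identifiers that follow the `err` + capital letter naming pattern. A minimal Python sketch, assuming the naming convention described above holds; it matches both declarations and call sites, so names are de-duplicated. The Java fragment in the example is made up purely to exercise the regex.

```python
import re

# An "err" prefix followed by a capital letter, then "(" (declaration or call).
ERR_FUNCTION = re.compile(r"\b(err[A-Z]\w*)\s*\(")


def error_function_names(java_source: str) -> list[str]:
    """Return the sorted, de-duplicated err* function names found in the source."""
    return sorted(set(ERR_FUNCTION.findall(java_source)))


# Made-up Java fragment, only to exercise the regex.
snippet = """
private void errDuplicateAttribute() throws SAXException {
    err("Duplicate attribute.");
}
if (seenSlash) { errSlashNotFollowedByGt(); }
errDuplicateAttribute();
"""
print(error_function_names(snippet))
# ['errDuplicateAttribute', 'errSlashNotFollowedByGt']
```

On a real file, pass the contents read with `open(path, encoding="utf-8").read()`; the bare `err(...)` helper is deliberately skipped because the pattern requires a capital letter after the prefix.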
Edit: there are a small number of content model errors reported by the parser (e.g. malformed tables and nested headings) because these are fixed by the parser. See https://github.com/w3c/wcag/issues/978#issuecomment-1207165424 for details
Refer to the following thread. Does "nested according to the specification" in the SC mean nested according to the syntax of opening and closing tags, or according to what the specification says about which tags can't appear within which other tags?
https://lists.w3.org/Archives/Public/w3c-wai-ig/2019OctDec/0113.html