Proposal to Rephrase Success Criterion 4.1.1

cstrobbe commented 2 years ago

Since many accessibility testers and other accessibility experts erroneously interpret the phrase "elements are nested according to their specifications" as referring to content models instead of syntactical nesting, a rewording that avoids this misunderstanding is highly desirable. Below is a proposed rewording.

In content implemented using markup languages, the following are true, except where the specifications for the markup languages being used allow exceptions to these requirements:

1. elements have complete start and end tags,

2. elements are nested according to the syntactical rules of their specifications,

3. elements do not contain duplicate attributes, and

4. any IDs are unique.

Note: Start and end tags that are missing a critical character in their formation, such as a closing angle bracket or a mismatched attribute value quotation mark are not complete.

Note: Syntactically correct nesting is distinct from nesting according to the content models specified in a technical specification. The second condition of the success criterion does not require correct content models; only correct syntax.

Note: When a scripting language is used to manipulate elements or attributes (or both) in the Document Object Model, the resulting in-memory representation is still regarded as "content implemented using markup languages".
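
A minimal sketch of how the duplicate-attribute and unique-ID conditions could be checked mechanically, assuming Python's standard `html.parser` is an acceptable stand-in for a real HTML tokenizer. The names `DuplicateChecker` and `check` are hypothetical and not part of any WCAG tooling; the sketch looks at source markup only, not at a scripted DOM.

```python
from collections import Counter
from html.parser import HTMLParser


class DuplicateChecker(HTMLParser):
    """Collects duplicate attributes per start tag and duplicate id values."""

    def __init__(self):
        super().__init__()
        self.duplicate_attributes = []  # (tag, attribute name) pairs
        self.id_counts = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; repeated names are preserved.
        names = Counter(name for name, _ in attrs)
        for name, count in names.items():
            if count > 1:
                self.duplicate_attributes.append((tag, name))
        # Count only the first id on an element, so that a duplicated id
        # attribute is not also reported as a duplicated ID value.
        ids = [value for name, value in attrs if name == "id" and value]
        if ids:
            self.id_counts[ids[0]] += 1

    def duplicate_ids(self):
        return sorted(i for i, n in self.id_counts.items() if n > 1)


def check(markup):
    """Return (duplicate attributes, duplicate IDs) found in the markup."""
    checker = DuplicateChecker()
    checker.feed(markup)
    checker.close()
    return checker.duplicate_attributes, checker.duplicate_ids()


if __name__ == "__main__":
    sample = '<p id="a" class="x" class="y">one</p><span id="a">two</span>'
    print(check(sample))
```

Neither check depends on content models: both can be decided from the tokens alone.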

Description of the changes:

The second condition has been reworded to highlight syntactical correctness (as opposed to validity of content models).
The second note, which is new, draws attention to this. The first note is identical to the note in the WCAG 2.1 recommendation from June 2018.
The numbering is new; the phrase "the following are true" is copied from other success criteria, such as SC 1.2.1 and SC 2.2.2.
The third note addresses an issue unrelated to the distinction between correct syntax and validity.

Compatibility with existing versions of WCAG 2 and EN 301 549: Since the proposed rewording results in a requirement that is less strict than the interpretation of most accessibility testers, all documents that pass the current version of SC 4.1.1 should also pass its proposed rewording. In this sense, the proposed rewording is compatible with the current version.

Clause 9.4.1.1 of EN 301 549 says, "Where ICT is a web page, it shall satisfy WCAG 2.1 Success Criterion 4.1.1 Parsing". Unless the editors of EN 301 549 want to retain the current version of the success criterion, no rewording of clause 9.4.1.1 is needed beyond, at some future point in time, an update of the referenced version of WCAG.

Not addressed by this proposal: The proposal does not address whether unbalanced attribute quoting counts as a failure of SC 4.1.1. (See the discussion on the failure examples in F70.) The first note mentions "a mismatched attribute value quotation mark" as an example of an incomplete start tag, but notes are non-normative and the SC does not say anything about attribute syntax. Adding a fifth condition, such as "attribute syntax is used according to specification", might therefore be interpreted as making the SC stricter than in WCAG 2.1.

The intent of this rephrasing is not to “defend” the many types of validation errors that accessibility testers flag using this success criterion. My intent is merely to eliminate a common misunderstanding about what the success criterion actually means. (See my comment on issue #978 for notes about how XML's concept of well-formedness informed the wording of the success criterion.) If non-syntactical validation errors which impact accessibility are found, these should be caught either by existing success criteria or by new ones that still need to be created.

Clarification in response to Alastair Campbell's comment on content models (18.07.2022). "Content models" refers to the descriptions of what each element may contain. For example, in HTML 5 a div may not be nested inside a span. In SGML and in the early days of XML, content models were described in DTDs. For example, <!ELEMENT chapter (chaptertitle, (para | heading)+)> This line declares the element chapter and says it must contain a chaptertitle followed by at least one para or heading. In HTML 5, content models are not expressed in DTDs, XML Schemas or similar formal languages, but described in text. See, e.g. "content model" under The section element, which says that this element may only contain flow content.
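
To illustrate the distinction, here is a small sketch that uses XML well-formedness as a rough proxy for syntactical nesting. This is an assumption made for illustration: the Understanding document only says that well-formedness is "close to" what the SC requires, and HTML syntax differs from XML (void elements, optional end tags). The helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# A div inside a span violates the HTML content model for span,
# but the tags nest cleanly, so a tree can be built.
content_model_error = "<span><div>text</div></span>"

# Overlapping elements cannot be parsed into a tree at all.
syntax_error = "<span><div>text</span></div>"

if __name__ == "__main__":
    print(is_well_formed(content_model_error))  # True
    print(is_well_formed(syntax_error))         # False
```

A validator would report the first sample because of its content model; only the second sample has a nesting problem at the syntactical level.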

bruce-usab commented 2 years ago

@cstrobbe I am very grateful for your close analysis and explanation. That said, AGWG has had great difficulty advancing two considerably more trivial (and essentially editorial) changes to normative phrasing in 2.0.

My own preference would be to incorporate all of this into Understanding and other related supporting materials, as you are doing, for example with https://github.com/w3c/wcag/issues/2187.

GreggVan commented 2 years ago

I think this all goes back to the intent and understanding of the people who first created the SC. That would determine if it was errata or a change in their understanding of what the SC was intended to mean.

Unfortunately, a) we don't have the ability to talk to all those on the working group, and b) all public reviewers, both those who commented and those who were satisfied with or relied on the wording and therefore did not comment, are involved in the process as well.

Thus we cannot judge the intent or understanding of these people, so we cannot judge if it is errata. We would need to treat it as a change and a loosening of the SC. This means that things that complied with 2.2 would fail earlier versions, which is something the group so far has been reluctant to do. (Errata change previous versions, so the problem does not arise for errata.)

Even changing the understanding document to say something other than what the SC says would be a problem. The Understanding doc is just to explain the SC - not to say that something should be different.

So the only way this can advance would be for the WG to decide to change current policy.

We should gather these types of things up in one place and decide if we want to / should do that or not.

g

cstrobbe commented 2 years ago

@GreggVan My rewording is based on what I remember from the discussions in the WG at the time, the statement in Success Criterion 4.1.1: Parsing that XML's well-formedness is close to what the SC requires (in other words, requiring correct content models goes beyond well-formedness and beyond the SC) and what I could reconstruct from publicly available older discussions (see my comment on What does nested according to the specification mean in SC 4.1.1). The rationale for my proposal is that it represents what was always intended.

I understand that we can't contact everyone who was involved in those discussions, but a clarification on what "nested according to their specifications" means has been requested for years. If WCAG 2.2 adopted the proposed wording, non-compliance with older versions would be caused by auditors going beyond the intent of the current SC 4.1.1. If this is better handled by means of errata, then I'm perfectly happy with that.

JAWS-test commented 2 years ago

@zcorpan was initially of the same opinion as @cstrobbe, but then changed his mind

alastc commented 2 years ago

I think the test for an errata would be whether this would clarify the SC without changing the intended meaning:

"In content implemented using markup languages, elements have complete start and end tags, elements are nested according to the syntactical rules of their specifications, elements do not contain duplicate attributes, and any IDs are unique, except where the specifications allow these features."

The understanding document includes phrases like:

the content is created according to the rules defined in the formal grammar for that technology. In markup languages, errors in element and attribute syntax and failure to provide properly nested start/end tags lead to errors that prevent user agents from parsing the content reliably. Therefore, the Success Criterion requires that the content can be parsed using only the rules of the formal grammar.

The concept of "well formed" is close to what is required here.

I'm not entirely sure what @cstrobbe meant by "content models", but the above seems like a reasonable suggestion for an errata.

GreggVan commented 2 years ago

hmmmm I agree with

... the test for an errata would be whether this would clarify the SC without changing the intended meaning:

The concern here is that the edit narrows and rigidifies beyond the original SC.

For example

If the spec says "there is no requirement to nest them strictly, but if you do then use this syntax"

Your edit would have the effect of deleting the first half of the sentence - and only use PART of the spec that is syntax, and ignoring the part of the spec that says there is no requirement to.

So I think the edit can change the meaning. What do you think?

gregg

——————————— Professor, University of Maryland, College Park Founder and Director Emeritus , Trace R&D Center, UMD Co-Founder Raising the Floor. http://raisingthefloor.org The Global Public Inclusive Infrastructure (GPII) http://GPII.net The Morphic project https://morphic.org

cstrobbe commented 2 years ago

Gregg Vanderheiden wrote,

If the spec says "there is no requirement to nest them strictly, but if you do then use this syntax"
Your edit would have the effect of deleting the first half of the sentence - and only use PART of the spec that is syntax, and ignoring the part of the spec that says there is no requirement to.

I think the wording of my suggestion avoided that issue:

except where the specifications (...) allow exceptions to these requirements

GreggVan commented 2 years ago

It does indeed - but I think it still leaves the ambiguity since people will have trouble determining what that means. Kind of like a double negative — but not exactly.

Hmmm how to fix

How about a slight tweak to your language

My question is: DOES HTML5 strictly require "nested according to its syntactical rules"? I don't know the answer.

BY THE WAY - the INTENT was 1 3 and 4 but NOT requiring strict nesting because HTML then did not require it. So you could have H1 H3 H1 H2 nesting and be ok There was a LOT of discussion and people who argued no H3 without H2, but in the end — what was agreed on did not require this.

Best

Gregg

patrickhlauke commented 2 years ago

BY THE WAY - the INTENT was 1 3 and 4 but NOT requiring strict nesting because HTML then did not require it. So you could have H1 H3 H1 H2 nesting and be ok There was a LOT of discussion and people who argued no H3 without H2, but in the end — what was agreed on did not require this.

but that's not nesting, that's heading levels ... or am i missing something here?

stevefaulkner commented 2 years ago

Note: Syntactically correct nesting is distinct from nesting according to the content models specified in a technical specification. The second condition of the success criterion does not require correct content models; only correct syntax.

So the following incorrect content model nesting is OK as far as the Criterion is concerned? <button><a href="#"></a></button>

or

<ul>
<div>
<li>
<li>
</div>
</ul>

alastc commented 2 years ago

@GreggVan - My understanding is that 4.1.1 requires correct nesting of tags; it has nothing to do with the order of headings (which fall under 1.3.1).

Like Steve's button / list example above (which fails in the validator as "Element div not allowed as child of element ul in this context").

The proposed errata (adding "syntactical") could help to disambiguate the perception of it being more than what was intended.

I'm fairly sure the spec on headings doesn't require any particular ordering of heading tags.

However, it would catch things like <button><h2>Thing</h2></button>.

cstrobbe commented 2 years ago

Regarding the examples in Steve Faulkner's comment: these don't contain syntactic problems, so they meet the SC. (Obviously, their content models are wrong, which is why the validator will report errors for those two examples.)

As Alastair Campbell has pointed out, the hierarchy of headings is irrelevant to this SC. That is not a syntactical issue.

The validator catches <button><h2>Thing</h2></button> because the content model is wrong. However, from a purely syntactical point of view, the nesting is fine (so it wouldn't violate SC 4.1.1). Browsers can build up an unambiguous parse tree based on that code. However, there may be an issue with the role exposed to the accessibility API. If that is the case, the code snippet violates a different SC, namely SC 4.1.2.

patrickhlauke commented 2 years ago

However, it would catch things like <button><h2>Thing</h2></button>

but that's still going beyond syntax (well-formedness) ... so which is it?

GreggVan commented 2 years ago

Ah, I see what you mean. OK. It would be good to have examples in the Understanding doc so people can distinguish between syntactic and content model issues.

gregg

alastc commented 2 years ago

The validator catches <button><h2>Thing</h2></button> because the content model is wrong. However, from a purely syntactical point of view, the nesting is fine (so it wouldn't violate SC 4.1.1).

I guess there is a difference between syntactical and content model that I'm missing.

I thought that those types of errors were exactly what 4.1.1 was supposed to catch, i.e. nested according to the spec.

cstrobbe commented 2 years ago

I guess there is a difference between syntactical and content model that I'm missing.

Please see my update from 18.07.2022 at the top of the page.

I thought that those types of errors were exactly what 4.1.1 was supposed to catch, i.e. nested according to the spec.

That's the misunderstanding this issue attempts to address. This is about syntactical nesting so web content can be "accurately parsed into a data structure" (quoted from Understanding SC 4.1.1).

The understanding document also says,

The concept of "well formed" is close to what is required here. However, exact parsing requirements vary amongst markup languages, and most non XML-based languages do not explicitly define requirements for well formedness. Therefore, it was necessary to be more explicit in the success criterion in order to be generally applicable to markup languages.

Validating content models goes beyond what is intended here.

patrickhlauke commented 2 years ago

I guess there is a difference between syntactical and content model that I'm missing.

it's the difference between syntax and grammar - think of it in terms of a word document...spell checking checks the syntax (words are spelled right), but not grammar (that words are arranged in such a way that they make proper sentences)

cstrobbe commented 2 years ago

If you want to make comparisons with natural language, Chomsky's famous Colorless green ideas sleep furiously is a better analogy: it is syntactically correct and can be parsed into a tree structure. But it makes no sense semantically.

giacomo-petri commented 2 years ago

I had a parallel discussion (I was not yet aware of this proposal) in ACT-Rules about something similar (https://github.com/act-rules/act-rules.github.io/issues/1893).

<label for="first-name">
    <span>First name</span>
    <input type="text" name="fn" value="">
</label>

and

<input type="file" id="test" />
<label for="test">Flash the screen 
    <select size="1">
        <option selected="selected">1</option>
        <option>2</option>
        <option>3</option>
    </select>
    times.
</label>

Initially, I was unconsciously supporting this new issue proposal, assuming that "elements are nested according to their specifications" and "H74: Ensuring that opening and closing tags are used according to specification (HTML)" referred to the syntactical rules of their specifications.

But @Jym77 pointed out that

H74 cover the bit about correct nesting (Step 3  in the test procedure). So, I would say that a label with a for pointing elsewhere than the nested labellable does not pass H74 (and certainly does not pass the "elements are nested according to their specifications" bit of 4.1.1).

which essentially means that the code examples above fail 4.1.1 in the first place. In fact, the examples show something that the content model for label in the spec does not allow.

But, per this new proposal,

elements are nested according to the syntactical rules of their specifications

...

Note: Syntactically correct nesting is distinct from nesting according to the content models specified in a technical specification. The second condition of the success criterion does not require correct content models; only correct syntax.

if the content model is no longer relevant in terms of 4.1.1 Parsing, they are no longer failing the 4.1.1 SC.

In addition, under the Input Accessible Name and Description Computation rules it is still not clear how to calculate the label, as point 2 states

Otherwise use the associated label element(s) accessible name(s) - if more than one label is associated; concatenate by DOM order, delimited by spaces.

which is quite ambiguous: in the first code example, the for attribute points to an ID that does not exist, while in the second code example we have a combination of for/id attributes and nested content, which is quite unpredictable.

Last, but not least, browsers behave inconsistently (more details in https://github.com/act-rules/act-rules.github.io/issues/1893); for example Safari provides a label, while Chrome doesn't.

Do we expect this scenario to fail the 1.1.1, 1.3.1, 2.5.3, 3.3.2 and 4.1.2 success criteria because of the discrepancy with the content model, while passing 4.1.1 thanks to its correct syntax?

patrickhlauke commented 2 years ago

it would likely fail 4.1.2 if the end result is a lack of accessible name, and probably 1.3.1 for lack of explicit association/relationship

cstrobbe commented 2 years ago

But @Jym77 pointed out that

H74 cover the bit about correct nesting (Step 3  in the test procedure). So, I would say that a label with a for pointing elsewhere than the nested labellable does not pass H74 (and certainly does not pass the "elements are nested according to their specifications" bit of 4.1.1).

That does not mean that the code fails SC 4.1.1; it means that the code is not using technique H74. Not using technique H74 does not automatically mean you fail the SC the technique addresses.

Neither of those code examples contains syntactical issues in the context of HTML syntax. In the context of XML syntax, the first example would not be well-formed. But in HTML syntax, the input element has no end tag (it's a void element). Whether the label element can contain an input element is not a syntactical question but a matter of content models, and the content model allows it (i.e. as a way of labelling that input).

The second code example does not exhibit any syntax issues but the label element seems to label two controls, i.e. both the one above it and the one inside it. The HTML specification does not seem to define which type of labelling takes precedence, the one defined by the for attribute or the one based on nesting. Hence, the relationship between the label and the control it labels visually (i.e. the control below it) cannot be determined programmatically in an unambiguous manner, so the code seems to violate SC 1.3.1.

giacomo-petri commented 2 years ago

But @Jym77 pointed out that

H74 cover the bit about correct nesting (Step 3  in the test procedure). So, I would say that a label with a for pointing elsewhere than the nested labellable does not pass H74 (and certainly does not pass the "elements are nested according to their specifications" bit of 4.1.1).

That does not mean that the code fails SC 4.1.1; it means that the code is not using technique H74. Not using technique H74 does not automatically mean you fail the SC the technique addresses.

That was not exactly the case; I was supporting the thesis that the label example did not fail 4.1.1 because of point 4 of the 4.1.1 sufficient techniques (which comprises three sufficient techniques), all of which, in my opinion, were passed. @Jym77 just pointed out that the first of this group of three techniques does not pass; he was not saying that this makes it a 4.1.1 failure.

JAWS-test commented 2 years ago

Hi @alastc,

I would like to ask you to bring the issue to a decision in a timely manner. Background: In the European Union thousands of web sites are checked according to EN 301 549. Most of the violations are found for SC 4.1.1. As an example I would like to mention the German monitoring report: https://www.bfit-bund.de/DE/Downloads/eu-bericht-pdf.pdf;jsessionid=7266E7F6DCC8058D664888E08830EC21?__blob=publicationFile&v=2, page 102. So it seems that the most important accessibility problem is 4.1.1, because it is violated the most. The problems found with 4.1.1 are largely due to incorrect nesting (which is what this issue is about). Only rarely does a duplicate ID show up as a problem. The other errors described in 4.1.1 do not occur in practice because they are automatically corrected by the browser. If 4.1.1 were reworded as suggested by @cstrobbe, 4.1.1 would finally regain the weight it deserves: namely, a low weight. And we could take care of the really important problems of accessibility!

GreggVan commented 2 years ago

+1

I sent some suggestions to @cstrobbe for wording to make it clearer. I concur with the importance of making this clear and of separating semantic (content model) nesting issues from syntactic ones, which is what this is about. It is about breaking AT by giving it content it can't PARSE.

NOTE - the fact that browsers accommodate errors was pointed out when working on WCAG 2.0 - but that does not help AT that needs to parse the content. It is only if the browsers actually repair the content - and the AT can use that repaired content — that we can ignore the errors that browsers accommodate. AT developers don’t have as deep of pockets to detect and repair bad content as browsers do.

gregg

JAWS-test commented 2 years ago

@GreggVan

NOTE - the fact that browsers accommodate errors was pointed out when working on WCAG 2.0 - but that does not help AT that needs to parse the content. It is only if the browsers actually repair the content - and the AT can use that repaired content — that we can ignore the errors that browsers accommodate. AT developers don’t have as deep of pockets to detect and repair bad content as browsers do.

In the past there was AT that accessed the source code and not the DOM. That's why correct source code was important. As far as I know, there is no AT today that accesses the source code. If there were, it would be outdated and quite useless, since web content today is not primarily source code, but source code + CSS + JavaScript. The browsers create the (corrected) DOM from this and pass it on to the Accessibility API. The AT uses either the API or the DOM. AT that used the source code would not be able to recognize the correct content on many pages, because the content is generated or changed dynamically and thus does not appear in the source code at all. That's why I think that for 4.1.1 we should only care about what is generated as DOM by the browsers.

patrickhlauke commented 2 years ago

NOTE - the fact that browsers accommodate errors was pointed out when working on WCAG 2.0 - but that does not help AT that needs to parse the content. It is only if the browsers actually repair the content - and the AT can use that repaired content — that we can ignore the errors that browsers accommodate. AT developers don’t have as deep of pockets to detect and repair bad content as browsers do.

note that the error correction mechanisms are now a documented part of the HTML specification (while in the past, this was all undocumented and left up to mysterious black box browser heuristics, which is in part the reason for 4.1.1 because it was trying to avoid that devs just relied on testing in their favourite browser and missed how other browsers would parse broken content)

GreggVan commented 2 years ago

+1

gregg

cstrobbe commented 2 years ago

Double negative?

What follows is another attempt to address Gregg Vanderheiden's comment about something that may look like a double negative.

All SGML-like languages (HTML 4.x, XHTML, HTML 5, SVG, MathML, etc.) rely on hierarchy and a type of syntax tree (of which the DOM is the best-known example). Parsing a document into a syntax tree requires correct nesting at the syntactical level. Each element is always a child of another element, unless it is the document element or root (html in HTML formats). As a consequence, code such as <para>...<bold> ... </para> ... </bold> is prohibited in all SGML-like languages and leads to a parse error, regardless of whether the language defines (content models for) para and bold or not. So when we say, "elements are nested according to the syntactical rules of their specifications", we are actually paraphrasing something that SGML, XML and HTML 5 have in common. [1] The exception "except where the specifications allow these features" would not be relevant to these languages.
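
The overlap described above can be detected with nothing more than a stack of open elements. The following is a minimal sketch, assuming Python's standard `html.parser` as tokenizer; `NestingChecker` and `nesting_errors` are hypothetical names. It deliberately knows nothing about content models, and as a simplification it does not model HTML's optional end tags (for example on li or p).

```python
from html.parser import HTMLParser

# HTML void elements never have an end tag, so they are never pushed.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class NestingChecker(HTMLParser):
    """Reports end tags that do not close the most recently opened element."""

    def __init__(self):
        super().__init__()
        self.open_elements = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> leaves nothing open.
        pass

    def handle_endtag(self, tag):
        if self.open_elements and self.open_elements[-1] == tag:
            self.open_elements.pop()
        elif tag in self.open_elements:
            self.errors.append(f"</{tag}> overlaps <{self.open_elements[-1]}>")
            # Discard everything up to and including the matching start tag.
            while self.open_elements.pop() != tag:
                pass
        else:
            self.errors.append(f"</{tag}> has no matching start tag")


def nesting_errors(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.errors


if __name__ == "__main__":
    print(nesting_errors("<para>a<bold>b</para>c</bold>"))
    print(nesting_errors("<button><h2>Thing</h2></button>"))
```

The first sample yields two errors (an overlap, then an end tag left without a start tag); the second yields none, even though a validator rejects it on content model grounds.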

The exception may be relevant to other types of markup languages, such as TeX and LaTeX, but these (1) are geared towards typesetting, (2) would present serious challenges to meet many of the other WCAG success criteria when read directly by user agents and (3) are usually rendered into another format, most frequently PDF. PDF is not a markup language, so SC 4.1.1 does not apply to it. (In practice, we would probably lose nothing by deleting "except where the specifications allow these features", but I don't want to increase resistance to my proposal by adding that change.)

One issue with the current wording that I haven't commented on yet is how "except where the specifications allow these features" seems to work. "These features" seems to refer to

"elements have complete start and end tags",
"elements are nested according to [the syntactical rules of] their specifications",
"elements do not contain duplicate attributes" and
"any IDs are unique".

SGML-like languages do not merely "allow" these features; rather, they are basic requirements. The intended meaning is "except where the specifications allow exceptions to these requirements", but (1) the current wording says the opposite and (2) I don't know of any markup languages on the web that allow the intended exceptions.
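As an illustration of the last two conditions in the list above (no duplicate attributes, unique IDs), here is a minimal sketch that uses Python's standard html.parser module. The class name AttributeAndIdChecker and the function find_duplicates are hypothetical names chosen for this example; nothing here is prescribed by WCAG, and the sketch looks only at the source markup, not at a DOM modified by scripts.

```python
from collections import Counter
from html.parser import HTMLParser


class AttributeAndIdChecker(HTMLParser):
    """Hypothetical helper: records duplicate attributes and counts id values."""

    def __init__(self):
        super().__init__()
        self.duplicate_attributes = []  # (tag, attribute name) pairs
        self.id_counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser passes attributes as a list of (name, value) pairs,
        # so an attribute that is written twice shows up twice here.
        name_counts = Counter(name for name, _ in attrs)
        for name, count in name_counts.items():
            if count > 1:
                self.duplicate_attributes.append((tag, name))
        for name, value in attrs:
            if name == "id" and value is not None:
                self.id_counts[value] += 1


def find_duplicates(markup):
    """Return (duplicate attributes, id values used more than once)."""
    checker = AttributeAndIdChecker()
    checker.feed(markup)
    checker.close()
    duplicate_ids = sorted(v for v, n in checker.id_counts.items() if n > 1)
    return checker.duplicate_attributes, duplicate_ids


attributes, ids = find_duplicates(
    '<p id="a" class="x" class="y">one</p><p id="a">two</p>'
)
print(attributes)  # [('p', 'class')]
print(ids)         # ['a']
```

Both findings are purely syntactical: the check does not need to know anything about the content models of the elements involved.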

One inconvenience with "elements are nested according to the syntactical rules of their specifications" is that HTML, unlike XML, does not cleanly separate syntax and validity, so a parser will always refer to content models when parsing an HTML document into a tree. For example, <p><h1>Content Model Validity</h1></p> can be parsed into a tree without problems if syntax is the only thing you look at. But if you look at the content model for the p element in HTML 5, you'll notice that p cannot contain heading elements and that it is an element whose end tag can be omitted when it is followed by a heading element. So the browser turns the code into <p></p><h1>Content Model Validity</h1></p>; that is, it closes the first p element, and the end tag </p> is orphaned, which causes a parsing error. If we want to get around this interference between validity and syntax, it may be better to write something like "elements don't overlap at the syntactical level" or "elements don't overlap in the syntax tree". This would fail something like <strong><em></strong></em> but not <p></p><h1>Content Model Validity</h1></p>. If we also want to fail the latter code sample, we would need to add a condition such as "the syntax tree does not contain orphaned end tags". (I am avoiding the term "Document Object Model", because the DOM is not a data structure or a set of data structures.)
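The two conditions mentioned above ("elements don't overlap" and "no orphaned end tags") can be sketched with a simple stack of open elements. This is a minimal illustration built on Python's standard html.parser, not the HTML 5 tree construction algorithm. The names NestingChecker and nesting_problems are hypothetical, the two tag sets are deliberately incomplete assumptions, and the sketch ignores content models entirely, so it does not reproduce the browser behaviour of closing a p element before a heading. Elements that are still open at the end of the input are not reported.

```python
from html.parser import HTMLParser

# Assumption: small, incomplete subsets for illustration only; the full
# lists of void elements and optional end tags are in the HTML spec.
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}
OPTIONAL_END_TAG = {"p", "li", "option"}


class NestingChecker(HTMLParser):
    """Hypothetical helper: flags overlapping elements and orphaned end tags."""

    def __init__(self):
        super().__init__()
        self.open_elements = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <input />: nothing stays open.
        pass

    def handle_endtag(self, tag):
        if tag not in self.open_elements:
            self.problems.append(f"orphaned end tag </{tag}>")
            return
        # Close open elements until the matching start tag is reached.
        while self.open_elements:
            current = self.open_elements.pop()
            if current == tag:
                break
            # An element with an optional end tag may be closed implicitly;
            # anything else that is still open overlaps the element being closed.
            if current not in OPTIONAL_END_TAG:
                self.problems.append(f"<{current}> overlaps </{tag}>")


def nesting_problems(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


print(nesting_problems("<strong><em></strong></em>"))
# ['<em> overlaps </strong>', 'orphaned end tag </em>']
print(nesting_problems("<p></p><h1>Content Model Validity</h1></p>"))
# ['orphaned end tag </p>']
print(nesting_problems("<span><div>...</div></span>"))
# []
```

Because the sketch never consults a content model, <span><div>...</div></span> produces no findings, which matches the distinction between syntax and validity drawn above.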

Examples for the Understanding doc

I think it would be beneficial to add some examples to Understanding Success Criterion 4.1.1: Parsing, which currently doesn't have an examples section.

Example 1

<span><div>...</div></span> is not valid HTML because the content model for the span element does not allow div elements. However, the code is syntactically unambiguous and can be parsed into a data structure. Therefore, it does not fail SC 4.1.1.

Example 2

<a href="https://www.w3.org/"><a href="https://www.w3.org/WAI/">Web Accessibility Initiative</a></a> is not valid because the content model for the a element does not allow any other interactive content. However, the code is syntactically unambiguous and can be parsed into a data structure. Therefore, it does not fail SC 4.1.1.

Example 3

<p>The raven himself is hoarse
<p>That croaks the fatal entrance of Duncan
<p>Under my battlements. Come, you spirits

This code snippet meets the success criterion when used in an HTML document (but not in XHTML): it is both syntactically correct and valid. The p element is an element where the end tag is optional. A browser is able to parse it and implicitly add the end tags as follows:

<p>The raven himself is hoarse</p>
<p>That croaks the fatal entrance of Duncan</p>
<p>Under my battlements. Come, you spirits</p>
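The implicit closing of p elements can be illustrated with a small sketch based on Python's standard html.parser. It applies only one simplified rule (a new p start tag ends any p element that is still open), which is an assumption covering this example rather than the full set of rules for optional tags in the HTML specification. ParagraphCollector and paragraph_texts are hypothetical names.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Hypothetical helper: collects the text of each p element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self.in_paragraph = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new <p> implies the end of any p element that is still open.
            self.paragraphs.append("")
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:
            self.paragraphs[-1] += data


def paragraph_texts(markup):
    collector = ParagraphCollector()
    collector.feed(markup)
    collector.close()
    return [text.strip() for text in collector.paragraphs]


without_end_tags = (
    "<p>The raven himself is hoarse\n"
    "<p>That croaks the fatal entrance of Duncan\n"
    "<p>Under my battlements. Come, you spirits"
)
with_end_tags = (
    "<p>The raven himself is hoarse</p>\n"
    "<p>That croaks the fatal entrance of Duncan</p>\n"
    "<p>Under my battlements. Come, you spirits</p>"
)
print(paragraph_texts(without_end_tags) == paragraph_texts(with_end_tags))  # True
```

Both variants yield the same three paragraphs, which is the sense in which the version without end tags can be parsed unambiguously.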

Example 4

<ul>
  <div>
    <li>List item 1
    <li>List item 2
  </div>
</ul>

This code snippet is not valid because the content model of ul does not allow a div element as a child element. However, the code is syntactically unambiguous and can be parsed into a data structure. Therefore, it does not fail SC 4.1.1.

Example 5

<p><input id="username" name="username" type="text" /></p>
<p><label for="username">College:
  <select size="1">
    <option selected="selected">Foxe College</option>
    <option>Jordan College</option>
    <option>Wordsworth College</option>
  </select>
</label>

This code snippet is invalid because a select element that is a descendant of a label element with a for attribute must have an ID that matches the value of the for attribute. Since the code can be parsed unambiguously into a data structure, it does not fail SC 4.1.1. However, the HTML specification does not define which type of labelling takes precedence: the one defined by the for attribute or the one based on nesting. Hence, the relationship between the label and the control it labels visually (i.e. the control below it) cannot be determined programmatically in an unambiguous manner. If browsers determine that the first input's accessible name is the label element (including its descendant, the select element), the code violates SC 1.3.1.

[1] What is perhaps confusing is that non-XML languages such as HTML 4 and HTML 5 allow certain elements to omit the end tag. (See Optional tags in the HTML 5 spec.) For example, <div><p>Elements have complete start and end tags.</div> is perfectly valid in both HTML 4 and HTML 5 but not in XHTML. The p element is entirely contained in the div element because the browser "knows" that when it encounters the end tag </div>, it can close all child elements where end tags may be omitted, so it silently inserts a </p> just before the </div> and it is perfectly possible to create an unambiguous parse tree. For the purpose of SC 4.1.1, <div><p>Elements have complete start and end tags.</div> is perfectly fine (except in XHTML) since the elements are correctly nested. It is important to bear in mind that the SC requires elements to be properly nested without referring to the details of how optional tags work.

(For tree construction and syntactically correct nesting in HTML 5, I refer to the stack of open elements, the section Tree construction and the sections on misnested tags: Misnested tags: <b><i></b></i> and Misnested tags: <b><p></b></p>.)

JAWS-test commented 2 years ago

@cstrobbe

PDF is not a markup language, so SC 4.1.1 does not apply to it.

I wonder if you can say that as a general statement. On the one hand, PDF uses PostScript, which is not a markup language. On the other hand, PDF uses tags, which I think is a markup language. And the tags are relevant from an accessibility perspective because AT uses them to read the content of the PDF. The PostScript is not accessible to AT. I suspect that the tags in the PDF can and must also be evaluated according to 4.1.1. Incidentally, so does the PAC, which lists many syntax errors in the WCAG section under 4.1.1 (even though most of them are not strictly speaking a violation of 4.1.1).

cstrobbe commented 2 years ago

PDF is not a markup language, so SC 4.1.1 does not apply to it.

I wonder if you can say that as a general statement.

"Markup language" may be a rather vaguely defined term, but in the context of WCAG, it definitely refers to formats that you can edit in a programmer's editor or an IDE, rather than to binary formats. Even though PDF uses markup for certain aspects of the content, that does not make it a markup language. PDF was originally meant to describe pages in such a way that they would be printed exactly the same way regardless of platform or printer (I am primarily referring to the equipment used in printing houses here). PDF builds on top of PostScript, which is not a markup language but a programming language focused on page description.

You can't open PDF in an editor, edit its "code" (which is to an important extent binary anyway) and expect to have a working PDF document as a result. Calling PDF a markup language renders the term "markup language" meaningless for the purposes of WCAG.

GreggVan commented 2 years ago

+1

When 4.1.1 was written it was meant to apply only to Markup Languages like HTML. PDF was (and is) not considered a Markup language - and 4.1.1 was not intended to apply to it.

gregg CoChair of WCAG when 4.1.1 was written


bruce-usab commented 2 years ago

I think it would be beneficial to add some examples to Understanding Success Criterion 4.1.1: Parsing, which currently doesn't have an examples section.

+1 to adding examples. But I have a question about Example 4 (the div inside ul), and maybe I am not comfortable with this one. What does that do to the DOM and what is the experience for the screen reader user? Will different browsers construct the DOM differently or are they consistent with regard to this code example?

elements are nested according to [the syntactical rules of] their specifications

I had always assumed the bracketed bit to be implicit, but for years now, the general consensus is to flag 4.1.1 failures only when Accessibility Supported (CR4) is applicable. So if bad nesting causes the screen reader experience to diverge from the visual presentation, that fails 4.1.1 (and probably also 1.3.1 Info and Relationships).

The more problematic aspect of 4.1.1, in my view, is that it can unambiguously fail source code that is sloppy but not problematic in actual practice. (Two examples of sloppy-but-not-barriers are duplicate attributes with duplicate values, and non-unique IDs used only on empty elements.)

cstrobbe commented 2 years ago

@bruce-usab

But I have a question about Example 4 (the div inside ul), and maybe I am not comfortable with this one. What does that do to the DOM and what is the experience for the screen reader user? Will different browsers construct the DOM differently or are they consistent with regard to this code example?

Different browsers may treat this differently. See this test page for div inside ul. I tested it with Firefox, Microsoft Edge and NVDA. In both Firefox and Microsoft Edge, NVDA correctly announces each of the three lists and the number of items inside them. However, for the list with the div inside it, the individual list items were not announced as list items in Firefox. This seems to mean that the list items are not programmatically determinable. This looks like a failure of SC 1.3.1, so there is no need to leverage SC 4.1.1 to fail it. In Edge, the div makes no difference at all; each item in the list is announced as a list item (NVDA says "bullet").

So if bad nesting causes the screen reader experience to diverge from the visual presentation, that fails 4.1.1 (and probably also 1.3.1 Info and Relationships).

As I mentioned above, if it fails SC 1.3.1, why worry about it not getting caught by SC 4.1.1? SC 4.1.2 is another SC that may catch issues that are not caught by SC 4.1.1 if we just check syntax instead of content models.

The more problematic aspect of 4.1.1, in my view, is that it can unambiguously fail source code that is sloppy but not problematic in actual practice.

I know, but the complexity of HTML is such that if we want to catch just those syntax issues that have a negative impact on accessibility, we need either a much more complex version of SC 4.1.1 or a much looser version of SC 4.1.1 combined with a bunch of other success criteria that fill the gaps. Both validation and syntax checking are very crude processes if the goal is to identify only those issues that have an impact on accessibility.

bruce-usab commented 2 years ago

Thank you @cstrobbe for that additional exposition. I am now comfortable with all five of your Examples for the Understanding doc.

giacomo-petri commented 2 years ago

Different browsers may treat this differently. See this test page for div inside ul. I tested it with Firefox, Microsoft Edge and NVDA. In both Firefox and Microsoft Edge, NVDA correctly announces each of the three lists and the number of items inside them. However, for the list with the div inside it, the individual list items were not announced as list items in Firefox. This seems to mean that the list items are not programmatically determinable. This looks like a failure of SC 1.3.1, so there is no need to leverage SC 4.1.1 to fail it. In Edge, the div makes no difference at all; each item in the list is announced as a list item (NVDA says "bullet").

I have some concerns about this empirical approach, because it may lead to unexpected results. If a specific scenario works as expected with all X technologies a user owns, that doesn't guarantee it will also work with technology X+1, especially if the code doesn't meet the HTML5 specs. Conversely, if a specific technology (or combination of technologies) does not behave as expected, that doesn't necessarily mean it's a WCAG failure. For example, the code you've proposed,

<ul id="vegetables">
        <li>tomatoes</li>
        <li>onions</li>
        <li>beans</li>
        <li>peas</li>
        <li>potato chips</li>
</ul>

<ul id="vegetables">
        <li>tomatoes</li>
        <li>onions</li>
        <li>beans</li>
        <li>peas</li>
        <p>potato chips</p>
</ul>

Using Huawei + TalkBack, list and list item semantics are not announced at all; the items are just announced as regular text (with "bullet" if visually present), whether or not the content is properly nested inside the <ul>.

I've also tested it with VO on Safari, Chrome and Firefox. All of them behave as expected, announcing "list X items" for the list and "item X of Y" for each list item. So, currently, even though this code is incorrect in terms of content model and HTML correctness, with the combination of technologies I own, the relationships between elements are kept and rendered as expected (apparently passing 1.3.1). Excluding Huawei + TalkBack, which does not work even with real <ul> and <li> items, all the technologies I've tested so far treat the 5th item (<p>) as a list item related to the parent list element. If this scenario no longer affects 4.1.1 but only SC 1.3.1 Info and Relationships (and from the tests I've performed the element seems to be programmatically determined as a list item), how can I be sure that the behaviour is the same for every technology the user is using, if the content model used doesn't match the HTML5 specs?

Moreover, a tester who is not technical might not be able to determine when this is an issue and when it is not (considering that validating this code in an HTML validator might no longer be relevant for 4.1.1); the HTML validator gives non-technical people good guidance when something is not properly implemented. For example, when the list markers are removed with CSS (list-style: none), Safari + VO no longer announces the elements as a list and list items, and both the correct and the incorrect list structure are announced as regular text. In fact, developers are forced to use role="list" on the <ul> element to let technologies announce the elements' semantics.

Even if I generally agree with your proposal @cstrobbe of clarifying what's impacted by 4.1.1, distinguishing content models and syntactical nesting, I'm a little worried about the possible interpretations and impact of these changes, especially for scenarios like the previous one.

Last, but not least, quoting the first paragraph of 1.3.1 SC understanding

The intent of this Success Criterion is to ensure that information and relationships that are implied by visual or auditory formatting are preserved when the presentation format changes. For example, the presentation format changes when the content is read by a screen reader or when a user style sheet is substituted for the style sheet provided by the author.

the scenario with <p> used as <li> is only partially covered, or at least the interpretations might not be unique. In fact, in the example you've provided, AT also renders the <p> element as a <li>. We might argue that, visually, the <p> element is not styled as a <li>, so this discrepancy might cause loss of information and fail 1.3.1. But at the same time I might use a CSS background (or CSS pseudo-selectors) to make the <p> element look like a <li>. We may recall the SC Understanding, "when a user style sheet is substituted for the style sheet provided by the author", so the visual presentation of the <p> element might change, but we could say the same thing for

<div role="list" id="vegetables">
        <div role="listitem">tomatoes</div>
        <div role="listitem">onions</div>
        <div role="listitem">beans</div>
        <div role="listitem">peas</div>
        <div role="listitem">potato chips</div>
</div>

or

<div role="list" id="vegetables">
        <div role="listitem">tomatoes</div>
        <div role="listitem">onions</div>
        <div role="listitem">beans</div>
        <div role="listitem">peas</div>
        <div class="hidden">non-visible list item</div>
        <div role="listitem">potato chips</div>
</div>

Changing the CSS of the first example might result in a different appearance (no longer a list of items).
Assuming that class="hidden" removes the element from both the visual presentation and the accessibility tree (e.g. .hidden {display:none}), changing the CSS of the second example might result in a different list, including elements that shouldn't be rendered.

So which are the boundaries in interpreting the first paragraph of 1.3.1 SC Understanding related to this specific topic in combination with 4.1.1?

As I mentioned above, if it fails SC 1.3.1, why worry about it not getting caught by SC 4.1.1? SC 4.1.2 is another SC that may catch issues that are not caught by SC 4.1.1 if we just check syntax instead of content models.

This assumption is valid for this specific case of 4.1.1 and 1.3.1, but it's not valid in general. If the criteria involved have different levels (A, AA, AAA) and I want to meet a specific level, it might not apply. In addition, while generating a VPAT, I might be aware that I'm not compliant with a specific SC (e.g. the SC is partially supported), but it's still relevant to determine exactly which conformance term applies: "Supports, Partially Supports, Does Not Support, Not Applicable, Not Evaluated".

The more problematic aspect of 4.1.1, in my view, is that it can unambiguously fail source code that is sloppy but not problematic in actual practice.

I know, but the complexity of HTML is such that if we want to catch just those syntax issues that have a negative impact on accessibility, we need either a much more complex version of SC 4.1.1 or a much looser version of SC 4.1.1 combined with a bunch of other success criteria that fill the gaps. Both validation and syntax checking are very crude processes if the goal is to identify only those issues that have an impact on accessibility.

I agree; that said, potentially a list structure improperly managed might be more relevant in terms of accessibility than a duplicated ID or duplicated attribute (which in the vast majority of cases do not impact accessibility at all).

yatil commented 1 year ago

Copying over my comment on #2676 to this thread for completeness' sake. I did comment on the first open issue, as this issue was closed. Probably everything in here was already litigated above.

In the Web Accessibility Slack, the claim is that 4.1.1 would narrowly only apply to SGML/XML parsing. For example, a <p> nested inside of an <h1> would pass 4.1.1 even though HTML (living standard) says it is not allowed. Only nesting in the following way would not conform here: <h1><p>…</h1></p>

As this is irrelevant for the created DOM according to HTML (living standard) parsing rules, it feels useless to have this Success Criterion at all. It would be more useful if the SC would be adapted to mean “nest according to the rules of the markup language you actually use” instead of “nest according to the rules of a different markup language”.

Here’s the whole argumentation, including a suggestion for re-wording the SC.

I think it’s worth the discussion if this SC should be adapted to fit modern standards (for example with custom elements, almost any element name would be correct to use and there is little way to know whether an element is a typo or a custom element). Furthermore, the syntax of the unparsed HTML has so little significance now that a lot of DOM manipulation happens in the browser, after parsing.

The interpretation of this SC that the goal is to have a valid content model as the result of the parsing of the syntax plus DOM manipulations is practical, useful, and actually leads to more accessible content – instead of chasing theoretical syntactical errors.

Considering that this is a contentious and mostly useless SC, I would lean on the side of either simplifying it (“Elements are used and nested according to the specification of the markup language used.”) or removing it in WCAG 2.2.

cstrobbe commented 1 year ago

As this is irrelevant for the created DOM according to HTML (living standard) parsing rules, it feels useless to have this Success Criterion at all. It would be more useful if the SC would be adapted to mean “nest according to the rules of the markup language you actually use” instead of “nest according to the rules of a different markup language”.

Since WCAG stands for Web Content Accessibility Guidelines, not HTML 5 Accessibility Guidelines, an argument based solely on HTML 5's parsing rules is insufficient.

yatil commented 1 year ago

As this is irrelevant for the created DOM according to HTML (living standard) parsing rules, it feels useless to have this Success Criterion at all. It would be more useful if the SC would be adapted to mean “nest according to the rules of the markup language you actually use” instead of “nest according to the rules of a different markup language”.

Since WCAG stands for Web Content Accessibility Guidelines, not HTML 5 Accessibility Guidelines, an argument based solely on HTML 5's parsing rules is insufficient.

But the SC is specific to markup languages, the most used markup language for Web Content is HTML (living standard). “Web Content Accessibility Guidelines” that do not take into account the principles of the technology used is kinda not that useful.

What other widely-used frontend markup languages do you refer to where your kind of rule interpretation makes a meaningful difference for accessibility?

cstrobbe commented 1 year ago

But the SC is specific to markup languages, the most used markup language for Web Content is HTML (living standard). “Web Content Accessibility Guidelines” that do not take into account the principles of the technology used is kinda not that useful. What other widely-used frontend markup languages do you refer to where your kind of rule interpretation makes a meaningful difference for accessibility?

At the time when WCAG 2.0 was being written, XML-based markup languages looked like the future. XHTML 2.0 was developed until 2010. Then HTML 5 changed everything. The point is that technologies change, and they often change in unpredictable ways. This is one of the reasons why WCAG 2.0 moved to a technology-independent approach: the HTML-based approach from WCAG 1.0 was not sufficiently future proof.

Has WCAG 2.2 abandoned this technology-independent approach?

Please bear in mind that SC 4.1.1 sits under Guideline 4.1 Compatible, which reads,

Maximize compatibility with current and future user agents, including assistive technologies.

Is WCAG 2.2 sufficiently future proof if success criteria for markup languages assume that HTML 5 will never be replaced by something else?

yatil commented 1 year ago

How is “If you use a markup language, you need to nest your elements according to the rules of that markup language” technology-specific?

If you use XML, XML rules apply. If you use HTML (living standard), its rules apply.

That 4.1.1 applies to markup languages is in the SC, since 2.0, quote:

In content implemented using markup languages

In the inverse, arguing that some XML rules are meant when that is not part of the success criterion makes the interpretation technology-specific. Of course, you can argue that HTML (living standard) is not meant to be a markup language according to WCAG. Which makes the SC non-applicable to HTML (living standard) documents.

Maximize compatibility with current and future user agents, including assistive technologies.

Is WCAG 2.2 sufficiently future proof if success criteria for markup languages assume that HTML 5 will never be replaced by something else?

It does not. First, HTML5 has been replaced with HTML (living standard) in 2018. Second, I have no idea what you even want to say with this. The web is great at keeping content backwards compatible. HTML5 documents are interpreted perfectly by HTML (living standard) browsers. Of course, there is no guarantee for this. Basically, the Guideline wants you to not use proprietary technology that is not widely used on the market and not accessible. If you use a technology that is well established, it meets the guideline, IMO.

GreggVan commented 1 year ago

In the past there was AT that accessed the source code and not the DOM. That's why correct source code was important. As far as I know, there is no AT today that accesses the source code. If there were, it would be outdated and quite useless, since web content today is not primarily source code, but source code + CSS + JavaScript. The browsers create the (corrected) DOM from this and pass this on to the Accessibility API. The AT uses either the API or the DOM. AT that used the source code would not be able to recognize correct content on many pages, because the content is generated or changed dynamically and thus does not appear in the source code at all. That's why I think that for 4.1.1 we should only care about what is generated as DOM by the browsers.

Perfect
Thanks

gregg

robinmetral commented 1 year ago

Side note about this comment:

I'm fairly sure the spec on headings doesn't require any particular ordering of heading tags.

I think it does. This is what it says under Headings and outlines:

Each heading following another heading lead in the outline must have a heading level that is less than, equal to, or 1 greater than lead's heading level.

a.k.a. do not skip heading levels (unless closing a section).

Not sure whether this changes anything about the meaning of 4.1.1.—but it seems like it makes skipping heading levels non-conforming HTML.
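To make the quoted rule concrete, here is a minimal sketch that applies it literally to a flat list of heading levels (1 for h1, 2 for h2, and so on). The function name skipped_heading_levels is hypothetical, and the sketch is a simplifying assumption: it ignores sectioning content and every other part of the outline rules in the HTML specification.

```python
def skipped_heading_levels(levels):
    """Return (index, previous level, current level) for each heading that is
    more than one level deeper than the heading immediately before it."""
    problems = []
    for index in range(1, len(levels)):
        previous, current = levels[index - 1], levels[index]
        # Allowed: less than, equal to, or exactly 1 greater than the previous level.
        if current > previous + 1:
            problems.append((index, previous, current))
    return problems


print(skipped_heading_levels([1, 2, 3, 2, 4]))  # [(4, 2, 4)]
print(skipped_heading_levels([1, 2, 2, 3, 1]))  # []
```

A check like this concerns the meaning of the headings, not whether the markup can be parsed, which is why it is a separate question from syntactical nesting.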

JAWS-test commented 1 year ago

@robinmetral

elements are nested according to their specifications

probably does not refer to the level of headings, as prescribed in the HTML specification and correctly noted by you. However, this is not clear from SC 4.1.1 and its understanding, so I would be grateful for a corresponding addition to the understanding.

GreggVan commented 1 year ago

I believe there was a long thread on this some time ago, clarifying that this provision applies to the syntax but not the semantics specifications.

So you can't do this

<h2>   <p>  </h2>  </p>

But you can do this

<h2>   <p>  </p>  </h2>

You also can’t do this

<h2>   <h3>   </h2>   </h3>

But you can do this

<h2>   <h4>  </h4>  </h2>

It was not meant to have anything to do with semantics like nesting Headings.

If it was not worded properly to convey this - it should be fixed as an erratum for WCAG2x - and in WCAG3

It is preferred to semantically nest them — but the SC was not meant for that. It was meant to prevent breaking parsers in AT (back when AT directly parsed the html of pages. This is actually pretty much not done now so no longer that critical.)
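The four snippets above can be run through an XML parser as a rough illustration, here with Python's standard xml.etree.ElementTree. This is a sketch under a stated assumption: XML well-formedness is stricter than what HTML syntax requires (HTML allows some end tags to be omitted), so it is only an approximation of the syntactical check, not a definition of it. The function name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as a well-formed XML element."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<h2>   <p>  </h2>  </p>"))    # False: elements overlap
print(is_well_formed("<h2>   <p>  </p>  </h2>"))    # True
print(is_well_formed("<h2>   <h3>   </h2>   </h3>"))  # False: elements overlap
print(is_well_formed("<h2>   <h4>  </h4>  </h2>"))  # True
```

The XML parser knows nothing about HTML content models, so a heading nested inside another heading parses without complaint as long as the tags do not overlap.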

gregg

——————————— Professor, University of Maryland, College Park Founder and Director Emeritus , Trace R&D Center, UMD Co-Founder Raising the Floor. http://raisingthefloor.org The Global Public Inclusive Infrastructure (GPII) http://GPII.net The Morphic project https://morphic.org


yatil commented 1 year ago

Again, where is that interpretation in WCAG? “elements are nested according to their specifications”. The HTML specification says “it is not allowed to nest an h4 in an h2”.

Do words in WCAG have any meaning?

I’m happy for the WG to change 4.1.1 to match the intent (tho it would not improve accessibility at all, compared to actually following HTML which DOES improve accessibility) or get rid of it entirely. If the words of the SC do not match what testers are supposed to test and implementers are supposed to implement, the SC is super useless and distracting.

yatil commented 1 year ago

And of course we cannot defer this to the Understanding as the understanding documents are still not normative.

GreggVan commented 1 year ago

This issue keeps popping up so let me try to put it all down in one email - so that it can be used to respond to this if and as it comes up in the future. It is understandable that confusion arises from this - but careful reading - including the Understanding WCAG 2.0 document - will bring one back to the accurate interpretation of the SC.

FIRST - ON THE ROLE OF THE UNDERSTANDING WCAG 2.X DOCUMENT

The understanding document is not normative — it cannot change the meaning of WCAG or extend or contract it

But it can help you to understand what WCAG means and meant when it was written. So indeed you can rely on it to understand the SC. (Just as notes in a standard can help to understand the normative language even if the notes themselves are non-normative.) Since Understanding WCAG 2.0 was written contemporaneously with WCAG 2.0 - it can accurately represent the thinking of the group when it wrote and adopted WCAG 2.0.

NEXT - ON THE PROPER READING AND SCOPE OF 4.1.1

Christophe (and others from the original working group) have posted numerous times, as have I, that WCAG 4.1.1, which reads

Success Criterion 4.1.1 Parsing In content implemented using markup languages, elements have complete start and end tags, elements are nested according to their specifications, elements do not contain duplicate attributes, and any IDs are unique, except where the specifications allow these features.

refers ONLY to syntax nesting - so that the content can be parsed. This is also evidenced by the name of the success criterion, the other clauses in it, and the Understanding WCAG 2.0.
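As a rough illustration of two of the purely syntactic conditions in the SC text quoted above (duplicate attributes and unique IDs), here is a minimal sketch using Python's standard `html.parser` module. The class name `AttributeAndIdChecker` and the function `check` are made up for this example; this is not a conformance tool and it does not implement the HTML Standard's parsing algorithm.

```python
from collections import Counter
from html.parser import HTMLParser


class AttributeAndIdChecker(HTMLParser):
    """Collects duplicate attributes per element and duplicate id values.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.duplicate_attributes = []  # (tag, attribute name) pairs
        self.id_counts = Counter()      # id value -> number of occurrences

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; repeated names are kept.
        names = Counter(name for name, _ in attrs)
        for name, count in names.items():
            if count > 1:
                self.duplicate_attributes.append((tag, name))
        for name, value in attrs:
            if name == "id" and value is not None:
                self.id_counts[value] += 1

    def duplicate_ids(self):
        return sorted(v for v, c in self.id_counts.items() if c > 1)


def check(markup):
    """Return (duplicate attributes, duplicate ids) found in the markup."""
    checker = AttributeAndIdChecker()
    checker.feed(markup)
    checker.close()
    return checker.duplicate_attributes, checker.duplicate_ids()


if __name__ == "__main__":
    sample = '<p id="a" class="x" class="y">one</p><p id="a">two</p>'
    print(check(sample))
```

Neither check looks at which elements may contain which other elements; both only inspect how the tags themselves are written.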

The Understanding WCAG document reads (emphasis added):

Intent

The intent of this Success Criterion is to ensure that user agents, including assistive technologies, can accurately interpret and parse content. If the content cannot be parsed into a data structure, then different user agents may present it differently or be completely unable to parse it. Some user agents use "repair techniques" to render poorly coded content.

Since repair techniques vary among user agents, authors cannot assume that content will be accurately parsed into a data structure or that it will be rendered correctly by specialized user agents, including assistive technologies, unless the content is created according to the rules defined in the formal grammar for that technology. In markup languages, errors in element and attribute syntax and failure to provide properly nested start/end tags lead to errors that prevent user agents from parsing the content reliably. Therefore, the Success Criterion requires that the content can be parsed using only the rules of the formal grammar.

NOTE The concept of "well formed" is close to what is required here. However, exact parsing requirements vary amongst markup languages, and most non XML-based languages do not explicitly define requirements for well formedness. Therefore, it was necessary to be more explicit in the success criterion in order to be generally applicable to markup languages. Because the term "well formed" is only defined in XML, and (because end tags are sometimes optional) valid HTML does not require well formed code, the term is not used in this success criterion.

With the exception of one success criterion ( 1.4.4: Resize Text https://www.w3.org/WAI/WCAG21/Understanding/resize-text, which specifically mentions that the effect specified by the success criterion must be achieved without relying on an assistive technology) authors can meet the success criteria with content that assumes use of an assistive technology (or access features in user agents) by the user, where such assistive technologies (or access features in user agents) exist and are available to the user.

The intent of 4.1.1 was ONLY to assure that content can be accurately interpreted and parsed — so that a data structure for it can be constructed.

Nesting headings in any order (as long as the start and end tags of each heading are properly nested) will not prevent a data structure from being created, nor the content from being parsed. So this is not covered by this SC.
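To make the distinction drawn above concrete, here is a minimal sketch of a stack-based check for syntactic nesting only, again using Python's standard `html.parser`. The names `NestingChecker` and `syntactic_nesting_errors` are hypothetical. The sketch assumes every non-void element has an explicit end tag, so it ignores the optional end tags that HTML allows, and it illustrates the reading described in this email rather than the HTML Standard's parsing algorithm, which handles some of these cases with its own error-recovery rules.

```python
from html.parser import HTMLParser

# Void elements never have an end tag, so they are not pushed on the stack.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class NestingChecker(HTMLParser):
    """Tracks open elements on a stack and records mismatched end tags."""

    def __init__(self):
        super().__init__()
        self.open_elements = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.open_elements and self.open_elements[-1] == tag:
            self.open_elements.pop()
        else:
            # The end tag does not close the most recently opened element.
            self.errors.append(f"unexpected </{tag}>")


def syntactic_nesting_errors(markup):
    """Return a list of nesting problems; an empty list means none found."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    errors = list(checker.errors)
    errors.extend(f"unclosed <{tag}>" for tag in checker.open_elements)
    return errors


if __name__ == "__main__":
    # A heading inside a heading: the tags open and close in matching order.
    print(syntactic_nesting_errors("<h2>Title <h4>Sub</h4></h2>"))
    # Overlapping tags: </b> arrives while <i> is still open.
    print(syntactic_nesting_errors("<b><i>text</b></i>"))
```

Under this simplified check, the heading example reports nothing because its start and end tags pair up, while the overlapping `<b>`/`<i>` example is flagged. Whether an `h4` is permitted inside an `h2` is a content-model question that the check never asks.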

It is unfortunate that we did not at the time realize people would conflate nesting of headings with parsing and data structures or we would have explicitly added a note making it clear that parsing has to do with syntax not semantics. It was not seen at the time - but some people refer to the start and end tags as elements while others refer to the combination of start and end tag together as an element. Hence all the confusion. (Just bad wordsmithing on the part of the Working Group)

The topic of heading nesting did come up, and the group specifically decided not to require it, since it was not something that prevented understanding of or access to content in any significant way. Bad form, but done all the time. And not seen as a serious accessibility issue.

There was also pressure to put strict conformance to the HTML spec into WCAG. This was also decided not to be an accessibility issue. Parsing was, however, because in those days AT looked at the code and built its own data structures, which could break if the content could not be parsed. So we settled on only requiring the parsing part of the HTML spec to be followed. Hence the name of the SC and the limited scope of it.

Please copy this email into the "institutional memory" pages so that this can be captured and this discussion, which keeps popping up, can be put to rest.

People may not agree with the decision of the Working Group (and people in the working group had different opinions) - but we need to stay with what was decided and the intent of the working group when it reached consensus.

With WCAG 3.x — the working group is free to change and to require all aspects of Markup languages to be strictly followed (or additional aspects) — but, as with all provisions, this should only be done if a case can be made that this is necessary for accessibility.

All the best

gregg

——————————— Professor, University of Maryland, College Park Founder and Director Emeritus , Trace R&D Center, UMD Co-Founder Raising the Floor. http://raisingthefloor.org The Global Public Inclusive Infrastructure (GPII) http://GPII.net The Morphic project https://morphic.org



yatil commented 1 year ago

The intent of this Success Criterion is to ensure that user agents, including assistive technologies, can accurately interpret and parse content.

Wrongly nested content can impact interpretation in HTML The Living Standard. Alternatively, HTML The Living Standard uses consistent error correction for all the “problems” outlined in the SC, so one could also say that it can always accurately interpret and parse content.

The Accessibility Tree of wrongly nested documents according to HTML The Living Standard will almost always result in inaccurately interpreted content for AT.

Again, I don’t care about the Understanding. It cannot change the words and meaning of the SC and WCAG would be easier to teach, understand, and use, if we wouldn’t say “oh yes, we wrote x in the SC, but if you look at the Understanding, you see that we actually meant seventeen other different things”. It’s fairly ridiculous at this point.

These are “Understanding” documents, not “Well, actually” documents.

(Brief note: The email formatting comes across as plain text, with no headings and no marked-up quotes. This is super hard to parse, and I would appreciate it if one could follow 1.3.1 and at least apply some markup/down for easier reading.)

yatil commented 1 year ago

And we will use WCAG 2 for at least another 10 to 15 years (as WCAG 3 is not replacing it any time soon and even then there will be a long gap until we can use it, if ever). The best time to clarify and fix WCAG 2 issues was 5 years ago, the next best time is today. I understand that admitting that something is hard to understand or maybe not relevant is hard when you have worked on it in 2008, but just saying “No, we meant this all along, even if we didn’t put it in the standard” is just not a way to get out of this.

Because if we don’t learn from the mistakes, we are bound to repeat them. This would be a disservice to billions of people with disabilities that we fail because we are taking up resources with understanding WCAG instead of people actually fixing things. It’s a tragedy.

The web could be so much more, so much better, if we would push WCAG forward instead of relying on and fighting for interpretations in the past and hiding them in Understanding documents. The lack of ambition and defense of the status quo is unfortunate.

yatil commented 1 year ago

Please copy this email into the "institutional memory" pages so that this can be captured and this discussion, which keeps popping up, can be put to rest.

People may not agree with the decision of the Working Group (and people in the working group had different opinions) - but we need to stay with what was decided and the intent of the working group when it reached consensus.

Sorry, these sentences irked me more than I realized. If a discussion “keeps popping up”, it should be properly resolved, as a normative change to WCAG. Because discussions “keep popping up” because the SC is bad.

I think it is bad form to say “Please take my word as gospel and put it into the ‘institutional memory’ because I know what we meant, and my interpretation is flawless”. There might be no consensus on how to resolve this issue, but I think these discussions, and the fact that they “keep popping up”, show that there is a consensus that it is hard to understand. I think it is generally bad form to claim that the WG decided something in 2008 or so, and that it will be correct and infallible until the end of time.

Turns out, the WG is humans, humans are imprecise and a great skill is that we can correct these errors and imprecisions. Let’s do that.
