whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

Suggest adding a warning about outline algorithm #83

Closed stevefaulkner closed 2 years ago

stevefaulkner commented 9 years ago

Currently the HTML standard does not provide any advice regarding the outline algorithm not being implemented. This has led some developers to believe that the outline algorithm has an effect in browsers and assistive technology, which it does not. This can lead to developers using markup patterns that don't convey document structure. Suggest adding a warning; for example, this is the warning in the W3C HTML spec:

There are currently no known implementations of the outline algorithm in graphical browsers or assistive technology user agents, although the algorithm is implemented in other software such as conformance checkers. Therefore the outline algorithm cannot be relied upon to convey document structure to users. Authors are advised to use heading rank (h1-h6) to convey document structure.

othermaciej commented 6 years ago

I think the talk of how ARIA fits into this is getting a little muddied. I want to clarify a few key points:

  1. The algorithm being proposed is not an outline algorithm. It replaces some of the intended purpose of the outline algorithm, but it does not produce an outline as output. It's better thought of as an "effective heading level" algorithm. It takes an element in the DOM as input, and produces its effective heading level as output. Heading levels are an output, not an input, and no heading is affected by the level of any other heading. So whether it considers ARIA does not make much difference. The ARIA override is applied after this algorithm produces its result.

  2. In the amended proposal, the effective heading level would be exposed directly to CSS via the :heading() pseudo-class. For implementation complexity and layering reasons, ARIA can't play into this CSS selector. So something has to define the effective heading level pre-ARIA-override.

  3. For purposes of what is exposed to assistive technology, accessibility mappings take the output of the effective heading level algorithm as the default. However, if the same element has aria-level on it, then assistive technologies get that level instead.

  4. This is actually the normal way ARIA works for everything else. Many elements in HTML have a default accessibility mapping. However, ARIA markup always takes precedence at the accessibility layer, but does not directly affect the rendering, behavior or non-accessibility semantics. This is due to ARIA's origin as a solution for "div soup" style JavaScript frameworks that do not use semantic markup. So, for example, at the HTML and CSS levels, only <input type=checkbox> is a checkbox, and it always is one. But at the assistive technology mapping level, that is the default, and ARIA takes precedence. For assistive technology mapping purposes, <div role=checkbox> is a checkbox, and <input type=checkbox role=banner> is not.

  5. Why would an author intentionally make the assistive technology mapping of an element conflict with its base semantics? The fact is, most of the time, they should not. It's better to use correct semantic markup in the first place and be consistent at all layers. But the purpose of ARIA is to give authors the override switch for the cases where it gives the best outcome.

In conclusion, I think a layered approach is correct. HTML should define an "effective heading level" algorithm as proposed by @annevk. That would get exposed directly to the proposed :heading() pseudo-class. Then, HTML Accessibility Mappings would take that as the default, and let ARIA be an override at the accessibility mapping layer.
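A minimal sketch of the layering described above may help. It is not the proposed spec text: the `Node` class is a hypothetical stand-in for a DOM element, and the rule used for a nested h1 (one level per ancestor sectioning element) is an assumption made only for illustration. The point is that the first function never looks at ARIA, while the second applies `aria-level` as an override on top of it.

```python
from dataclasses import dataclass, field
from typing import Optional

SECTIONING = {"article", "aside", "nav", "section"}
HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element (not a real DOM API)."""
    tag: str
    attrs: dict = field(default_factory=dict)
    parent: Optional["Node"] = None


def effective_heading_level(el: Node) -> Optional[int]:
    """HTML layer: ignores ARIA and only walks up the ancestor chain."""
    if el.tag not in HEADINGS:
        return None
    if el.tag != "h1":
        return HEADINGS[el.tag]  # h2-h6 keep their fixed level
    depth = 0
    ancestor = el.parent
    while ancestor is not None:
        if ancestor.tag in SECTIONING:
            depth += 1
        ancestor = ancestor.parent
    return 1 + depth  # assumed rule for a nested h1


def accessibility_heading_level(el: Node) -> Optional[int]:
    """Accessibility layer: aria-level, if present, overrides the default.

    Role checks are omitted to keep the sketch short.
    """
    aria = el.attrs.get("aria-level")
    if aria is not None:
        return int(aria)
    return effective_heading_level(el)


section = Node("section")
plain = Node("h1", parent=section)
overridden = Node("h1", {"aria-level": "4"}, parent=section)
print(effective_heading_level(plain), accessibility_heading_level(plain))            # 2 2
print(effective_heading_level(overridden), accessibility_heading_level(overridden))  # 2 4
```

In this sketch a selector such as the proposed :heading() would consult only the first function, so the `aria-level="4"` override changes what assistive technology sees but not what CSS matches.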

In fact, HTML AAM is already kind of set up this way. HTML AAM links to Core AAM's language on conflicts to say that ARIA role and heading level always overrides this for purposes of accessibility. But in 4.4 Element Role Mappings, it says h1-h6 elements default to a heading level of the element's outline depth, as defined in current HTML.

Unfortunately, this is widely ignored, because the way "outline depth" is defined is not practical to implement. It always gives the wrong answer for elements not in a section; it says that an h3 outside a section has an outline depth of 1, for example.

If HTML provides a new definition of "effective heading level", then it will be very easy to fit it into the architecture of existing accessibility specs. HTML AAM simply needs to replace its reference to "outline depth" with one to "effective heading level", and all the layers will hook together correctly.

Trying to consider ARIA when computing the effective heading level in the first place will not be a good fit. It will cause ARIA to get applied twice, at two different layers, and would not match how ARIA works for other elements.

(Sorry for the long-winded explanation. This is a bit complicated and subtle to explain. I hope the explanation was helpful for some people.)

stevefaulkner commented 6 years ago

Thanks @othermaciej

But in 4.4 Element Role Mappings, it says h1-h6 elements default to a heading level of the element's outline depth, as defined in current HTML.

That's a bug in the spec, given its lack of implementation. I have filed an issue to correct this.

Thinking on this more:

The factors that I consider need further investigation to reach an agreement on how the above are mapped:

As previously suggested, looking at data for usage of sectioning elements and hgroup should inform these considerations.

annevk commented 6 years ago

Note that if you want to skip a sectioning content element that has no h1 descendant whose nearest ancestor sectioning content element is that element, you are again looking at a complex algorithm, as you cannot just walk the parent chain anymore.

I agree we should look at data, but I don't think that's going to be a solution we can use.

asurkov commented 6 years ago

I like the proposal. It feels strange though that h1 is treated in a special way. It would be more consistent if all of h1/h2/h3/etc. earned +1 for each nesting section, or none of them did. The latter looks cleaner and easier to me.

The old way: <h1>heading#1</h1> <h2>heading#2</h2>

and the new way:

<hgroup>heading#1</hgroup> <section> <hgroup>heading#2</hgroup> </section>

both look good.

Is the point of an adjustable h1 to use it instead of hgroup? If so, this moves me back to wondering whether the HTML:h element should be revisited.

othermaciej commented 6 years ago

@asurkov The rendering rules in the HTML standard act the same way - h2-h6 are fixed, while h1 is sized differently depending on nesting level. I think the accessibility mapping should match the rendering unless there is some hugely compelling reason otherwise.

othermaciej commented 6 years ago

I agree with @annevk . It's really important for this algorithm to remain a walk up the ancestor chain instead of a full document traversal.

domenic commented 6 years ago

The algorithm being proposed is not an outline algorithm. It replaces some of the intended purpose of the outline algorithm, but it does not produce an outline as output.

I'm not sure it was clear, but I think the intent of https://github.com/whatwg/html/issues/83#issuecomment-359871505 (specifically the paragraph starting "We’d also define an algorithm that finds...") is to keep having an outline algorithm in the spec. Personally I think that is important at the semantic level. That would be in addition to the heading-level determination algorithm.

The question of whether that outline should contain role="heading" elements is an interesting one. I am not sure, but I lean toward yes. Ideally I think the "semantic outline" specified in HTML and the "AT outline" are the same; that seems like it is most likely to steer authors down the correct path. The alternative is to say that HTML has an outline algorithm that only takes actual headings into account, whereas HTML-AAM has a different outline algorithm that reflects what ATs will implement. But that seems confusing to me.

othermaciej commented 6 years ago

Oh, sure. If we're talking about the remaining outline algorithm and not the effective heading level algorithm, then it doesn't necessarily have the same constraints. However, it seems like the two algorithms should produce consistent results. I am not sure if it would be inconsistent for role="heading" to be considered by one or not the other. Maybe not, as long as the presence of a heading never affects the heading levels of other headings.

I am not sure anyone will implement the outline algorithm at all though, at least in a mainstream browser or AT. My assumption is that the proposed effective heading level algorithm is pretty likely to be widely implemented, but the new outline algorithm probably won't be implemented any more than the current one.

domenic commented 6 years ago

My understanding is that the proposed outline algorithm is already implemented in many ATs, in the form of the ability to list all of a document's headings and navigate between them. I have not tested myself, and so welcome corrections.

The current outline algorithm is also implemented by a variety of tools. At least some authors I know use the output of those tools to guide their development. This is currently a bad thing, because creating an outline that looks good according to those tools can lead to an outline that gives AT users a bad experience. But if we aligned the outline algorithm with what AT users experience, then it could become a good thing.

othermaciej commented 6 years ago

I don't know about other AT, but I do know about VoiceOver and how it integrates with WebKit. VoiceOver does not make use of the current outline algorithm or anything remotely similar, nor would it benefit from having such a thing.

As I understand it, the outline is a list of sections with possible nesting (i.e. a "forest" or potentially multi-rooted tree). This is the output of the outline algorithm. A potential described use is to dynamically create an interactive table of contents that lets you jump to a section. It doesn't actually relate to headers at all, except that outline depth of heading content is defined in terms of the output of the outline algorithm, and each section has an associated header.

VoiceOver has commands for "Find next heading", "Find previous heading", "Find the next heading of the same level" and "Find the previous heading of the same level". It tells you the level of the heading you are currently on. That's it. All of these commands can be implemented with a way to get the heading level for a heading, and a way to traverse forward or backward in document order until a heading is found. None of them need a forest of sections. Jumping to sections is not exposed. If the heading for a section was somewhere in the middle, there isn't a way to jump to the start of the section instead of the heading.
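To illustrate why these commands need no forest of sections, here is a rough sketch that works on nothing more than a flat list of headings in document order. The `(level, text)` representation and the function names are assumptions for illustration, not VoiceOver or WebKit APIs.

```python
from typing import List, Optional, Tuple

Heading = Tuple[int, str]  # (level, text), listed in document order


def find_next_heading(headings: List[Heading], current: int) -> Optional[int]:
    """Index of the heading after `current`, or None at the end."""
    nxt = current + 1
    return nxt if nxt < len(headings) else None


def find_next_same_level(headings: List[Heading], current: int) -> Optional[int]:
    """Index of the next heading with the same level as `current`."""
    level = headings[current][0]
    for i in range(current + 1, len(headings)):
        if headings[i][0] == level:
            return i
    return None


def find_previous_same_level(headings: List[Heading], current: int) -> Optional[int]:
    """Index of the previous heading with the same level as `current`."""
    level = headings[current][0]
    for i in range(current - 1, -1, -1):
        if headings[i][0] == level:
            return i
    return None


headings = [
    (1, "Apple varieties"),
    (2, "Red Delicious"),
    (3, "History"),
    (2, "Granny Smith"),
]
print(find_next_same_level(headings, 1))      # 3
print(find_previous_same_level(headings, 3))  # 1
print(find_next_heading(headings, 3))         # None
```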

Given all that, an algorithm that produces a forest of sections is neither necessary nor helpful.

It could be that a different outline algorithm would produce more useful output. Anne suggests that the new outline algorithm would operate on headings, not sections:

We’d also define an algorithm that finds all the h1-h6 and hgroup elements in a document so they can be presented as an outline.

I guess this could be useful, since it is about headings, not sections. If it produces a flat list in document order, then it would match what VoiceOver needs, but it would be kind of trivial and not really an "outline". If it produces a forest of headings, where tree level may not match heading level, then it may or may not be useful. We'd have to decide whether "Find the next heading of the same level" is about the heading level or the outline level. I'm not sure it is helpful to have two different notions of heading level, so I don't think an algorithm like that would be useful.

It's hard to evaluate its usefulness further without more specifics about what it does.

The effective heading level algorithm is much more obviously useful, so I've focused my review on that.

LJWatson commented 6 years ago

@othermaciej wrote: "I don't know about other AT, but I do know about VoiceOver and how it integrates with WebKit. VoiceOver does not make use of the current outline algorithm or anything remotely similar, nor would it benefit from having such a thing."

This is the case for all screen readers that I'm aware of. @Domenic is right that screen readers have the ability to navigate between headings, but this is based on the headings themselves, not the outline algorithm.

jakearchibald commented 6 years ago

@othermaciej

I think the accessibility mapping should match the rendering unless there is some hugely compelling reason otherwise.

The best use case I can think of is a template like:

<article>
  <h1>Reasons Event Horizon is actually quite good</h1>
  <section>
    {{{articleContent}}}
  </section>
</article>

…where articleContent is HTML generated from Markdown.

Now, if I use Markdown's ## Headings syntax, it produces the correct heading structure.

I'm undecided whether this is worth it. But, "number of ancestor sectioning elements + default heading level - 1" is simpler than I thought it'd be.
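As a rough sketch, the quoted formula can be written down directly. The `ancestors` list of tag names is an assumed input format, and the formula is implemented literally as stated; note that, taken literally, a heading with no sectioning ancestor would come out one lower than its tag number, so a real definition would need to settle that case.

```python
SECTIONING = {"article", "aside", "nav", "section"}


def bumped_level(tag: str, ancestors: list) -> int:
    """Number of ancestor sectioning elements + default heading level - 1.

    `ancestors` holds the tag names of the heading's ancestors.
    """
    default_level = int(tag[1])  # "h2" -> 2
    sectioning_count = sum(1 for name in ancestors if name in SECTIONING)
    return sectioning_count + default_level - 1


# The template above: the title is an h1 directly inside <article>,
# and the generated content sits inside <article><section>.
print(bumped_level("h1", ["body", "article"]))             # 1
print(bumped_level("h1", ["body", "article", "section"]))  # 2
print(bumped_level("h2", ["body", "article", "section"]))  # 3
```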

domenic commented 6 years ago

Right, it's not based on the current outline algorithm, because the current outline algorithm is not usable for that purpose. I'm trying to say we should create an outline algorithm that is usable for that purpose, by virtue of it correctly listing all the headings. This is effectively what we see today in several implementations of today's (flawed) outline algorithm; see e.g. https://checker.html5.org/?showoutline=yes&doc=https%3A%2F%2Fstreams.spec.whatwg.org and notice how the heading-based outline (similar to @annevk's proposal) and the "structural" outline (based on the current HTML Standard) are basically the same. (The differences would be larger if the document used <section> elements, but it does not.)

People saying that the outline algorithm is not necessary or helpful seem mostly to be focusing on the fact that it points to sections + their headings, instead of directly to their headings. But I think this is not a very important difference; we can change it to point to headings and the result will be almost exactly the same, except in some cases of un-heading-ified sections (which I am happy to omit). That is, sections and headings are usually 1:1 in well-structured documents.

I think it's still useful to call the resulting list of headings (which can indeed be displayed in a nested fashion, as shown by the above link) an outline, and call the algorithm that collects them an outline algorithm, and use it in the same semantic fashion as the current outline algorithm.

othermaciej commented 6 years ago

Getting a flat list of headings (each with an associated level) is potentially useful. It is kind of simple (basically document order traversal + "is this a heading" test) but at least it won't conflict with other info in a confusing way, and it is obvious how to do previous/next operations without implementing it literally. I have no objection to it being called "outline". That name is a bit tainted in some circles, but it can be redeemed.

A forest of headings with their own hierarchy that does not necessarily match heading level would be less useful. But if that's not what is intended, there is no point complaining about it.

othermaciej commented 6 years ago

@jakearchibald

Interesting example. I believe a rule of "number of ancestor sectioning elements + default heading level - 1" for the heading level exposed to accessibility would make it mismatch the rendering in your example. A ## heading (level 2) would render at level 2 style, but be exposed to accessibility as a level 3 heading. I think making the accessibility view diverge from the rendering is a bigger issue than the structure of this document being logically correct in only one of these views.

In theory we could change rendering for h2 nested in a section too, but I fear that ship has sailed.

I believe the correct solution to this problem is for the Markdown --> HTML converter to lower the heading level by one. In fact, that's exactly what I did in the converter that's used to produce the HTML versions of the WHATWG's policy documents, and it's super simple to do at conversion time:

 import re
 def lower_headers(policy_markdown):
     # Add one '#' to every line that starts with '#', lowering each heading by one level.
     return re.sub(r'^#', '##', policy_markdown, flags=re.MULTILINE)

jakearchibald commented 6 years ago

@othermaciej I made my example confusing, sorry. I'd write my markdown so I'd use # for top level headings, then ## for next level etc etc, without having to consider the context of the content on the page.

This means I can use some of the same HTML as a "preview" on the article homepage, without having to shift all the heading levels.

I really don't have strong feelings about it, but it's a practical benefit.

The alternative is finding a markdown html generator that uses sections.

annevk commented 6 years ago

@LJWatson I thought I read somewhere that some AT does present the user with a list of headings, rather than just ways to navigate between them (as is the case in VoiceOver). If that's not the case, then the main use case for such an algorithm would be server-side scripts, to generate a "table of contents" or some such. I'm not entirely sure to what extent we should consider them as we also don't define many of the other features such scripts use. The main use of defining the algorithm then would be as a conformance requirement, to ensure that no heading levels from low to high are skipped.

As for considering h2-h6:

xi commented 6 years ago

I don't know about other AT, but I do know about VoiceOver and how it integrates with WebKit. VoiceOver does not make use of the current outline algorithm or anything remotely similar, nor would it benefit from having such a thing.

I do not use a screen reader myself, but a quick image search turned up some images showing a nested list of headings in NVDA:

[Image: NVDA heading list]

So I think there is benefit in specifying how this nested list is generated from a flat list. That should be relatively simple.

I am not sure yet how this algorithm should deal with skipped heading levels: Should it just ignore them (like the current outline algorithm) or should it add empty nodes for them (like the current heading outline in the validator)?
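One possible way to build such a nested list is a single pass with a stack. The sketch below is only an illustration, not what NVDA or the validator does; it takes the "ignore skipped levels" option, so a heading is attached to the nearest preceding heading of a lower level and no empty placeholder nodes are created. The `(level, text)` input and the dict-based nodes are assumptions.

```python
from typing import List, Tuple


def nest_headings(flat: List[Tuple[int, str]]) -> List[dict]:
    """Turn (level, text) pairs in document order into a nested outline."""
    roots: List[dict] = []
    stack: List[Tuple[int, dict]] = []  # currently open ancestors as (level, node)
    for level, text in flat:
        node = {"level": level, "text": text, "children": []}
        # Close every open heading that is at the same level or deeper.
        while stack and stack[-1][0] >= level:
            stack.pop()
        if stack:
            stack[-1][1]["children"].append(node)
        else:
            roots.append(node)
        stack.append((level, node))
    return roots


outline = nest_headings([(1, "A"), (2, "B"), (4, "C"), (2, "D")])
# "C" skips level 3 and is simply nested under "B".
print([child["text"] for child in outline[0]["children"]])                 # ['B', 'D']
print([child["text"] for child in outline[0]["children"][0]["children"]])  # ['C']
```

Adding empty nodes for skipped levels instead would mean pushing placeholder entries onto the stack until its depth matches the heading's level.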

I believe the correct solution to this problem is for the Markdown --> HTML converter to lower the heading level by one.

I feel this should go into ATAG, not sure where exactly. Also note that there is a post about this on the commonmark forum.

stevefaulkner commented 6 years ago

The following popular browsers expose the semantics of h1 to h6 elements to assistive technology (e.g. screen readers), by reference to the numeric in their TagNames (h1 = heading level 1 and so on):

JAWS for example provides the following ways to interact with headings:

stevefaulkner commented 6 years ago

Advice we provide to developers is "do not skip heading levels":

<h1> → <h3> is bad
<h1> → <h2> → <h3> is good

If the algorithm produces skipped levels because headingless sectioning elements are taken into account in the calculation of a heading's level, it codifies a practice we tell developers to avoid.
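The "do not skip heading levels" rule can be checked mechanically once the computed levels are known. The following is a small sketch under the assumption that the levels are given as a list in document order and that a document is expected to start at level 1; it is not the check any particular validator performs.

```python
def skipped_levels(levels):
    """Return (index, previous_level, level) for every jump of more than one level."""
    problems = []
    previous = 0  # assumption: the first heading is expected to be level 1
    for index, level in enumerate(levels):
        if level > previous + 1:
            problems.append((index, previous, level))
        previous = level
    return problems


print(skipped_levels([1, 3]))        # [(1, 1, 3)]
print(skipped_levels([1, 2, 3, 2]))  # []
```

Running such a check on the levels an algorithm computes, rather than on the tag names in the markup, would show whether headingless sectioning elements introduce skips that the author never wrote.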

LJWatson commented 6 years ago

@AnneVK wrote: "I thought I read somewhere that some AT does present the user with a list of headings, rather than just ways to navigate between them (as is the case in VoiceOver)."

They do, including VO. As @stevefaulkner notes, the screen readers take the heading information exposed by the browser, and present it in different ways.

In VO for example, hit the VO modifier u, use left/right to locate the list of headings, use up/down to navigate the list of headings, and enter to select (navigate to) one.

sideshowbarker commented 6 years ago

I added a use counter to the W3C HTML checker to collect data on some particular cases relevant to recent discussions here. The data so far from a total of 223,507 requests looks like this:

| case | # found / 223,507 | fraction |
| --- | --- | --- |
| h1 multiple found | 23434 | 0.104847 |
| h1 multiple with section ancestor found | 6291 | 0.028147 |
| h1 multiple with article ancestor found | 4636 | 0.020742 |
| h1 multiple with aside ancestor found | 825 | 0.003691 |
| h1 multiple with nav ancestor found | 917 | 0.004103 |

That is, 10.5% of documents checked have multiple h1 elements, 2.8% have at least one h1 among multiple h1 elements nested within a section, 2.1% have at least one nested within an article.

One case that seemed like it might be interesting to examine more is the case where a document has multiple h1 elements not nested within section/article/aside/nav. So I have the checker set up to log the URL for each instance it finds of that case. I dropped a sample of that log output to here:

https://gist.githubusercontent.com/sideshowbarker/8284404/raw/9220fb7b5a1befbabc04dc27943a5255cdfecc3e/h1.log

I have thousands more if anybody wants to look further — but after browsing through that sample, they turn out to mostly not be interesting at all; instead they seem to mostly just be documents that use a lot of nested div (instead of section/article/aside/nav) but that for some reason just use h1 at arbitrary levels of nesting throughout (rather than using h2-h6).

annevk commented 6 years ago

Given that it seems this doesn't have much compatibility impact and there's some agreement on what I proposed, I went ahead and detailed it in #3499. I still need to go through all existing examples and element definitions to make sure they all make sense in this heading-focused world, but the proposal itself should now be clear.

bkardell commented 6 years ago

I think that for the most part I like this proposal. I have three 'concerns'. The first two are mostly about impact and messaging.

The first is just that as steve said earlier it'd be effectively building in a pattern that would create something we historically advise authors explicitly not to write manually. That could use a solid explanation of why everyone is ok flipping on that (and confirmation that they are) or I think that could go badly.

Second is that as it stands, for those people who spent a lot of time and effort using sectioning and made good headings that didn't rely on the phantom 'document outline' that never shipped, the beautiful AT heading relationships, the conceptual 'outline' that they built will get less beautiful. For example, given something like:

  <article>
    <h1>Apple varieties</h1>
    <p>The apple is the pomaceous fruit of the apple tree...</p>

    <section>
      <h2>Red Delicious</h2>
      <p>These bright red apples are the most common found in many supermarkets...</p>
    </section>

    <section>
      <h2>Granny Smith</h2>
      <p>These juicy, green apples make a great filling for apple pies...</p>
      <div role="heading" class="note" aria-level="3">The origin of the name</div>
      <p>...blah...</p>
    </section>
  </article>

Today, this has a pretty decent AT tree and communicates sensible heading relationships. While there is no 'outline', it's totally all 'there' for someone to make sense of with the navigational tools available in the AT... If you used them (but I'll come back to that). Some even use this to their advantage to autogenerate TOCs and things so that everyone benefits from this, and the relationships look like this:

But, the result of this approach to their AT trees is that it gets less sensible by the measures we have previously laid out. In other words, it's been stressed over the years "it's quite important you get this right" and "here's the criteria for what it means to be 'right'". Some people, maybe not that many, but some, for whatever reason and sometimes at great cost (sometimes by force of law) went back and tried really hard to meet those criteria, and now by those same measures it seems like we are undoing some of that. That's really the thing that worries me just a bit, because it feels like it could kind of be... disheartening.

I want to stress that I don't actually know how important this is, I'm just pointing it out as a thing to consider. It definitely should have some strong messaging about why, I think, if that's the route. I've said before that I think that numbers are less important than relationships and that that is confusing right now because the only relationships anyone has to make sense of the world is via numbers that implied structure and not the other way round. My experience is limited, but from the folks I've talked to, despite what we encourage, I don't think people 'use' the tools in readers to think about structures that much precisely because they have always been so broken 99% of the time and they were based on really 'flat' models. Maybe the best thing to do is to just admit that and provide a good plan going forward, and this might be it.

My third thing is that none of the proposed solutions (even mine so far) that seem 'doable' have dealt well IMO with the reality that Jake mentions and I think that's potentially fatal to real success. Very often a site (or sections of a site) can be very structured, but what is frequently just referred to as content is authored in a way a lot more like it is read - either with a decent rich text editor or as simple markdown or something. Speaking "by volume" this is a huge huge huge amount of the web for the simple reason that tools like this dramatically increase the potential number of authors by specifically being not intensely structured, more like the traditional 'flat' documents. It's entirely possible to author "good" content with good/sensible headings that isn't really 'compatible' with inclusion in your pretty structured page.

So, I have what I think is a rather small tweak to suggest: What if we added to this proposal a non-reflected, non-dynamic policy attribute or something (forgive my potential butchering/overloading of terms here as I try to explain) so that authors could simply identify upfront that this particular container will hold 'flat' content so that we could maybe choose a slightly different strategy there. In these, we'd simply use Anne's logic to calculate the effective level of the container and then add that to the specified tag heading level. Thus you could have something like

  <section>
    <h1>One</h1>
    <section>
      <h1>Two</h1>
      <article heading-policy="flat">
        <!-- this is simply included 'flat' content -->
        <h1>Apple varieties</h1>
        <p>The apple is the pomaceous fruit of the apple tree...</p>

        <h2>Red Delicious</h2>
        <p>These bright red apples are the most common found in many supermarkets...</p>

        <h2>Granny Smith</h2>
        <p>These juicy, green apples make a great filling for apple pies...</p>
      </article>
    </section>
  </section>

and the result would be like

  <section>
    <h1 aria-level="2">One</h1>
    <section>
      <h1 aria-level="3">Two</h1>
      <article heading-policy="flat">
        <h1 role="heading" aria-level="4">Apple varieties</h1>
        <p>The apple is the pomaceous fruit of the apple tree...</p>

          <h2 role="heading" aria-level="5">Red Delicious</h2>
          <p>These bright red apples are the most common found in many supermarkets...</p>

          <h2 role="heading" aria-level="5">Granny Smith</h2>
          <p>These juicy, green apples make a great filling for apple pies...</p>
      </article>
    </section>
  </section>

I have hacked a mod of Anne's polyfill which does this, it seems like a fairly minor tweak? Again, I'm suggesting this would only be at parse time and not live, that is, adding or removing the attribute (or however it might be done) wouldn't affect things. That makes me think this would make more sense as a tag (since that would better explain that relationship)? Maybe it would make sense to separately float a custom element and see if people used it/correctly? I'm unsure, I guess I just think this very very common use case deserves some discussion/thought.
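For readers who want to see the arithmetic behind the example, here is a rough sketch. It is not the polyfill mentioned above: the `heading-policy` attribute is the hypothetical one from this comment, the input format (a list of `(tag, attrs)` pairs for the heading's ancestors) is an assumption, and the rules are inferred from the example output, where a nested h1 gets one level per ancestor sectioning element and headings inside a "flat" container get the container's level plus their own tag level.

```python
SECTIONING = {"article", "aside", "nav", "section"}


def heading_level(tag: str, ancestors: list) -> int:
    """`ancestors` is a list of (tag, attrs) pairs, outermost first."""
    tag_level = int(tag[1])  # "h2" -> 2
    depth = 0                # sectioning ancestors seen so far
    flat_base = None         # depth of the innermost heading-policy="flat" container
    for name, attrs in ancestors:
        if name in SECTIONING:
            depth += 1
            if attrs.get("heading-policy") == "flat":
                flat_base = depth
    if flat_base is not None:
        return flat_base + tag_level  # flat content: container level + tag level
    if tag == "h1":
        return 1 + depth              # assumed rule for a nested h1
    return tag_level                  # h2-h6 outside flat content stay fixed


outer = [("section", {})]
inner = outer + [("section", {})]
flat = inner + [("article", {"heading-policy": "flat"})]
print(heading_level("h1", outer))  # 2
print(heading_level("h1", inner))  # 3
print(heading_level("h1", flat))   # 4
print(heading_level("h2", flat))   # 5
```

The printed levels match the `aria-level` values in the expanded markup above.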

jakearchibald commented 6 years ago

it'd be effectively building in a pattern that would create something we historically advise authors explicitly not to write manually. That could use a solid explanation of why everyone is ok flipping on that

"The bug was fixed" seems good enough. I mean, we've done that a lot in the past.

<article>
  <h1>Apple varieties</h1>

In this case the article is redundant. From the spec:

When the main content of the page (i.e. excluding footers, headers, navigation blocks, and sidebars) is all one single self-contained composition, that content may be marked with an article, but it is technically redundant in that case (since it's self-evident that the page is a single composition, as it is a single document).

I'd rather we bumped the level of all headings according to the number of ancestor sectioning elements than introduce something extra like heading-policy.

bkardell commented 6 years ago

"The bug was fixed" seems good enough. I mean, we've done that a lot in the past.

I'm not sure which bug this really refers to. Effectively, as I understand it: A whole bunch of documents that would have been previously evaluated as "this contains some bad practices that create a poor experience and here's why" still would be experienced the same way, but the bar just moved... Again, I think it's entirely plausible that there's a perfectly acceptable explanation for this reversal of position. I'd just like to see it articulated by a11y folks because a lot of investment has gone into explaining why this is undesirable and training people to identify by noticing that experience and then opening bugs to remediate this. I feel like for sure this is going to have a not-insignificant impact for me personally at work which will require explanation/retraining. Again, not an argument against it - just asking for someone to articulate well.

In this case the article is redundant.

This was just intended as a fragment demonstrating both use of section/h1 and 'flat' content, not an entire document (thus no body, etc), so it's not.

I'd rather we bumped the level of all headings according to the number of ancestor sectioning elements than introduce something extra like heading-policy.

That seems... simple and fine? I thought this was decided against earlier in the thread because of some issue and was offering a compromise based on the distinction between 'flat' stuff and 'structured' stuff, but... I like the simplicity of that actually.

jakearchibald commented 6 years ago

I'm not sure which bug this really refers to. Effectively, as I understand it: A whole bunch of documents that would have been previously evaluated as "this contains some bad practices that create a poor experience and here's why" still would be experienced the same way, but the bar just moved

The bug was "browsers do not consider sectioning elements when exposing the outline to AT via heading levels", and that's what we're aiming to fix here.

Many years ago, we used to say "Don't concatenate strings using +, instead create an array of strings then .join('') them. It's faster in IE". Anyway, new versions of IE fixed the issue, and market share shifted to browsers without the bug. In these modern browsers, + is faster than the array method. Advice to developers was "The bug has been fixed. Just use +".

This was just intended as a fragment demonstrating both use of section/h1 and 'flat' content, not an entire document (thus no body, etc), so it's not.

Does that mean you'd have multiple <h1>s in the page then? Isn't that bad practice?

annevk commented 6 years ago

The first is just that as steve said earlier it'd be effectively building in a pattern that would create something we historically advise authors explicitly not to write manually.

This is not true. The proposed change actually makes it a conformance error to skip a heading level. (Currently this is not flagged by the validator.) That alone should have a positive effect on the kind of documents that get written.

My rationale for this change overall is as follows:

My rationale for :heading/:heading(level) is:

There is some compatibility risk, and some potential for confusion, but overall I think the advantages far outweigh the drawbacks of only having h1-h6 and rather useless sectioning content elements.

bkardell commented 6 years ago

The bug was "browsers do not consider sectioning elements when exposing the outline to AT via heading levels", and that's what we're aiming to fix here.

Right now, if I have no sectioning elements and write explicit markup that skips levels, we would say, "that is bad because it creates an experience where users cannot understand - a 3 should be preceded by a 2 because this is how you understand structure using numbers (a mental outline)". No 'bug' is being fixed with relation to that markup. Nothing about experience changes there, simply a thing that my QA/a11y depts might identify as 'bad' yesterday, suddenly isn't. From my understanding, at least. I'd like to have an explanation to hand them, that's it.

Does that mean you'd have multiple <h1>s in the page then? Isn't that bad practice?

Sure, but isn't the whole current proposal actually based on sectioning plus h1s? I'm showing what I thought is your actual use case. You do this right? Then you want to include some flat content and have the relationships continue to make sense. That's it.

bkardell commented 6 years ago

This is not true. The proposed change actually makes it a conformance error to skip a heading level.

The "why" of not skipping heading levels that I have always heard and explained has to do with the AT, not the markup. There, even if you don't in markup - this creates skipped heading levels in AT, right? Again, I want to be super clear that I am not against this change. I pondered hard how to reply because I know the 'concerns' sound kind of negative, but they're mostly just about asking for a lucid explanation from a11y folk who support this on this seeming dissonance and messaging this well.

jakearchibald commented 6 years ago

simply a thing that my QA/a11y depts might identify as 'bad' yesterday, suddenly isn't. From my understanding, at least. I'd like to have an explanation to hand them, that's it.

Did you see my stringArray.join('') example? Doesn't this answer your question?

Sure, but isn't the whole current proposal actually based on sectioning plus h1s? I'm showing what I thought is your actual use case.

I was taking from your "as it stands" example. You were giving an example of something that's good practice today, but it doesn't seem to be good practice today.

I don't think I'm helping here, so I'll step back.

bkardell commented 6 years ago

@annevk has just explained to me a misunderstanding I was operating under with regard to @stevefaulkner's comment

Advice we provide to developers is "do not skip heading levels":

<h1> → <h3> is bad
<h1> → <h2> → <h3> is good

If the algorithm produces skipped levels because headingless sectioning elements are taken into account in the calculation of a heading's level, it codifies a practice we tell developers to avoid.

While the algorithm can, in theory, create skipped levels they wouldn't be any more 'acceptable' than they were before and the ways that would create them would now be flaggable as conformance errors. Sorry for any noise/confusion on that item, that's enough for me.

stevefaulkner commented 6 years ago

Note: the w3c nu markup checker has warnings (and has had for some time) for sections missing headings and use of multiple h1's etc. for example: https://validator.w3.org/nu/?showsource=yes&showoutline=yes&doc=https%3A%2F%2Fs.codepen.io%2Fstevef%2Fdebug%2FNyJKJz

annevk commented 6 years ago

Given the requirement in the proposed change that you cannot skip heading levels:

Each heading following another heading lead in document headings must have a heading level that is less, equal, or 1 greater than lead's heading level.

I don't think we need to require that sectioning content has a heading, unless there's a concern with that on its own, but then that's probably best discussed in a new issue.

Not using multiple headings with heading level 1 is something we could consider adding, though currently the specification says this is fine and even includes an example that does that. I think that therefore that's also best discussed separately.
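
The quoted requirement can be sketched as a small check over the sequence of heading levels in document order. This is only an illustration of the rule as worded above, not the validator's implementation; the function name `skipped_level_violations` and the list-of-integers input are assumptions made for the example.

```python
def skipped_level_violations(levels):
    """Return the indices of headings that break the quoted rule.

    `levels` is the list of heading levels in document order. A heading
    violates the rule when its level is more than 1 greater than the
    level of the heading before it (the "lead").
    """
    violations = []
    for i in range(1, len(levels)):
        lead, current = levels[i - 1], levels[i]
        # Less than, equal to, or exactly 1 greater than lead is allowed.
        if current > lead + 1:
            violations.append(i)
    return violations


print(skipped_level_violations([1, 2, 3, 2]))  # []
print(skipped_level_violations([1, 3, 2, 4]))  # [1, 3]
```

Note that a decrease of any size is allowed; only an increase of more than one level is flagged.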

alastc commented 6 years ago

Not sure I'm following entirely, but there is an exception I'd like to check.

A fairly common pattern is to have one or more headings above/before the H1.

E.g.

etc. Such as http://www.bbc.co.uk/news/live/uk-43202018

This is a desirable pattern from an accessibility point of view, is that still valid with this new approach?

annevk commented 6 years ago

@alastc currently

The first heading within document headings must have a heading level of 1.

forbids that, but it seems reasonable to change that to require at least one heading to have a heading level of 1 and not require a particular position. Thanks for raising that.
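
To make the difference between the two wordings concrete, here is a rough sketch with both written as predicates over the heading levels in document order. The function names and the sample levels are hypothetical, and treating a document without headings as passing both checks is an assumption of this sketch.

```python
def first_heading_is_level_one(levels):
    """Current wording: the first heading must have heading level 1."""
    return not levels or levels[0] == 1


def some_heading_is_level_one(levels):
    """Suggested relaxation: at least one heading has heading level 1,
    at any position."""
    return not levels or 1 in levels


# Hypothetical page: two level-2 headings come before the level-1 page heading.
levels = [2, 2, 1, 2]
print(first_heading_is_level_one(levels))  # False
print(some_heading_is_level_one(levels))   # True
```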

bkardell commented 6 years ago

@jakearchibald mentioned:

I'd rather we bumped the level of all headings according to the number of ancestor sectioning elements than introduce something extra like heading-policy.

Sorry to add more noise on this, but for clarity: Is this on the table or off the table? My reading is probably incorrect, but I understood that there were reasons to not want to do that? The reason I am asking is that @jakearchibald's basic use case is pretty much the reality of most CMSes I know of - different authored content is going to keep on using 'flat' h1...h6 and I think that ideally we'd like them to "make sense" like this?

rehierl commented 6 years ago

@annevk Allow me to recap the main points of your proposal:

  1. The heading level of h1 elements depends on its placement: 1 + the number of ancestor sectioning content elements.
  2. The heading level of an h2-h6 element is constant (fixed to the number value).
  3. The lower a heading's heading level is, the more important the heading is.
  4. "Each heading following another heading lead in document headings must have a heading level that is less, equal, or 1 greater than lead's heading level."

I am having difficulty understanding (4). Could you please rephrase it so that its meaning is clearer?

<body>
  <h1> A </h1>            toc-1
  <section>               =====
    <h1> B </h1>          1. A
    <section>             1.1. B
      <h1> C </h1>        1.1.1. C
      <h2> D </h2>        1.2. D
    </section>
  </section>
</body>

If I understand your approach correctly, then the heading level of heading C is 3. And because the heading level of the h2 element is constant, its heading level is still 2. So, is "toc-1" an accurate representation of the intended table-of-contents listing (because the h2 element is now more important than the h1 element), or is it supposed to be something else (e.g. undefined because non-conformant)?

annevk commented 6 years ago

@bkardell my impression is that there's too much content out there that uses h2-h6 without regard for their sectioning content element ancestors so adjusting them would do too much damage to existing content.

@rehierl 4 means that you cannot skip heading levels when the heading level increases relative to a previous heading. And toc-1 is indeed accurate (and conforming).
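
The levels in the fragment above can be reproduced with a short sketch of points 1 and 2 of the recap: an h1 gets 1 plus the number of ancestor sectioning content elements, and h2-h6 keep their numeric level. This is a rough illustration rather than the proposed specification text; the class and function names are made up for the example, and it assumes well-formed markup with explicit end tags.

```python
from html.parser import HTMLParser

SECTIONING = {"article", "aside", "nav", "section"}  # sectioning content elements
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingLevels(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0        # number of open sectioning content ancestors
        self.current = None   # level of the heading currently being read
        self.text = []        # text collected for that heading
        self.headings = []    # (text, level) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONING:
            self.depth += 1
        elif tag in HEADINGS:
            rank = int(tag[1])
            # h1 depends on nesting depth; h2-h6 are fixed to their number.
            self.current = 1 + self.depth if rank == 1 else rank
            self.text = []

    def handle_data(self, data):
        if self.current is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag in SECTIONING:
            self.depth -= 1
        elif tag in HEADINGS and self.current is not None:
            self.headings.append(("".join(self.text).strip(), self.current))
            self.current = None


def heading_levels(html):
    parser = HeadingLevels()
    parser.feed(html)
    parser.close()
    return parser.headings


FRAGMENT = """
<body>
  <h1> A </h1>
  <section>
    <h1> B </h1>
    <section>
      <h1> C </h1>
      <h2> D </h2>
    </section>
  </section>
</body>
"""

print(heading_levels(FRAGMENT))
# [('A', 1), ('B', 2), ('C', 3), ('D', 2)]
```

The sequence of levels 1, 2, 3, 2 never increases by more than one, which is consistent with the fragment being conforming under (4).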

LJWatson commented 6 years ago

@annevk "my impression is that there's too much content out there that uses h2-h6 without regard for their sectioning content element ancestors so adjusting them would do too much damage to existing content."

It would, but a question worth asking is what impact that damage would actually have. To some extent this idea is based on the assumption that most/many pages have a useful heading hierarchy, and that introducing these changes would therefore break it.

My own experience is that heading hierarchies are almost always broken as it is. So perhaps we should be asking whether the damage this algorithm might cause is better or worse, than the existing status quo?

My (entirely unscientific) hunch is that it would not make things worse. If that assertion can be backed up with evidence, then the question is whether it is worth exchanging one kind of broken for another, as a means of getting everyone to the point where things get easier for authors and better for users?

annevk commented 6 years ago

@LJWatson the other reason for avoiding adjusting h2-h6 is that we have styling in place for h1 (based on nesting depth) and we cannot add equivalent styling for h2-h6 (and we also cannot change it for h1). It also seems like an easier pitch to say that folks should just use h1 and sectioning content elements.

alastc commented 6 years ago

It's worth examining, from the summary I'd expect these sites to be fine:

Sites that would not provide a good experience would be:

I'm trying to think of any inbetween structures, for example:

<body>
  <banner>
    <h2>Nav heading</h2>
  </banner>
  <main>
    <h1>Page heading</h1>
    <article>
      <h2> C </h2>
      <h3> D </h3>
    </article>
  </main>
</body>

I think that would be fine, but if the structure was main > section > h1, that would be an H2? That seems like an easy mistake to make.

It feels like you really have to go one way or the other, mixing multiple H1s with H2-6s is where it would go wrong. I'm guessing someone already came up with the idea of detecting H2-6 and switching to classic heading-levels?

annevk commented 6 years ago

@alastc switching is too expensive (and could also lead to very weird experiences on slow loading pages).

bkardell commented 6 years ago

I'm guessing someone already came up with the idea of detecting H2-6 and switching to classic heading-levels?

Yes, but this is difficult and expensive, and it is also a very real thing that will happen with pretty much any existing tool or CMS (because visual editors, markdown, etc. think of text in a more traditional 'flat' way - that's where our original ideas about them come from, in fact). So I offered that a kind of indicator that lets you know 'this element contains that kind of stuff' would be a way to potentially bridge the gap in a way that seems achievable both from an implementation standpoint and an 'authors could actually accomplish this' standpoint. Effectively, the current proposal says that h1's level becomes the section depth - this would allow all the levels inside an element with such an indicator to become section depth + (tag level - 1). It seems pretty easy to make as a custom element or polyfill (in fact, I'm playing with a version of it in a project right now), so I'm not trying to push that it has to be a part of this, but it does seem like without it, it will still be very hard for most authors to use common tools to create good headings.
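
The arithmetic in that comment can be written out as a small function. This is only a sketch of the idea as described, not the polyfill mentioned above: the name `effective_level`, the `flat_container` flag standing in for the "indicator", and the reading of "section depth" as 1 plus the number of ancestor sectioning elements are all assumptions.

```python
def effective_level(rank, sectioning_ancestors, flat_container=False):
    """Heading level for an h<rank> element.

    rank                 -- 1..6, taken from the tag name (h1..h6)
    sectioning_ancestors -- number of ancestor sectioning content elements
    flat_container       -- True if the heading sits inside an element
                            carrying the hypothetical 'flat content' indicator
    """
    section_depth = 1 + sectioning_ancestors
    if rank == 1 or flat_container:
        # h1 always follows the section depth; with the indicator,
        # h2-h6 are shifted by the same amount.
        return section_depth + (rank - 1)
    # Without the indicator, h2-h6 keep their numeric level.
    return rank


print(effective_level(1, 2))                       # 3
print(effective_level(3, 2))                       # 3
print(effective_level(3, 2, flat_container=True))  # 5
```

The sketch does not cap the result, so deeply nested flat content can produce levels above 6; how that should be handled is not settled in the comment.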

alastc commented 6 years ago

Fair enough.

In general then safe authoring advice for this proposal would be:

Then, when there is reasonable UA support you could switch to H1s only and use sectioning for levels.

Have I understood that correctly?

annevk commented 6 years ago

Yes (or use the polyfill if requiring script is acceptable for your site).

rehierl commented 6 years ago

Even though I am only a member of the general public, I need to ask you to do the following: If only for a moment, try to take a look at the bigger picture.

tldr - The proposed heading-level algorithm may reflect "reality", but it won't solve the core problem: Figure out a way to teach computers how to read the content an author associates with a heading (aka. sections). The core concept of the current algorithm (i.e. sections) isn't what made it fail. The attempt to make it reflect reality is what broke the algorithm we have! So use the proposed design to complement (i.e. not replace) it. And, in the long run, fix the existing design, because that is what will drag us out of the mess we are in. - /tldr

With regards to ...

(2015-12-27, @domenic) - Quotes some twitter discussion: ... We have higher standards ... (W3C's) kind of self-contradictory patchwork ... update the spec to reflect reality.

(2016-10-23, @fititnt) - A request to: decide the best path, explain it, and be consistent.

(2018-03-01, @bkardell) - A reminder that: it's been stressed over the years "it's quite important you get this right" and "here's the criteria for what it means to be 'right'".

(2018-01-26, @othermaciej) - The outline (as currently defined) is a list of (nested) sections (i.e. a "forest") ... an algorithm that produces a forest of sections is neither necessary nor helpful ... the new outline algorithm (should) operate on headings, not sections.

The reason for "a list of sections"

Take a closer look at WHATWG's outline algorithm in order to figure out what the reason for the "a list of sections" definition really is. (Note that I fully agree here: That part (i.e. forest) does not make sense and should be changed). However, if you'd scroll down to "When entering a heading content element", then you might notice this:

1) The only case in which the "current section" can end up with having no heading is, if that section was declared by an element of sectioning content or sectioning root.

2) If the "current section" has no heading, and if the first element of heading content is entered inside of such a section, then that heading element is reused as the section's heading.

3) The first "Otherwise" block is what I'd call a "performance shortcut" which causes the loop beneath it to always create a subsection (see "append it to candidate section"). So the first "Otherwise" block is what is relevant with regards to the above definition, not the loop beneath it.

fragment 4-1      toc 4-1   fragment 4-2       toc 4-2
============      =======   ============       =======
<body>            1. A      <body>             1. Untitled
  <h1> A </h1>    2. B        <section>        1.1. A
  <h1> B </h1>                  <h1> A </h1>   2. B
</body>                       </section>
                              <h1> B </h1>
                            </body>

4-1) If the current heading being entered has a rank that is greater than or equal to the rank of the current section's heading, then a sibling section is created (i.e. a forest).

4-2) If the current section has an implied heading, then a sibling section is created (i.e. a forest). Note that this case will only be triggered under certain circumstances. However, if you look close enough at toc 4-2, you could spot an inconsistency error with regards to the "the first element of heading content" definition. Other than that, I'll ignore fragment 4-2 because of it being "bad-practice".
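
Case 4-1 can be illustrated with a deliberately simplified sketch that builds implied sections from a flat list of headings. It is not the outline algorithm itself: it ignores sectioning content and sectioning root elements entirely, so it cannot reproduce fragment 4-2, and the function names and data layout are invented for the example. Headings are given as (number, title) pairs, where number 1 (h1) is the highest rank.

```python
def build_outline(headings):
    """Return the top-level sections implied by a flat list of headings.

    Each section is a dict with "title", "number" and "children" keys.
    """
    outline = []  # top-level sections (the "forest")
    stack = []    # chain of currently open sections, outermost first
    for number, title in headings:
        section = {"title": title, "number": number, "children": []}
        # A heading whose rank is equal to or higher than an open section's
        # heading (numerically: number <= that heading's number) closes it.
        while stack and stack[-1]["number"] >= number:
            stack.pop()
        if stack:
            stack[-1]["children"].append(section)  # subsection
        else:
            outline.append(section)                # new top-level sibling
        stack.append(section)
    return outline


def toc(sections, prefix=""):
    """Render sections as numbered table-of-contents lines."""
    lines = []
    for i, section in enumerate(sections, start=1):
        label = f"{prefix}{i}."
        lines.append(f"{label} {section['title']}")
        lines.extend(toc(section["children"], prefix=label))
    return lines


print(toc(build_outline([(1, "A"), (1, "B")])))
# ['1. A', '2. B']
print(toc(build_outline([(1, "A"), (2, "B"), (1, "C")])))
# ['1. A', '1.1. B', '2. C']
```

The first call corresponds to fragment 4-1: the second h1 cannot nest under the first, so the result is two sibling top-level sections rather than a single tree.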

Now, please focus on 4-1: (1) The reason why the current algorithm ends up with "a list of sections" is due to a heading/rank-based perspective. But, instead of actually taking a look at what went wrong, you (2) suggest to let the "new" outline algorithm operate on headings.

I don't know about you, but I find that rather ironical because a heading-based perspective is largely responsible for our situation. And that is why I have to disagree: "It" is about sections, not headings!

With regards to fixing what we already have ...

I can only cite what @fititnt and @bkardell wrote (see above): Get it right!

Use Graph Theory (i.e. not statistics) to prove on a formal level that, whatever you come up with next, is consistent. One way or another, I am convinced that Graph Theory will win. After all, ... Earth isn't the center of the universe, it never was! ... (Note the critical switch in perspective!)

rehierl commented 6 years ago

@bkardell - I am in the process of spending some time with some "it" ("some time", what an understatement that is). Here is what I think:

You are right, the outline is a higher level version of a document's node tree. And if you understand "sectioning nodes" as a class of nodes which, more or less, tell an algorithm where exactly a section begins and where it ends (i.e. heading elements, sectioning content elements, sectioning root elements), then that tree of sections (aka. outline) is defined by those sectioning nodes.

However, you could even take it one step further: Some of those sectioning nodes (i.e. the sectioning root elements) can even be understood to not just define sections within a document's node tree, but also to define sections within the document's outline. After all, a tree of sections is in its core just another node tree. So what you actually have is this: a document, an outline, and an outline of an outline.

prlbr commented 6 years ago

Does anybody still think that it was a good idea to reuse <h1> instead of a new element <h> for the explicit <section> model? All of this would be easier and more intuitive if authors could choose between the traditional heading-focused <h1>-<h6> model with implicit sections and an explicit <section>/<h> model. It may not be too late to recognize a mistake and fix it at the root.

rehierl commented 6 years ago

The <h> element ... I have seen the discussions, but I didn't have enough time to really think about it in detail. So here is my current and limited take on that element in order to avoid drifting off in a "war of beliefs":

The reason for "a forest of sections" is the reuse of heading elements. But instead of limiting the "reuse" to what it was supposed to do (i.e. only define a heading), people thought that a heading element should still have a rank under these kinds of circumstances. What proof is there that the section of a <section> element even needs an inner or an outer rank?

The problem with the <h> element is that it "looks" too similar to the <hX> elements. My guess is, that people will confuse it with an element that declares a section (i.e. a sectioning node) and thus would try to use it just like all the other <hX> elements. Yes, introduce it to teach the core concept of a rank-less sectioning node. No, don't introduce it as an element that is not a sectioning node.

If used as a sectioning node, then what rank should it have? Should it act as a <h0> element? Or should it act more like a <h7> element instead?

The <h> element is just an option, not the solution itself.

And here is the bummer: In the long run, I'd much prefer the reuse of another element, the title element: If every element of sectioning content or sectioning root has only one top-level section and, as such, represents a "flat thing", then what is the heading of the <body> element? The document's title ... The current definition of the <title> element (IMHO) seems to better fit the intended purpose than the <h> element.