jgm / djot

A light markup language
https://djot.net
MIT License

Clamp heading level to 6 when outputting HTML #176

Open: GarrettAlbright opened this issue 1 year ago

GarrettAlbright commented 1 year ago

Given the following input:

####### Hello!

The Lua code will gladly produce:

<section id="Hello">
<h7>Hello!</h7>
</section>

The JavaScript implementation does the same. <h8> and beyond are also possible; I can find no maximum in either the code or the spec.

Only <h1> through <h6> are valid HTML. It may be useful for Djot to support heading levels higher than 6 when it is used to produce documents in formats that support them, but in the interest of reducing ambiguity in the spec I'd suggest picking some maximum, and I think 6 is a reasonable number. All that being said, all this PR does is tweak html.lua so that any heading level higher than 6 is output as an <h6>.

Unfortunately I will not be submitting a PR for djot.js as I do not have a setup for transpiling and testing TypeScript, but I suspect a patch for that repo will be as trivial as this one.
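A minimal TypeScript sketch of what such a clamp could look like on the djot.js side, assuming the renderer receives the heading level as a number. `renderHeading` and `escapeHtml` are hypothetical names, not the actual djot.js API, and the <section> wrapper from the example output is left out:

```typescript
// Hypothetical sketch: clamp heading levels when rendering HTML.
// `renderHeading` and `escapeHtml` are illustrative names, not djot.js APIs.

const MAX_HTML_HEADING = 6;

// Escape the characters that matter inside HTML text content.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function renderHeading(level: number, text: string): string {
  // Levels above 6 collapse to <h6>; levels below 1 are raised to <h1>.
  const clamped = Math.min(Math.max(level, 1), MAX_HTML_HEADING);
  return `<h${clamped}>${escapeHtml(text)}</h${clamped}>`;
}

console.log(renderHeading(7, "Hello!")); // <h6>Hello!</h6>
```

Clamping guarantees valid HTML, but it discards the distinction between level 6 and anything deeper.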

vassudanagunta commented 1 year ago

I don't think silently treating it as <h6> is a good idea, because the intent might have been for the heading to be a subheading of the prior H6, not a sibling, and this incorrect "correction" would probably go unnoticed in a long document (which a document with this many heading levels likely is).

If djot doesn't support warning messages, it would be better to output the text as-is (i.e. ####### Hello!), which is very noticeable. Even leaving it as <h7> would be better, as it renders as a plain block of text that would appear within the previous H6 heading, and would be reported by any HTML validator as bad HTML. But I think outputting ####### Hello! or <p>####### Hello!</p> would be best.
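The as-is fallback can be expressed in the renderer by rebuilding the # marker and emitting the line as an ordinary paragraph. A minimal TypeScript sketch; `renderHeadingOrLiteral` is a hypothetical name, and rebuilding the marker with "#".repeat(level) assumes the original source text is not available at render time, so exact source spacing is not preserved:

```typescript
// Hypothetical sketch: fall back to literal text for levels above 6,
// so an over-deep heading stays conspicuous in the rendered page.
// Assumes `level` is a positive integer, as a parser would produce.

function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function renderHeadingOrLiteral(level: number, text: string): string {
  if (level > 6) {
    // Rebuild the marker ("#######" for level 7) and emit a plain paragraph.
    const marker = "#".repeat(level);
    return `<p>${escapeHtml(`${marker} ${text}`)}</p>`;
  }
  return `<h${level}>${escapeHtml(text)}</h${level}>`;
}

console.log(renderHeadingOrLiteral(7, "Hello!")); // <p>####### Hello!</p>
```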

jgm commented 1 year ago

This raises the question whether we should put the check at the parser level, so that seven #s just don't form a heading node, instead of dealing with it in the renderer. It could be that other formats allow > 6 levels of headings, but I'm not sure that's a reason by itself...
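With the check at the parser level, a run of seven or more # characters would never open a heading, so the line falls through to ordinary paragraph text. A minimal single-line TypeScript sketch, assuming a heading marker is a run of # followed by whitespace (djot headings can span several lines, which this ignores); `parseHeadingLine` and the `Block` type are hypothetical, not djot's actual parser:

```typescript
// Hypothetical sketch of a parser-level rule: only 1 to 6 leading "#"
// characters followed by whitespace open a heading.

type Block =
  | { kind: "heading"; level: number; text: string }
  | { kind: "paragraph"; text: string };

const MAX_LEVEL = 6;

function parseHeadingLine(line: string): Block {
  // Group 1: the run of "#"; group 2: the heading text after whitespace.
  const m = /^(#+)(?:[ \t]+(.*))?$/.exec(line);
  if (m && m[1].length <= MAX_LEVEL) {
    return { kind: "heading", level: m[1].length, text: (m[2] ?? "").trim() };
  }
  // Seven or more "#", or no whitespace after them: plain paragraph.
  return { kind: "paragraph", text: line };
}

console.log(parseHeadingLine("####### Hello!").kind); // paragraph
```

Handling it here means every output format sees the same document structure, rather than each renderer deciding independently.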

clarfonthey commented 1 year ago

I personally would imagine that anything using more than six levels of headers is probably generating those headers outside of a language like djot and therefore djot should be fine limiting itself in the way HTML has, since most people will be limited by HTML otherwise.

So, whether it's reasonable to have that many headers in general wouldn't have to be addressed, just whether it's reasonable for someone using djot to do so, and I think the answer is probably no.

Like maybe some physical books might be doing that, but will you be writing your entire book in djot? Probably not. If it's a website, something like that will be split into multiple pages.

jgm commented 1 year ago

Pandoc will gladly treat ######### hello as a heading, but it will render in HTML as <p class="heading">.

bpj commented 1 year ago

My immediate thought when I saw this was "hey, djot isn't meant to be HTML-centric!" IOW I think the parser should probably produce heading nodes at any level since someone's output format might support that. I'm not saying that it's likely but HTML's cutoff at six is nonetheless arbitrary; some formats cut off at a lower level (e.g. Perl Pod at four) and I think that in principle the door should be left open for higher levels.

The/an HTML renderer is another matter: it should do something when it encounters a heading level higher than 6. I'm most in favor of something like <p class="high-heading heading-7">, on the assumption that the author may have intended the heading and may want to style it; if it was unintended, a CSS rule like .high-heading { background-color: red; } will make it easy to spot. Another possibility is to render sections > 6 as possibly nested definition lists, again with classes like <dl class="heading-level-7"> and <dt class="heading heading-7">, to allow both styling of intentional cases and detection of unintentional ones. Of course, programmatic HTML validation might also detect these tag-class combinations, and a renderer might have an option like --high-heading=p|dl|warn|error.
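These options could be folded into a single renderer setting. A minimal TypeScript sketch, assuming a mode value that mirrors the suggested --high-heading=p|dl|warn|error; that flag does not exist in djot, `renderHighHeading` is a hypothetical name, and treating "warn" as "report, then render like p" is this sketch's own choice:

```typescript
// Hypothetical sketch of a renderer setting for heading levels above 6.
// The modes mirror the proposed --high-heading option, not a real djot flag.

type HighHeadingMode = "p" | "dl" | "warn" | "error";

function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function renderHighHeading(
  level: number,
  text: string,
  mode: HighHeadingMode,
  report: (message: string) => void = (message) => console.warn(message),
): string {
  const body = escapeHtml(text);
  if (level <= 6) {
    return `<h${level}>${body}</h${level}>`;
  }
  const problem = `heading level ${level} exceeds HTML's maximum of 6`;
  if (mode === "error") {
    throw new Error(problem);
  }
  if (mode === "dl") {
    // A full renderer would put the section's content inside the <dd>.
    return `<dl class="heading-level-${level}"><dt class="heading heading-${level}">${body}</dt><dd></dd></dl>`;
  }
  if (mode === "warn") {
    report(problem); // assumption: warn, then fall through to the "p" output
  }
  return `<p class="high-heading heading-${level}">${body}</p>`;
}

console.log(renderHighHeading(7, "hey", "p"));
// <p class="high-heading heading-7">hey</p>
```

With the "p" output, a rule like .high-heading { background-color: red; } highlights every such heading.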

uvtc commented 1 year ago

I like the idea of djot producing <p class="heading-level-7">hey</p>. If the writer isn't seeing the output they expected from ####### hey, they can always check the html source output.

marrus-sh commented 1 year ago

WAI-ARIA provides an aria-level attribute for indicating heading levels (among other things): https://www.w3.org/TR/wai-aria/#aria-level

<div role="heading" aria-level="7"> would be semantic and not too hard to match with CSS.
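A renderer could keep native <h1> to <h6> elements and switch to the ARIA form only beyond level 6. A minimal TypeScript sketch; `renderAriaHeading` is a hypothetical name:

```typescript
// Hypothetical sketch: native heading elements up to level 6, then
// WAI-ARIA's role="heading" with aria-level for anything deeper.

function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function renderAriaHeading(level: number, text: string): string {
  const body = escapeHtml(text);
  if (level <= 6) {
    return `<h${level}>${body}</h${level}>`;
  }
  return `<div role="heading" aria-level="${level}">${body}</div>`;
}

console.log(renderAriaHeading(7, "Hello!"));
// <div role="heading" aria-level="7">Hello!</div>
```

In CSS, such headings can be targeted with an attribute selector like [role="heading"][aria-level="7"].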