quarto-dev / quarto

Quarto open-source scientific and technical publishing system
https://quarto.org
GNU Affero General Public License v3.0

Heading fragment identifiers fall back to "section" when headings begin with a number #413

amoeba closed this issue 2 months ago

amoeba commented 2 months ago

Normally, Quarto produces output where each heading has a fragment anchor that lets you link to specific sections on a page (e.g., https://quarto.org/docs/get-started/hello/jupyter.html#overview). This is really useful. The fragment identifier appears to be a slugified version of the heading's text. However, I noticed that when headings start with a number, the identifier instead starts with "section" (repeated headings become "section-1", "section-2", etc). See an example below:

This heading:

## 1.0.0

becomes:

<section id="section" class="level2">
  <h2 class="anchored" data-anchor-id="section">
    1.0.0
    <a class="anchorjs-link " aria-label="Anchor" data-anchorjs-icon="" href="#section" style="font: 1em / 1 anchorjs-icons; margin-left: 0.1875em; padding-right: 0.1875em; padding-left: 0.1875em;"></a>
  </h2>
</section>

While this retains the ability to link to specific headings, it means that a link will not stably point to the same heading when new headings are added: "section" will always point to whichever numeric heading comes first on the page. Would it be possible to still generate an identifier from the slugified heading when it starts with a number? I know a few problems crop up when HTML identifiers start with numbers (see MDN) but thought I'd file an issue to ask. Thanks in advance for taking a look at this.

For context, I'll note this came up in https://github.com/ibis-project/ibis/issues/8624 which was addressed with a custom Lua filter in https://github.com/ibis-project/ibis/pull/8941.

cscheid commented 2 months ago

This is coming directly from Pandoc:

$ pandoc -f markdown -t html
## 1.0.0
^D
<h2 id="section">1.0.0</h2>

With that said, you can override this easily:

$ pandoc -f markdown -t html
## 1.0.0 {#something-else}
^D
<h2 id="something-else">1.0.0</h2>

amoeba commented 2 months ago

Ah, neat. Thanks for the tip (and fast response). That seems to let me do something like ## 1.0.0 {#1.0.0} and get out 1.0.0 as an id.
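
For a file with many version headings, adding these attributes by hand gets tedious. Below is a minimal Python sketch of a pre-processing step that appends an explicit identifier to every heading that starts with a digit. The function name add_explicit_ids is hypothetical and is not part of Quarto or Pandoc. The sketch assumes, as observed above, that an explicit identifier may begin with a digit, and it only recognizes ATX-style (#) headings and backtick fences.

```python
import re

# ATX heading whose text starts with a digit, e.g. "## 1.0.0"
HEADING = re.compile(r"^(#{1,6})\s+(\d.*?)\s*$")


def add_explicit_ids(markdown: str) -> str:
    """Append {#id} to headings that start with a digit (illustrative helper)."""
    out = []
    in_fence = False
    for line in markdown.splitlines():
        # Track backtick fences so headings inside code blocks are left alone.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        # Skip headings that already carry an explicit identifier.
        if match and "{#" not in line:
            text = match.group(2)
            # Keep word characters, periods, and hyphens; collapse the rest.
            ident = re.sub(r"[^\w.-]+", "-", text).strip("-").lower()
            line = f"{match.group(1)} {text} {{#{ident}}}"
        out.append(line)
    return "\n".join(out)


print(add_explicit_ids("## 1.0.0"))  # ## 1.0.0 {#1.0.0}
```

Headings that already have an explicit identifier, and headings that start with a letter, pass through unchanged.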

cderv commented 2 months ago

> I noticed that when headings start with a number, the identifier instead starts with "section" (repeated headings become "section-1", "section-2", etc).

To be complete on Pandoc's behavior, this is due to the auto_identifiers extension and its algorithm (https://pandoc.org/MANUAL.html#extension-auto_identifiers):

  • Remove all formatting, links, etc.
  • Remove all footnotes.
  • Remove all non-alphanumeric characters, except underscores, hyphens, and periods.
  • Replace all spaces and newlines with hyphens.
  • Convert all alphabetic characters to lowercase.
  • Remove everything up to the first letter (identifiers may not begin with a number or punctuation mark).
  • If nothing is left after this, use the identifier section.
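
The steps above can be sketched in Python as follows. This is an approximation for illustration, not Pandoc's implementation: it takes plain heading text, so the first two steps (stripping formatting and footnotes) are assumed to be done already, and both function names are hypothetical. The numeric suffix for repeated identifiers follows the "section-1", "section-2" behavior reported earlier in this thread.

```python
def auto_identifier(text: str) -> str:
    """Approximate the auto_identifiers steps for plain heading text."""
    # Keep alphanumerics, underscores, hyphens, periods, and whitespace.
    kept = "".join(
        ch for ch in text if ch.isalnum() or ch in "_-." or ch.isspace()
    )
    # Replace spaces and newlines with hyphens, then lowercase.
    slug = "-".join(kept.split()).lower()
    # Remove everything up to the first letter.
    for i, ch in enumerate(slug):
        if ch.isalpha():
            return slug[i:]
    # Nothing left: fall back to "section".
    return "section"


def assign_identifiers(headings):
    """Give each heading an identifier, suffixing repeats with -1, -2, ..."""
    seen = set()
    result = []
    for heading in headings:
        base = auto_identifier(heading)
        ident, n = base, 0
        while ident in seen:
            n += 1
            ident = f"{base}-{n}"
        seen.add(ident)
        result.append(ident)
    return result


print(auto_identifier("1.0.0"))                       # section
print(assign_identifiers(["1.0.0", "1.1.0", "2.0.0"]))  # ['section', 'section-1', 'section-2']
```

Because "1.0.0" contains no letters, the "remove everything up to the first letter" step leaves nothing behind, which is why every purely numeric heading collapses to "section".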

amoeba commented 2 months ago

Thanks @cderv.