kramdown / parser-gfm

kramdown-parser-gfm provides a kramdown parser for the GFM dialect of Markdown

Specifying a header ID that starts with a number is ignored #26

Open ArthurZey opened 3 years ago

ArthurZey commented 3 years ago

Please forgive me if this is intended behavior, perhaps following a spec somewhere, but it seems that kramdown will autogenerate IDs that start with a number while not respecting manually specified IDs that start with one. I don't see my use case disclaimed under Specifying a Header ID. I'm cross-posting from https://github.com/gettalong/kramdown/issues/711, where @gettalong suggested that this was expected behavior using the kramdown binary, but I'm still not 100% sure why, particularly because of the automatic generation of header IDs and how that differs from the behavior using the binary.

Autogeneration of ID for headings starting with numbers works

For example:

## 2021-03-30

yields

<h2 id="2021-03-30">2021-03-30</h2>

as expected. (The example remains true for header text that is purely numeric, such as ## 123.)

(Although, I didn't understand how to reconcile that with the HTML Converter documentation for "Automatic Generation of Header IDs", which suggested that I should expect a different result.)
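
For reference, the mismatch can be illustrated with a rough sketch of the steps that the HTML converter documentation lists for automatic ID generation: strip everything except letters, digits, spaces and dashes; drop everything before the first letter; turn the remaining non-alphanumerics into dashes; lowercase; fall back to `section` if nothing is left. This is a paraphrase of the documented steps as I understand them, not kramdown's actual code, and the method name `sketch_kramdown_auto_id` is made up for illustration. It omits the documented de-duplication of repeated IDs.

```ruby
# Hypothetical helper: a paraphrase of the documented auto-ID steps,
# not the library's own implementation.
def sketch_kramdown_auto_id(title)
  id = title.gsub(/[^a-zA-Z0-9 -]/, '') # 1. keep only letters, digits, spaces, dashes
  id = id.sub(/\A[^a-zA-Z]+/, '')       # 2. drop everything before the first letter
  id = id.gsub(/[^a-zA-Z0-9]/, '-')     # 3. anything that is not a letter or digit becomes a dash
  id = id.downcase                      # 4. lowercase the result
  id.empty? ? 'section' : id            # 5. fall back to "section" if nothing is left
end

puts sketch_kramdown_auto_id('2021-03-30')      # prints "section"
puts sketch_kramdown_auto_id('2021-03-30: Foo') # prints "foo"
```

Under those steps a purely numeric heading would get `section` rather than `2021-03-30`, which is presumably the "different result" the documentation leads one to expect; the output shown above comes from the GFM parser, which appears to generate IDs by its own rules.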

Specifying a custom header ID works for headings starting with [a-z]

And

## Foo
{: #bar}

and

## Foo {#bar}

both yield

<h2 id="bar">Foo</h2>

also as expected.

Specifying a custom header ID starting with [0-9] does not work

So here's where I get the unexpected behavior:

## 2021-03-30: Foo {#2021-03-30}

yields

<h2 id="2021-03-30-foo-2021-03-30">2021-03-30: Foo {#2021-03-30}</h2>

and

## 2021-03-30: Foo
{: #2021-03-30}

yields

<h2 id="2021-03-30-foo">2021-03-30: Foo</h2>

Of course, I was expecting the following in both of the two examples immediately above:

<h2 id="2021-03-30">2021-03-30: Foo</h2>

I'm gathering that kramdown is ignoring manually specified header IDs when they start with numbers (since even adding [a-z] characters later in the ID doesn't help).

Context

I'm using GitHub Pages, and my Gemfile.lock file specifies the following under github-pages (213):

kramdown (= 2.3.0)
kramdown-parser-gfm (= 1.1.0)

And under jekyll (3.9.0):

kramdown (>= 1.17, < 3)

and then also

    kramdown (2.3.0)
      rexml
    kramdown-parser-gfm (1.1.0)
      kramdown (~> 2.0)

My _config.yml includes the following lines:

kramdown:
  smart_quotes: ["apos", "apos", "quot", "quot"]
  typographic_symbols: { hellip: ... , mdash: --- , ndash: -- , laquo: "<<" , raquo: ">>" , laquo_space: "<< " , raquo_space: " >>" }
  auto_id_stripping: true

I noticed that I didn't have a markdown: kramdown specified, but adding that in didn't change the behavior, so I'm thinking that maybe that becomes implicit if there's a kramdown object specified.

FWIW, I've also confirmed the same behavior on this online kramdown editor/renderer, so it doesn't seem to be specific to a GitHub Pages or Jekyll implementation.

It's a lot easier to see effects immediately if you include a TOC block:

* TOC
{:toc}

gettalong commented 3 years ago

I'm cross-posting from gettalong/kramdown#711, where @gettalong suggested that this was expected behavior using the kramdown binary, but I'm still not 100% sure why,

What I meant was: It is the expected behaviour when using the kramdown parser and the HTML converter. Since you are using the GFM parser and HTML converter, you might get different results.

ArthurZey commented 3 years ago

Indeed; I'm just saying I don't know why it's the expected behavior. What is the reason behind this behavior? It seems it must have been intentional. And then what's the reason for the difference in behavior between the binary and the GFM parser?

gettalong commented 3 years ago

The binary just uses whatever parser/converter pair you specify. If you specify GFM+HTML, you will get the same output as on Jekyll.

The difference in behaviour might be because of differences between the kramdown syntax and the GFM syntax. Also see https://github.com/kramdown/parser-gfm/blob/f1012bebbe97358ed8a1f5d16e750d3567a0d1a4/lib/kramdown/parser/gfm.rb#L111

ArthurZey commented 3 years ago

When I say "reason", I'm asking about the human decision to have this behavior, not what code causes it to manifest this way, though I will definitely look more carefully through the file you linked to; thank you!

Why did a human being decide that an ID can't start with a number?

gettalong commented 3 years ago

This is due to the way HTML4 specified the ID attribute. Also of interest: https://github.com/gettalong/kramdown/commit/90954bc
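
To make the HTML4 rule concrete: the HTML 4.01 specification says ID tokens must begin with a letter ([A-Za-z]) and may be followed by letters, digits, hyphens, underscores, colons, and periods. A minimal sketch of that rule as a Ruby check follows. The constant and method names are invented for illustration, and this is not a claim about the exact pattern kramdown or the GFM parser uses internally.

```ruby
# HTML 4.01 ID token rule: a letter first, then letters, digits,
# hyphens, underscores, colons, or periods.
# (Illustrative names; not taken from kramdown's source.)
HTML4_ID_PATTERN = /\A[A-Za-z][A-Za-z0-9_:.-]*\z/

def html4_id?(id)
  HTML4_ID_PATTERN.match?(id)
end

puts html4_id?('bar')         # prints "true"
puts html4_id?('2021-03-30')  # prints "false"
puts html4_id?('d2021-03-30') # prints "true"
```

By this rule `2021-03-30` is not a valid HTML4 ID, while prefixing it with a letter (for example `d2021-03-30`) would be. HTML5 relaxed the restriction, which may be the context for the commit linked above.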