Closed tatchi closed 1 year ago
Ah, I just looked at the failures in CI. There are two issues:
Fun
module and write our implementations, or pull in https://github.com/thierry-martinez/stdcompat (or an equivalent, if available). I'm quite happy with doing the latter.How does that sound?
- To maintain compatibility with OCaml 4.04.2, we'll either need to avoid using the
Fun
module and write our implementations, or pull in https://github.com/thierry-martinez/stdcompat (or an equivalent, if available). I'm quite happy with doing the latter.
I just inlined the function as it was straightforward to do: https://github.com/ocaml/omd/pull/267/commits/0570385a3f8da16ed65929c7b4c983803a5e4d68
- The failures on recent version of OCaml show why this actually does have to be configurable: it breaks all of our specification compliance tests :). I suggest we make this configurable by CLI, by default to having it enabled.
I added an --auto-identifiers
option (inspired by pandoc) in https://github.com/ocaml/omd/pull/267/commits/cb088d1606baf9a9956c7bcd07f2ad8aa1d6bb54. Since it's enabled by default it still break the tests. Should I promote them or disable that option when running the tests?
As for the identifiers, there's actually some rules described in the doc on how to derive them:
The default algorithm used to derive the identifier from the heading text is:
- Remove all formatting, links, etc.
- Remove all footnotes.
- Remove all non-alphanumeric characters, except underscores, hyphens, and periods.
- Replace all spaces and newlines with hyphens.
- Convert all alphabetic characters to lowercase.
- Remove everything up to the first letter (identifiers may not begin with a number or punctuation mark).
- If nothing is left after this, use the identifier section.
I had to switch to an implementation that supports Unicode characters in https://github.com/ocaml/omd/pull/267/commits/eec7388c4dfa0c5b9b426cc09c58adc8498baa6e (using the Uucp
and Uutf
modules). It's not yet perfect as some of the test results are not correct. I'll continue to try to improve that :)
I improved the algorithm slightly; it should be ok for a second review.
There are still some differences compared to the result I get on https://pandoc.org/try/, but pandoc can only generate id's when using markdown
and not commonmark
. So the outputted HTML content (from which the id is derived) might be different in some cases, hence why the result might differ.
The only thing missing I think is:
several headings have the same text; in this case, the first will get an identifier as described above; the second will get the same identifier with -1 appended; the third with -2; and so on.
So for instance in pandoc, the following
### hello
### hello
becomes:
<h3 id="hello">hello</h3>
<h3 id="hello-1">hello</h3>
I guess it means that we'll need to keep a state somewhere with all the ids we've already generated. Not sure how to best implement that and if that's something we want. Would love to have your input on that :)
We'll also have to decide how to handle "empty" id. In pandoc:
If nothing is left after this, use the identifier section.
So
### 33
### 22
###
becomes
<h3 id="section">33</h3>
<h3 id="section-1">22</h3>
<h3 id="section-2"></h3>
cc @koonwen, this would probably fix https://github.com/ocaml/ocaml.org/issues/523!
We'll also have to decide how to handle "empty" id. In pandoc:
I'm fine with following pandoc here.
Closing this in favor of #294. I opened up the new branch purely for the purposes of making merge conflict resolution easier.
Thanks so much, @tatchi!
Fixes: #251
This is my attempt at implementing #251
I didn't add an option to enable/disable that feature. Let me know if I should.