Open eliminmax opened 8 months ago
Thanks for reporting. I may look at it tomorrow when I get some free time.
And if you can make a PR, feel free to commit it.
Your test cases are very useful. I'll try to fix the issues this weekend.
Thanks! I was working on writing an `awk` script to add the heading ids to the output of `cmark-gfm`, and I wanted to make sure to handle it right. Turns out the regexp to match all invalid characters is very complex, and in the regex dialect GNU's `awk` implementation uses, it's nearly 10 thousand characters long. I found a GitHub repository which includes a computer-generated JavaScript regexp to match all invalid characters in heading names. I created a Python script based on that to generate a series of AWK `gsub` statements for my script, splitting it into a bunch of smaller regexp patterns, but it requires the non-standard `\uHH` escape sequence added in the latest version of GNU `awk`, so it's not portable across `awk` versions, let alone vim. In case my script is still helpful, I've uploaded it as a gist here.
Please update the plugin to the newest version and try again; it should be able to handle your cases now. 🤝
Thank you for making such a great plugin.
I created a markdown file (available here) designed to see how GitHub generates heading IDs in different cases, ranging from common ones (like headings containing non-`[a-z]` letters such as the German `ß`, Arabic `ا`, and Chinese `猫`) to weird cases with numbers at the end of headings.

Several of the headings generated by this plugin when I run `:GenTocGFM` in that file are different from the ones generated by GitHub.

Most of the issues had to do with headings with numbers at the end, though the Arabic `ا` was incorrectly deleted, as was a trailing underscore.

Click here to see what this plugin generates for my test file, with notes where it got it wrong.
```markdown
* [test.md](#testmd)
    * [Same Level Same Name](#same-level-same-name)
    * [Same Level Same Name](#same-level-same-name-1)
    * [Different Level Same Name](#different-level-same-name)
        * [Different Level Same Name](#different-level-same-name-1)
    * [Same Name Differing Caps](#same-name-differing-caps)
    * [SAME NAME DIFFERING CAPS](#same-name-differing-caps-1)
    * [same name differing caps](#same-name-differing-caps-2)
    * [Same Name( )different-Non-»letter° chars](#same-name---different-non-letter-chars)
    * [Same Name &^$ different Non letter chars](#same-name--different-non-letter-chars)
    * [Same Name but One Has Code](#same-name-but-one-has-code)
    * [Same Name `but` One `Has Code`](#same-name-but-one-has-code-1)
    * [Ending Number Trickery](#ending-number-trickery)
    * [Ending Number Trickery](#ending-number-trickery-1)
    * [Ending Number Trickery 1](#ending-number-trickery-1)
    * [Ending Number Trickery](#ending-number-trickery-2)
    * [Ending Number Trickery 2](#ending-number-trickery-2)
    * [Other Ending Number Trickery 1](#other-ending-number-trickery-1)
    * [Other Ending Number Trickery](#other-ending-number-trickery)
    * [Other Ending Number Trickery](#other-ending-number-trickery-1)
    * [Final Ending Number Trickery](#final-ending-number-trickery)
    * [Final Ending Number Trickery](#final-ending-number-trickery-1)
    * [Final Ending Number Trickery 1](#final-ending-number-trickery-1)
    * [Final Ending Number Trickery 1 1](#final-ending-number-trickery-1-1)
    * [Final Ending Number Trickery 1 1](#final-ending-number-trickery-1-1-1)
    * [Underscored_heading](#underscored_heading)
    * [Multiple__underscores](#multiple__underscores)
    * [\_Leading_underscore](#_leading_underscore)
    * [Trailing_underscore\_](#trailing_underscore)
    * [Heading with non-`[a-z]` letters like ß, ا, and 猫](#heading-with-non-a-z-letters-like-ß--and-猫)
    * [Heading with a Chinese punctuation mark (specifically '】')](#heading-with-a-chinese-punctuation-mark-specifically-)

# test.md
## Same Level Same Name
## Same Level Same Name
## Different Level Same Name
### Different Level Same Name
## Same Name Differing Caps
## SAME NAME DIFFERING CAPS
## same name differing caps
## Same Name( )different-Non-»letter° chars
## Same Name &^$ different Non letter chars
## Same Name but One Has Code
## Same Name `but` One `Has Code`
## Ending Number Trickery
## Ending Number Trickery
## Ending Number Trickery 1
## Ending Number Trickery
## Ending Number Trickery 2
## Other Ending Number Trickery 1
## Other Ending Number Trickery
## Other Ending Number Trickery
## Final Ending Number Trickery
## Final Ending Number Trickery
## Final Ending Number Trickery 1
## Final Ending Number Trickery 1 1
## Final Ending Number Trickery 1 1
## Underscored_heading
## Multiple__underscores
## \_Leading_underscore
## Trailing_underscore\_
## Heading with non-`[a-z]` letters like ß, ا, and 猫
## Heading with a Chinese punctuation mark (specifically '】')
```
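For reference, the kind of rule set these test headings exercise can be sketched in a few lines of Python. This is only an approximation under stated assumptions: the `Slugger` class and its method names are hypothetical, the character filter uses Python's Unicode-aware `\w` instead of the much longer generated regexp discussed in this thread, and the duplicate handling follows a "keep counting until the ID is unused" scheme. It is not confirmed here to match GitHub's output in every case.

```python
import re

# Approximation: keep Unicode word characters (letters, digits, "_"),
# hyphens, and spaces; drop everything else. The real list of removed
# characters is far longer than this one-line pattern suggests.
_STRIP = re.compile(r"[^\w\- ]")


class Slugger:
    """Hypothetical helper that hands out unique heading IDs."""

    def __init__(self):
        # Maps every ID handed out so far to its duplicate counter.
        self.seen = {}

    def slug(self, heading):
        base = _STRIP.sub("", heading.lower()).replace(" ", "-")
        result = base
        # If the ID is taken, append -1, -2, ... until a free one is found.
        while result in self.seen:
            self.seen[base] += 1
            result = "{}-{}".format(base, self.seen[base])
        self.seen[result] = 0
        return result


if __name__ == "__main__":
    slugger = Slugger()
    for text in [
        "Ending Number Trickery",
        "Ending Number Trickery",
        "Ending Number Trickery 1",
        "Trailing_underscore_",
        "Heading with non-`[a-z]` letters like ß, ا, and 猫",
    ]:
        print(slugger.slug(text))
```

Under this scheme a heading that literally ends in ` 1` cannot reuse an ID already given to an earlier duplicate, and non-ASCII letters and trailing underscores survive the filter, which are the cases the test file above probes.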