jonschlinkert / markdown-toc

API and CLI for generating a markdown TOC (table of contents) for a README or any markdown files. Uses Remarkable to parse markdown. Used by NASA/openmct, Prisma, Joi, Mocha, Sass, Prettier, Orbit DB, FormatJS, Raneto, hapijs/code, webpack-flow, docusaurus, release-it, ts-loader, json-server, reactfire, bunyan, husky, react-easy-state, react-snap, chakra-ui, carbon, alfresco, repolinter, Assemble, Verb, and thousands of other projects.
https://github.com/jonschlinkert
MIT License
1.65k stars 707 forks source link

markdown-toc support for accented characters broken #145

Open redochka opened 5 years ago

redochka commented 5 years ago

markdown-toc is generating :

  * [Considération](#consideration)

instead of

  * [Considération](#considération)

for this markdown:

## Considération
AeonFr commented 5 years ago

This is an inconsistency between how the implementation of the markdown slugify function works between this library and whatever library the user is trying to use.

For the time being, this little accent fix would be a great improvement since it would add compatibility with two very common markdown implementations:

For reference, this is the sluggin function in marked:

Slugger.prototype.slug = function(value) {
  var slug = value
    .toLowerCase()
    .trim()
    .replace(/[\u2000-\u206F\u2E00-\u2E7F\\'!"#$%&()*+,./:;<=>?@[\]^`{|}~]/g, '')
    .replace(/\s/g, '-');

  if (this.seen.hasOwnProperty(slug)) {
    var originalSlug = slug;
    do {
      this.seen[originalSlug]++;
      slug = originalSlug + '-' + this.seen[originalSlug];
    } while (this.seen.hasOwnProperty(slug));
  }
  this.seen[slug] = 0;

  return slug;
};

I think implementing this function would be a great start.

Feature request: Maybe if in the future some inconsistencies between sluggify implementations arrive, we could delegate the responsibility to create the headings to this library instead of depending on another library, maybe as an optional function.

Edit: It looks that @mootari wrote a script to support that very same feature already in an issue of this same repo, although I'm not sure how to integrate it with the current code (I'm very newby at node)

mootari commented 5 years ago

@AeonFr The snippet you linked to was part of an assemble pipeline. You can find the full context here: https://github.com/mootari/Documentation/blob/aae6df5755fffa4b4f48caed53dfa7b8acf24270/generator/lib/plugins/render-markdown/index.js#L76

Be aware that markdown-toc already has an option to specify a custom slugify callback.

AeonFr commented 5 years ago

Thanks very much, actually it makes perfect sense to be able to use your own slugify function.

I leave my code for further reference:

function slugify(value) {
    return value
        .toLowerCase()
        .trim()
        .replace(/[\u2000-\u206F\u2E00-\u2E7F\\'!"#$%&()*+,./:;<=>?@[\]^`{|}~]/g, '')
        .replace(/\s/g, '-');
}

toc('# Some markdown', { slugify });

You may notice the accented characters are rendered "URL encoded", but Chrome is not making any problem with that and it works nevertheless

Generated by markdown-toc:
li><a href="#c%C3%B3mo-hacer-un-back-up-del-sitio-y-recuperar-una-versi%C3%B3n-antigua-del-sitio">Cómo hacer un back-up del sitio y recuperar una versión antigua del sitio?</a></li>

Generated by marked:
<h3 id="cómo-hacer-un-back-up-del-sitio-y-recuperar-una-versión-antigua-del-sitio">Cómo hacer un back-up del sitio y recuperar una versión antigua del sitio?</h3>

Again, this just works and it's a clean solution since it doesn't depend on the markdown implementation. I still believe it would be best if the library supported accents by default but I'm still glad I got this working.

Thanks

mootari commented 5 years ago

I still believe it would be best if the library supported accents by default but I'm still glad I got this working.

In my experience it is rather unusual for slugs to contain anything but unreserved characters. Let me also point you to https://en.wikipedia.org/wiki/Clean_URL#Slug.