mixu / markdown-styles

Markdown to static HTML generator and multiple CSS themes for Markdown
1.85k stars 250 forks source link

Can not resolve the Chinese title! #49

Closed weihong1028 closed 7 years ago

weihong1028 commented 7 years ago

Chinese title will be resolved as "-"

mixu commented 7 years ago

I think the issue is here: https://github.com/mixu/markdown-stream-utils/blob/master/lib/annotate-md-headings.js

token.id = token.text.toLowerCase().replace(/[^\w]+/g, '-');

I'm guessing the Chinese title matches /[^\w]+/g and hence becomes -. I think I'd need a piece of code that instead takes any non-word characters and converts them into base64 codes that correspond to the relevant (raw) characters since it looks like that's what the Chinese wikipedia does when it converts non-alphanumerics to a header id. Any chance you could write a PR to the repo I mentioned above with a fix (and ideally a test)?

edit: example from Wikipedia: https://zh.wikipedia.org/wiki/1949%E5%B9%B4%E5%AE%89%E5%B7%B4%E6%89%98%E5%9C%B0%E9%9C%87#.E5.9C.B0.E8.B4.A8

weihong1028 commented 7 years ago

I think you are right. The Chinese title will becomes - when it matches /[^\w]+/g. And I just use encodeURIComponent() to convert them. token.id = encodeURIComponent(token.text.toLowerCase()); Now it works.

Another file should be fix in https://github.com/mixu/markdown-styles/blob/master/lib/convert-md.js.

var id = this.options.headerPrefix + encodeURIComponent (raw.toLowerCase());

And I find another issue, the same name heading in the toc cannot auto add -{num}

weihong1028 commented 7 years ago

Hello, I want to display the link raw like ../../#基本理解,not like ../../#%20%JD%90. So I don't want to useencodeURIComponent to convert, what is your opinion? token.id = token.text.toLowerCase();

mixu commented 7 years ago

yeah I think token.id = token.text.toLowerCase().replace(/\s+/g, '-'); seems fine - according to http://stackoverflow.com/questions/70579/what-are-valid-values-for-the-id-attribute-in-html and https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/id there may be compatibility issues with non alphanumeric strings, but I think in practice that won't be an issue and the only "explicitly bad" string is whitespace

it seems slightly better than token.id = token.text.trim().toLowerCase(); because String.prototype.trim() only removes spaces at the beginning or end of the string

mixu commented 7 years ago

thanks for looking into this! could you also add a test in https://github.com/mixu/markdown-styles/blob/master/test/integration.test.js with chinese headings (preferably with the same heading used multiple times) so that if I end up making any additional changes at some later time, the test will catch any potential problems with Chinese characters?

weihong1028 commented 7 years ago

Nice!token.id = token.text.toLowerCase().replace(/\s+/g, '-'); is better. And I will try to add a test in integration.test.js.

weihong1028 commented 7 years ago

Hello, I think should use .trim() removes spaces at the beginning or end of the string first, and then try .replace(/\s+/g, '-'). Is it right?

mixu commented 7 years ago

sgtm!

weihong1028 commented 7 years ago

I try to add a test in https://github.com/mixu/markdown-styles/blob/master/test/integration.test.js,and it can pass the test.

it('reads and renders metadata stored in a file with Chinese characters and the same heading used multiple times', function(done) {
      var dir = fixture.dir({
        'foo.md': [
          'title: 你好 世界',
          'author: Anonymous',
          '----',
          '# 世界因我而不同',
          '## 世界因我而不同',
          '### 世界因我而不同',
          '# 一个世界,一个梦想',
          '## 一个世界,一个梦想',
          '### 世界因我而不同'
        ].join('\n')
      });

      var out = fixture.dirname();

      mds.render({
        input: dir,
        output: out,
        layout: layoutDir
      }, function() {
        assert.equal(fs.readFileSync(out + '/foo.html', 'utf8'), [
            '"你好 世界" by Anonymous',
            '<h1 id="世界因我而不同">世界因我而不同</h1>',
            '<h2 id="世界因我而不同-1">世界因我而不同</h2>',
            '<h3 id="世界因我而不同-2">世界因我而不同</h3>',
            '<h1 id="一个世界,一个梦想">一个世界,一个梦想</h1>',
            '<h2 id="一个世界,一个梦想-1">一个世界,一个梦想</h2>',
            '<h3 id="世界因我而不同-3">世界因我而不同</h3>\n'
          ].join('\n'));
        done();
      });
    });

But the generate link in nav-list is wrong

mixu commented 7 years ago

great, could you update the PRs and I'll see if I can fix that last issue tomorrow?

weihong1028 commented 7 years ago

Thank you, I have already update the PRs.

weihong1028 commented 7 years ago

replace(/\s+/g, '-') is wrong convert character .

"angular.directive".replace(/\s+/g, '-')
"angular.directive"
"angular directive".replace(/\s+/g, '-')
"angular-directive"
mixu commented 7 years ago

cool thanks! I did a bunch of cleanup, squashed a few commits and added a test for the TOC generation. It should all be fixed if you install v3.1.9 (e.g. npm install -g markdown-styles@latest).

mixu commented 7 years ago

thanks again, let me know if you run into any further issues!

weihong1028 commented 7 years ago

So great! It works! Thank you very much.