svbergerem / markdown-it-hashtag

hashtag plugin for markdown-it markdown parser
MIT License

Support for non-English characters #2

Open · kemege opened this issue 8 years ago

kemege commented 8 years ago

In JavaScript regular expressions, \w only matches [A-Za-z0-9_], so hashtags containing non-English characters, such as #测试 or #テスト, are not recognized (or are cut off at the first non-ASCII character).

Perhaps \w+ should be replaced by something like (?:\w|[^\u0000-\u007F])+ or [^\u0000-\u0029\u0040\u005b-\u0060\u007b-\u007f], as suggested in a Stack Overflow answer?
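To make the difference concrete, here is a small standalone comparison of the `\w+` default and the two patterns proposed above. It uses plain JavaScript regular expressions rather than the plugin itself; the `patterns` object, the `firstTag` helper and the sample inputs are illustrative only.

```javascript
// Tag-body patterns from the discussion, each wrapped as "#(body)".
const patterns = {
  // \w is ASCII-only in JavaScript: [A-Za-z0-9_]
  asciiWord: /#(\w+)/,
  // Proposal 1: a word character or any non-ASCII UTF-16 code unit
  wordOrNonAscii: /#((?:\w|[^\u0000-\u007F])+)/,
  // Proposal 2: everything except the listed ASCII ranges
  excludeAsciiRanges: /#([^\u0000-\u0029\u0040\u005b-\u0060\u007b-\u007f]+)/,
};

// Returns the tag body of the first match, or null when nothing matches.
function firstTag(re, text) {
  const m = re.exec(text);
  return m ? m[1] : null;
}

const inputs = ['#测试', '#テスト', '#hello_world', '#测试。下一句', '#end.'];
for (const [name, re] of Object.entries(patterns)) {
  console.log(name, inputs.map((text) => firstTag(re, text)));
}
```

Both proposals recognize the CJK tags, with different trade-offs: the first treats every non-ASCII code unit as part of the tag, so CJK punctuation such as `。` is swallowed in `#测试。下一句`, while the second excludes `_` (so `#hello_world` stops at `hello`) but accepts ASCII punctuation such as the trailing `.` in `#end.`.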

svbergerem commented 8 years ago

You are already able to set the accepted characters yourself. See https://github.com/svbergerem/markdown-it-hashtag#advanced and https://github.com/svbergerem/markdown-it-hashtag/blob/master/test/hashtag.js#L23-L27 for some examples. I'll think about changing the default and keep this issue open until I've made my decision.
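As a rough sketch of what a configurable tag pattern buys here: the helper `makeHashtagMatcher` below is hypothetical and is not the plugin's API (the README's advanced section documents the real options). It only illustrates how a pattern string for the tag body and one for the allowed preceding context can be combined into a single regular expression, and how a Unicode-aware body pattern such as `[\p{L}\p{M}\p{N}_]+` (which needs the `u` flag) picks up `#测试`.

```javascript
// Hypothetical helper, not the plugin's API: builds a hashtag extractor from a
// tag-body pattern string and a "preceding context" pattern string.
function makeHashtagMatcher({ tagPattern = '\\w+', preceding = '^|\\s' } = {}) {
  // 'g' is required by matchAll; 'u' enables \p{...} property escapes.
  const re = new RegExp('(?:' + preceding + ')#(' + tagPattern + ')', 'gu');
  // Return only the captured tag bodies, without the leading '#'.
  return (text) => Array.from(text.matchAll(re), (m) => m[1]);
}

// ASCII-only body, like the \w+ default discussed above.
const asciiOnly = makeHashtagMatcher();
// Letters, combining marks, digits and underscore from any script.
const unicodeAware = makeHashtagMatcher({ tagPattern: '[\\p{L}\\p{M}\\p{N}_]+' });

console.log(asciiOnly('Tags: #hello #测试'));
console.log(unicodeAware('Tags: #hello #测试'));
```

Because the preceding context is consumed as part of the match, text such as `a#b` is not treated as a hashtag with either configuration.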

Powersource commented 5 years ago

For reference, Unicode defines hashtag identifiers here: https://unicode.org/reports/tr31/#hashtag_identifiers
It's not easy to read, but I think it includes most Unicode characters.
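A minimal sketch, assuming only the core of the UAX #31 idea: a `#` followed by characters with the Unicode `XID_Continue` property (letters, combining marks, digits, and connector punctuation such as `_`). As far as we can tell, the full definition in the report also admits further start symbols and some emoji-related characters; this sketch deliberately omits those, so check the report before relying on it. `XID_Continue` is exposed in JavaScript through `\p{...}` with the `u` flag; `extractHashtags` is an illustrative name.

```javascript
// Approximation of a UAX #31-style hashtag: "#" plus XID_Continue characters.
// Emoji-related characters and alternative start symbols are NOT handled here.
const xidHashtag = /(?:^|\s)#(\p{XID_Continue}+)/gu;

// Returns the tag bodies (without '#') found in the text, in order.
function extractHashtags(text) {
  return Array.from(text.matchAll(xidHashtag), (m) => m[1]);
}

// '\u0301' is a combining acute accent, so '#cafe\u0301' spells a decomposed "café".
console.log(extractHashtags('#测试 #テスト #hello_world #cafe\u0301 #end.'));
```

Building on `XID_Continue` rather than letters alone keeps combining marks, like the accent in the decomposed `café`, inside the tag, while ordinary punctuation such as the trailing `.` still ends it.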