kemege opened this issue 8 years ago
You are already able to set the accepted characters yourself. See https://github.com/svbergerem/markdown-it-hashtag#advanced and https://github.com/svbergerem/markdown-it-hashtag/blob/master/test/hashtag.js#L23-L27 for some examples. I'll think about changing the default and will keep this issue open until I've made a decision.
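To illustrate why a configurable tag-body pattern is enough, here is a standalone sketch. It does **not** call markdown-it-hashtag. `makeHashtagMatcher`, `tagBody` and `preceding` are hypothetical names chosen for this example; the README's advanced section documents the plugin's real options. The helper compiles the tag body from a regex source string, the same way a string-valued option would be compiled.

```javascript
// Hypothetical helper, NOT the markdown-it-hashtag API. It shows how swapping
// the tag-body pattern (a regex source string) changes which tags are found.
function makeHashtagMatcher(tagBody = '\\w+', preceding = '^|\\s') {
  // Group 1 is the preceding context, group 2 is the tag body.
  const re = new RegExp(`(${preceding})#(${tagBody})`, 'gu');
  return (text) => Array.from(text.matchAll(re), (m) => m[2]);
}

const asciiOnly = makeHashtagMatcher();                                // \w+ body
const unicodeAware = makeHashtagMatcher('[\\w\\p{L}\\p{M}\\p{N}]+');  // custom body

console.log(asciiOnly('#hello #测试 #テスト'));     // [ 'hello' ]
console.log(unicodeAware('#hello #测试 #テスト'));  // [ 'hello', '测试', 'テスト' ]
```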
For reference, Unicode defines hashtag identifiers in UAX #31: https://unicode.org/reports/tr31/#hashtag_identifiers
It's not easy to read, but I think it allows most Unicode characters, not just ASCII ones.
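In JavaScript, Unicode property escapes (the `u` flag, ES2018+) give a rough approximation of that kind of definition. The sketch below is an assumption, not the exact UAX #31 hashtag profile: it only uses the `XID_Continue` identifier property for the tag body and requires the `#` to be at the start or after whitespace. The report lists the precise character sets.

```javascript
// Approximation only: a tag is '#' followed by XID_Continue characters
// (letters, marks, digits, connector punctuation such as '_').
const HASHTAG = /(?:^|\s)#(\p{XID_Continue}+)/gu;

function extractHashtags(text) {
  return Array.from(text.matchAll(HASHTAG), (m) => m[1]);
}

console.log(extractHashtags('Tags: #测试 #テスト #x_1 #done!'));
// [ '测试', 'テスト', 'x_1', 'done' ]
```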
In JavaScript regular expressions, `\w` only matches `[A-Za-z0-9_]`, so it doesn't work well if we put any non-English characters in the tag, like `#测试` or `#テスト`. Perhaps `\w+` should be replaced by something like `(?:\w|[^\u0000-\u007F])+` or `[^\u0000-\u0029\u0040\u005b-\u0060\u007b-\u007f]`, as suggested in a Stack Overflow answer?
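A quick comparison of the patterns above, plus one property-escape alternative (`lettersOnly`, an assumption added for comparison and not part of the suggestion). The sample sentence is made up for illustration.

```javascript
// Patterns from the issue, plus one alternative for comparison.
const asciiWord = /#(\w+)/;                        // \w is only [A-Za-z0-9_]
const suggested = /#((?:\w|[^\u0000-\u007F])+)/;   // \w or any non-ASCII code unit
const lettersOnly = /#([\w\p{L}\p{M}\p{N}]+)/u;   // Unicode letters, marks, numbers
// The second suggestion as written (one character, no quantifier).
const looseClass = /[^\u0000-\u0029\u0040\u005b-\u0060\u007b-\u007f]/;

const sample = '今天的话题是#测试。';
console.log(asciiWord.exec(sample));        // null: \w does not match 测
console.log(sample.match(suggested)[1]);    // 测试。 (keeps the ideographic full stop)
console.log(sample.match(lettersOnly)[1]);  // 测试
console.log(looseClass.test('.'));          // true: U+002E is not in the excluded ranges
```

The catch-all `[^\u0000-\u007F]` is simple and works without the `u` flag, but it also accepts non-ASCII punctuation and symbols, so a tag followed by `。` absorbs it. The property-escape version stops at such punctuation. Note also that the second suggestion's excluded ranges skip U+002A to U+002F and U+003A to U+003F, so ASCII characters such as `.`, `,` and `:` would still be accepted.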