Use CommonMark definition of punctuation charset

Something goes wrong...

Given

**你好世界。**Hello world!

mistletoe's output is

<p><strong>你好世界。</strong>Hello world!</p>

while it should be (according to CommonMark's dingus):

<p>**你好世界。**Hello world!</p>

Why does this happen?

Root cause (core_tokens.py, lines 9-11):

punctuation = {'!', '"', '#', '$', '%', '&', '\'', '(', ')', '*', '+', ',',
               '-', '.', '/', ':', ';', '<', '=', '>', '?', '@', '[', '\\',
               ']', '^', '_', '`', '{', '|', '}', '~'}

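。 (the Chinese full stop) is not in the set, so the closing ** is recognised as a right-flanking delimiter run, which it shouldn't be. Under the spec, a run preceded by punctuation is right-flanking only if it is also followed by whitespace or punctuation, and here it is followed by H.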
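To make the effect of the punctuation set concrete, here is a minimal sketch of the CommonMark right-flanking test. It is not mistletoe's actual code: `is_right_flanking` and the `is_punct` predicate parameter are hypothetical names, and `str.isspace` is used as an approximation of the spec's Unicode whitespace.

```python
import string

# Same 32 characters as the ASCII-only set quoted above.
ASCII_PUNCTUATION = set(string.punctuation)


def is_right_flanking(prev_char, next_char, is_punct):
    """Return True if a delimiter run between prev_char and next_char
    is right-flanking. An empty string stands for start/end of line,
    which the spec treats as whitespace."""

    def is_space(ch):
        # Approximation of the spec's "Unicode whitespace".
        return ch == "" or ch.isspace()

    if is_space(prev_char):
        return False
    if not is_punct(prev_char):
        return True
    # Preceded by punctuation: must be followed by whitespace or punctuation.
    return is_space(next_char) or is_punct(next_char)


def ascii_only(ch):
    return ch in ASCII_PUNCTUATION


def with_cjk_full_stop(ch):
    # Hypothetical broader predicate, just enough for this example.
    return ch in ASCII_PUNCTUATION or ch == "。"


# The closing ** in "**你好世界。**Hello world!" sits between "。" and "H".
narrow_result = is_right_flanking("。", "H", ascii_only)
broad_result = is_right_flanking("。", "H", with_cjk_full_stop)
```

With the ASCII-only predicate the run counts as right-flanking and can close the strong emphasis; once 。 is treated as punctuation it no longer does.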

We should use a broader punctuation charset that includes CJK punctuation (and more).
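One way such a broader check could look is sketched below, using the standard `unicodedata` module. This is an assumption about the approach, not necessarily what this PR implements: `is_unicode_punctuation` is a hypothetical name, and the rule used here (ASCII punctuation plus any character whose general category starts with `P`) follows the spec wording as I understand it; newer spec versions may also count symbol categories.

```python
import string
import unicodedata

# ASCII punctuation must be listed explicitly: characters such as "$" or "+"
# belong to symbol categories (S*), not punctuation categories (P*).
ASCII_PUNCTUATION = frozenset(string.punctuation)


def is_unicode_punctuation(ch):
    """Return True for ASCII punctuation or any character in a Unicode
    punctuation category (Pc, Pd, Pe, Pf, Pi, Po, Ps)."""
    return ch in ASCII_PUNCTUATION or unicodedata.category(ch).startswith("P")


cjk_full_stop = is_unicode_punctuation("。")   # U+3002, category Po
dollar_sign = is_unicode_punctuation("$")      # ASCII, category Sc
latin_letter = is_unicode_punctuation("H")     # category Lu
cjk_ideograph = is_unicode_punctuation("你")   # category Lo
```

A predicate like this avoids maintaining a hand-written list of CJK marks, at the cost of a category lookup per character.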

CommonMark Spec:

See the "Files changed" tab; I left some comments on the modifications I made.

Signed-off-by: jaredliw <jaredliw@gmail.com>

miyuchina / mistletoe

Use CommonMark definition of punctuation charset #141
