Markdown: leading Unicode (non-ASCII) whitespace breaks syntax highlighting

Issue Type: Bug

Steps to Reproduce:

1. Create a Markdown file.
2. Copy and paste the following:

Highlighted:

[test]() (None)
 [test]() (U+0020, SPACE)
M[test]() (U+004D, LATIN CAPITAL LETTER M)

Not highlighted:

 [test]() (U+2002, EN SPACE)
 [test]() (U+2003, EM SPACE)
　[test]() (U+3000, IDEOGRAPHIC SPACE)

This issue was originally reported, in Japanese, as publictheta/vscode-japanese-novel#1. I maintain that extension, but I am not the original reporter.

Note

I haven't fully investigated, but this could be caused by \s and \S in markdown.tmLanguage.json being used where only ASCII whitespace was intended.

From Oniguruma's Documentation (L60-L69):

  \s       whitespace char

           Not Unicode:
             \t, \n, \v, \f, \r, \x20

           Unicode case:
             U+0009, U+000A, U+000B, U+000C, U+000D, U+0085(NEL),
             General_Category -- Line_Separator
                              -- Paragraph_Separator
                              -- Space_Separator

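According to CommonMark, Unicode (non-ASCII) whitespace characters seem to have no special effect except in the delimiter run rules. Leading Unicode whitespace should therefore presumably be treated like any other leading character, such as the M in the highlighted example above.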
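
As a rough illustration of the suspected mechanism, the following sketch compares a Unicode-aware \s with an ASCII-only \s on the characters from the reproduction steps. It uses Python's re module, not Oniguruma, so it only mirrors the "Not Unicode" and "Unicode case" behaviours quoted above. The names SAMPLES, classify and results are made up for this example and do not come from the grammar.

```python
import re

# Characters taken from the reproduction steps above.
SAMPLES = {
    "U+0020 SPACE": "\u0020",
    "U+004D LATIN CAPITAL LETTER M": "\u004d",
    "U+2002 EN SPACE": "\u2002",
    "U+2003 EM SPACE": "\u2003",
    "U+3000 IDEOGRAPHIC SPACE": "\u3000",
}

# For str patterns, Python's \s is Unicode-aware by default.
unicode_ws = re.compile(r"\s")
# With re.ASCII, \s is restricted to [ \t\n\r\f\v].
ascii_ws = re.compile(r"\s", re.ASCII)


def classify(ch: str) -> tuple[bool, bool]:
    """Return (matches Unicode-aware \\s, matches ASCII-only \\s)."""
    return bool(unicode_ws.fullmatch(ch)), bool(ascii_ws.fullmatch(ch))


results = {name: classify(ch) for name, ch in SAMPLES.items()}

for name, (is_unicode_ws, is_ascii_ws) in results.items():
    print(f"{name}: unicode \\s={is_unicode_ws}, ascii \\s={is_ascii_ws}")
```

In this sketch, the three non-ASCII spaces match only the Unicode-aware pattern, U+0020 matches both, and M matches neither. If the grammar's rules assume \s means ASCII whitespace only, that difference would be one plausible explanation for the lines that are not highlighted. This is still only a hypothesis about the grammar.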

VS Code version: Code 1.68.1 (30d9c6cd9483b2cc586687151bcbcd635f373630, 2022-06-14T12:52:13.188Z)
OS version: Darwin x64 21.5.0
Restricted Mode: No

Extensions (1)

Extension|Author (truncated)|Version
---|---|---
vscode-language-pack-ja|MS-|1.68.6150906

microsoft / vscode-markdown-tm-grammar

Markdown: leading Unicode (non-ASCII) whitespace breaks syntax highlighting #131
