mjbvz / vscode-fenced-code-block-grammar-injection-example

Example of injecting a new grammar into VSCode's builtin markdown syntax highlighting for fenced code blocks
MIT License
98 stars, 26 forks

The language `begin` pattern matches deep within the code block (false positives) #6

Status: Closed (glebec closed this issue 6 years ago)

glebec commented 6 years ago

An example like this (where ◆ = backtick for rendering purposes):

◆◆◆
this is some text
la la la
re
more text
◆◆◆

This matches the `re` on the fourth line for, e.g., the ReasonML language extension, which uses an injection file identical to this example repo's. The desired behavior is to match only language identifiers that appear directly after the opening triple backticks, on the same line:

◆◆◆re
some text
more text
etc.
◆◆◆

The problem seems to be that the regex in the `begin` property of this example is too lax.

https://github.com/mjbvz/vscode-fenced-code-block-grammar-injection-example/blob/dd6961fce89362b623ad83637a6050f89bb92f32/syntaxes/codeblock.json#L11
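To make the false positive concrete, here is a minimal Python sketch. It assumes the lax pattern has roughly the shape `(re|reason|reasonml)(\s+[^`~]*)?$` (an assumption, since the linked line is not reproduced here), and it uses Python's `re` module only as an approximation of the grammar engine. `LAX` and `STRICT` are hypothetical names.

```python
import re

# Assumed shape of the lax pattern: the identifier (plus optional attributes)
# must run to the end of the line, but nothing constrains what precedes it.
LAX = re.compile(r"(re|reason|reasonml)(\s+[^`~]*)?$")

# A stricter variant: the identifier must directly follow the opening fence
# on the same line.
STRICT = re.compile(r"^\s*(`{3,}|~{3,})\s*(re|reason|reasonml)(\s+[^`~]*)?$")

FENCE = "`" * 3
lines = [FENCE, "this is some text", "la la la", "re", "more text", FENCE]

# Lines inside a plain code block that each pattern would treat as a block start.
lax_hits = [line for line in lines if LAX.search(line)]        # ["re", "more text"]
strict_hits = [line for line in lines if STRICT.search(line)]  # []
```

In this approximation the lax pattern fires on both `re` and `more text`, while the stricter one fires only on a real opening fence such as three backticks followed by `re`.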

I tried playing with it a bit, but I was stymied: patterns that I thought would apply (matching against triple backticks, for example) failed to match as expected. Accordingly, I have a few questions:

  1. How are the begin and end patterns actually applied in VSCode? Are they applied to entire files, or just to the contents of code fences, or something else?
  2. What precise regex syntax does this example repo use? It doesn't appear to be JS regex, given the presence of the `\G` anchor.

Let me know if I am off-base, but since many language extensions are using this repo as a template without carefully analyzing how exactly it works, it seems worthwhile to make the pattern as bulletproof as is reasonably possible. I am seeing a lot of false positives.

(This issue is essentially a superclass of #5.)

glebec commented 6 years ago

Examples of patterns I tried which do not seem to work (again, substitute ◆ with backtick):

◆◆◆(re|reason|reasonml)(\\s+[^`~]*)?$
\\G(re|reason|reasonml)(\\s+[^`~]*)?$
(?<=$\\W*)(re|reason|reasonml)(\\s+[^`~]*)?$

…and a few other patterns that were flawed from the start; I was just experimenting to figure out the scope of the pattern matching.

glebec commented 6 years ago

Ah, I was close with the thought to use a lookbehind! Thank you @mjbvz.

mjbvz commented 6 years ago

I pushed a fix for this that uses a lookbehind, as you suggested.

It's not perfect since it will still match:

```
`js
bla bla bla
```
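A rough Python illustration of why the lookbehind version is still imperfect, assuming the lookbehind checks only the single character before the identifier. The exact pushed pattern is not quoted in this thread, so `LOOKBEHIND` below is a hypothetical stand-in, and Python's `re` is only an approximation of the grammar engine.

```python
import re

# Hypothetical stand-in for the fix: the identifier must be preceded by
# one fence character (backtick or tilde).
LOOKBEHIND = re.compile(r"(?<=[`~])(js)(\s+[^`~]*)?$")

FENCE = "`" * 3


def matches(line):
    """Return True if the stand-in pattern would start a block on this line."""
    return LOOKBEHIND.search(line) is not None


real_fence = matches(FENCE + "js")  # True: the intended match
bare_word = matches("js")           # False: no longer a false positive
single_tick = matches("`js")        # True: the remaining flaw
```

A one-character lookbehind cannot tell a full fence from a single stray backtick, which is why the line with one backtick before `js` still matches.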

To avoid that problem, I believe you need to inject the grammar into the top-level markdown grammar instead of into the fenced code block rule. This is a bit more complicated, since you also have to tokenize the fenced code block's start and end markers. Here it is for reference:

{
    "fileTypes": [],
    "injectionSelector": "L:text.html.markdown",
    "patterns": [
        {
            "include": "#superjs-code-block"
        }
    ],
    "repository": {
        "superjs-code-block": {
            "begin": "(^|\\G)(\\s*)(\\`{3,}|~{3,})\\s*(?i:(superjs)(\\s+[^`~]*)?$)",
            "name": "markup.fenced_code.block.markdown",
            "end": "(^|\\G)(\\2|\\s{0,3})(\\3)\\s*$",
            "beginCaptures": {
                "3": {
                    "name": "punctuation.definition.markdown"
                },
                "4": {
                    "name": "fenced_code.block.language"
                },
                "5": {
                    "name": "fenced_code.block.language.attributes"
                }
            },
            "endCaptures": {
                "3": {
                    "name": "punctuation.definition.markdown"
                }
            },
            "patterns": [
                {
                    "begin": "(^|\\G)(\\s*)(.*)",
                    "while": "(^|\\G)(?!\\s*([`~]{3,})\\s*$)",
                    "contentName": "meta.embedded.block.superjs",
                    "patterns": [
                        {
                            "include": "source.js"
                        }
                    ]
                }
            ]
        }
    },
    "scopeName": "markdown.superjs.codeblock"
}
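
As a rough way to see how the outer `begin` and `end` patterns above cooperate, the following Python sketch emulates them line by line. It is only an approximation: Python's `re` has no `\G`, so the `(^|\G)` group is reduced to `(^)`, and the `end` pattern's backreferences to the `begin` captures are rebuilt by hand. `find_blocks` is a hypothetical helper, not part of VS Code.

```python
import re

# begin pattern from the grammar, with (^|\G) reduced to (^) so that the
# group numbers (2 = indent, 3 = fence) stay the same.
BEGIN = re.compile(r"(^)(\s*)(`{3,}|~{3,})\s*(?i:(superjs)(\s+[^`~]*)?$)")


def find_blocks(text):
    """Return the body lines of every closed fenced block tagged superjs."""
    blocks = []
    end = None  # compiled end pattern while inside a block, else None
    body = []
    for line in text.splitlines():
        if end is None:
            m = BEGIN.match(line)
            if m:
                # Emulate the end pattern: \2 is the begin indent, \3 the begin fence.
                indent, fence = re.escape(m.group(2)), re.escape(m.group(3))
                end = re.compile(rf"^(?:{indent}|\s{{0,3}}){fence}\s*$")
                body = []
        elif end.match(line):
            blocks.append(body)
            end = None
        else:
            body.append(line)
    return blocks


FENCE = "`" * 3
sample = "\n".join([
    FENCE, "`superjs", "not highlighted", FENCE,  # plain block with a stray backtick
    FENCE + "superjs", "let x = 1;", FENCE,       # real superjs block
])
blocks = find_blocks(sample)  # [["let x = 1;"]]
```

Because the `begin` pattern requires the full fence before the identifier, the line with a single backtick before `superjs` inside the plain block is ignored, and only the second block is picked up.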