rust-lang / rust-analyzer

A Rust compiler front-end for IDEs
https://rust-analyzer.github.io/
Apache License 2.0
14.09k stars 1.57k forks source link

Highlight rustdoc #16253

Closed zhaouv closed 6 months ago

zhaouv commented 8 months ago

Do Rustaceans want these to be integrated into this extension?
If the answer is 'yes,' I will send a pull request.

image image image
filipwiech commented 8 months ago

IMHO it would be a very welcome feature, maybe it could be configurable so that users can choose the preferred experience. I assume that it's providing semantic tokens for the relevant parts of the documentation comments. :slightly_smiling_face:

One idea though: perhaps the generated tokens shouldn't be exactly the same as the ones for the normal code, but ought to signify that they are coming from the rustdoc (maybe with a relevant prefix or suffix in their name). This way the theme could decide to show the highlighted parts with different colors (for example slightly muted ones) so that they could be distinguished more easily on the first glance from the actual implementation. :+1:

Veykril commented 8 months ago

It will depend on the implementation, especially regarding the syntax highlighting of other languages inside the doc comment. We don't really wanna ship anything unrelated to rust in that regard.

lnicola commented 8 months ago

Also, I think we currently highlight in the second screenshot, which matches rustdoc?

Veykril commented 8 months ago

No we used to, but that implementation caused a bunch of problems so we removed it again.

lnicola commented 8 months ago

What's this, then?

image

Veykril commented 8 months ago

Oh, you mean the embedded rust code. Yes that we do. I thought you meant the markdown parts.

zhaouv commented 8 months ago

The current implementation (there is still room for adjustments): package.json>contributes>grammars

            {
                "injectTo": [
                    "source.rust"
                ],
                "scopeName": "comment.markdown-cell-inject.double-slash-exclamation.LRSW",
                "path": "./syntaxes/double-slash-exclamation.json",
                "embeddedLanguages": {
                    "meta.embedded.block.everywhere.md": "markdown"
                }
            },
            {
                "injectTo": [
                    "source.rust"
                ],
                "scopeName": "comment.markdown-cell-inject.triple-slash.LRSW",
                "path": "./syntaxes/triple-slash.json",
                "embeddedLanguages": {
                    "meta.embedded.block.everywhere.md": "markdown"
                }
            },
            {
                "injectTo": [
                    "source.rust"
                ],
                "scopeName": "comment.markdown-cell-inject.slash-start-exclamation.BR",
                "path": "./syntaxes/slash-start-exclamation.json",
                "embeddedLanguages": {
                    "meta.embedded.block.everywhere.md": "markdown"
                }
            }

triple-slash.json:

{
    "injectionSelector": "L:source.rust -string -comment -meta.embedded.block.everywhere.md",
    "patterns": [
        {
            "include": "#comment-markdown-cell-inject-triple-slash-LRSW"
        }
    ],
    "repository": {
        "comment-markdown-cell-inject-triple-slash-LRSW": {
            "begin": "(^|\\G)\\s*(///) ?",
            "captures": {
                "2": {
                    "name": "comment.punctuation.definition.quote_code.triple-slash.LRSW"
                }
            },
            "name": "comment.quote_code.triple-slash.LRSW",
            "contentName": "meta.embedded.block.everywhere.md",
            "patterns": [
                {
                    "include": "text.html.markdown"
                }
            ],
            "while": "(^|\\G)\\s*(///) ?"
        }
    },
    "scopeName": "comment.markdown-cell-inject.triple-slash.LRSW"
}

slash-start-exclamation.json and double-slash-exclamation is similar
and with settings (user put it into settings.json themselves)

    "editor.tokenColorCustomizations": {
        "[Default Dark+]": {
            "textMateRules": [
                {
                    "scope": "meta.embedded.block.everywhere.md markup.heading, meta.embedded.block.everywhere.md markup.bold",
                    "settings": {
                        "foreground": "#61aa71",
                    }
                },
                {
                    "scope": "meta.embedded.block.everywhere.md punctuation.definition.list.begin.markdown, meta.embedded.block.everywhere.md entity.name.tag",
                    "settings": {
                        "foreground": "#599aa5",
                    }
                },
                {
                    "scope": "meta.embedded.block.everywhere.md entity.other.attribute-name",
                    "settings": {
                        "foreground": "#98bdc4",
                    }
                },
                {
                    "scope": "meta.embedded.block.everywhere.md markup.inline.raw, meta.embedded.block.everywhere.md string",
                    "settings": {
                        "foreground": "#ceca8b",
                    }
                },
                {
                    "scope": "meta.embedded.block.everywhere.md, meta.embedded.block.everywhere.md meta.embedded",
                    "settings": {
                        "foreground": "#9abb87",
                    }
                },
            ]
        }
    },
zhaouv commented 8 months ago

This is a pure static TextMate rule implementation that utilizes a small trick to consume the /// token, enabling the built-in syntax analysis of VCS to correctly highlight Markdown and its fenced code blocks. It still works perfectly without any additional overhead for Rust files without Cargo.toml. Any manually written TypeScript/JavaScript/Rust doesn't require any additional setup.

Veykril commented 8 months ago

We had a textmate injection for this before and that caused a bunch of problems, will need to dig the PRs for that out later to see what the problem was.

lnicola commented 8 months ago

https://github.com/rust-lang/rust-analyzer/pull/15116

zhaouv commented 8 months ago

I have gathered the previous several issues and the syntax removed in the previous two pull requests. The problem with them is that essential syntax was not excluded.

-string -comment

https://github.com/rust-lang/rust-analyzer/pull/15116
https://github.com/rust-lang/rust-analyzer/pull/15102
https://github.com/rust-lang/rust-analyzer/issues/15111
https://github.com/rust-lang/rust-analyzer/issues/15114
https://github.com/rust-lang/rust-analyzer/issues/15091

rules 15116 ```json { "$schema": "https://raw.githubusercontent.com/martinring/tmlanguage/master/tmlanguage.json", "scopeName": "source.rustdoc.markdown.injection", "injectionSelector": "L:source.rust", "patterns": [ { "include": "#doc-comment-line" } ], "repository": { "doc-comment-line": { "name": "comment.line.documentation.rust", "begin": "^\\s*//(/|!)", "end": "$", "contentName": "meta.embedded.block.markdown", "patterns": [ { "include": "text.html.markdown" } ] } } } ```
rules 15102 ```json { "$schema": "https://raw.githubusercontent.com/martinring/tmlanguage/master/tmlanguage.json", "scopeName": "source.rustdoc.markdown.injection", "injectionSelector": "L:source.rust", "patterns": [ { "include": "#doc-comment-line" } ], "repository": { "doc-comment-line": { "name": "comment.line.documentation.rust", "begin": "^\\s*//(/|!)", "end": "$", "contentName": "meta.embedded.block.markdown", "patterns": [ { "include": "text.html.markdown" } ] } } } ```
break 15111 ```rs /// bnuuy! pub fn cool(){ assert_eq!(1+1, 2); } /// not bnuuy enum Ponies { Izzy, Sunny, Hitch } /// baz struct Foo { bar: String } /// hello enum E { A, B, C } /// Ping pong #[poise::command(slash_command, ephemeral)] pub async fn ping(ctx: CommandContext<'_>) ->Result<()> { ctx.say(format!( "Pong :3 ({}ms)", (ctx.created_at().time() - Utc::now().time()).num_milliseconds() )) .await?; Ok(()) } ```
break 15114 ```rs /// Doc comments seem to be rendered in normal color,rather than comment color /// ``` /// Comment code blocks similarly aren't colored or formatted differently /// ``` /// Weirdly,VS Code's parentheses matching kicks in, however: ((()) fn main(){ println!("Hello world!"); } ```
break 15091 ```rs /*********************/ let _foo = "/**"; /* //////////////////////////// //////////////////////////// */ let _foo = "*/"; /* /// */ let _foo = "/** */"; ```

Let's take a look at some classic examples of syntax injection.

https://github.com/mjbvz/vscode-comment-tagged-templates

"injectionSelector": "L:source.js -comment -(string - meta.embedded), L:source.jsx -comment -(string - meta.embedded), L:source.js.jsx -comment -(string - meta.embedded), L:source.ts -comment -(string - meta.embedded), L:source.tsx -comment -(string - meta.embedded)",

https://github.com/mjbvz/vscode-jsdoc-markdown-highlighting

"injectionSelector": "L:comment.block.documentation",

(mjbvz: This guy, who works for vscode, is a real expert on this.)

First, you should choose to inject into the smallest possible scope.
When you need to include a larger range, be sure to exclude strings and comments.
And you should add meta.embedded tags to the injected content, as a standard practice.

As for relevant information, it's actually TextMate's documentation. VSCode adopted this design, so no separate references are provided.

https://www.apeth.com/nonblog/stories/textmatebundle.html
https://macromates.com/manual/en/all_pages
https://macromates.com/manual/en/language_grammars#naming-conventions

Final result:

image image image
zhaouv commented 8 months ago

And next step is adjust the behavior of fenced code. There are two ways to do it.

The more elegant way is to create a language called rustdoc.

{
    "name": "rustdoc",
    "patterns": [
        {
            "include": "#fenced_code_block"
        },
        {
            "include": "#markdown"
        }
    ],
    "scopeName": "text.html.markdown.rustdoc",
    "repository": {
        "markdown":{
            "patterns": [
                {
                    "include": "text.html.markdown"
                }
            ]
        },
        "fenced_code_block":{
            "patterns": [
                {
                    "include": "#fenced_code_block_rust"
                },
                {
                    "include": "#fenced_code_block_unknown"
                }
            ]
        },
        "fenced_code_block_rust": {
            "begin": "(^|\\G)(\\s*)(`{3,}|~{3,})\\s*(?i:(rust|not run|not_run)?((\\s+|:|,|\\{|\\?)[^`~]*)?$)",
            "name": "markup.fenced_code.block.markdown",
            "end": "(^|\\G)(\\2|\\s{0,3})(\\3)\\s*$",
            "beginCaptures": {
                "3": {
                    "name": "punctuation.definition.markdown"
                },
                "4": {
                    "name": "fenced_code.block.language.markdown"
                },
                "5": {
                    "name": "fenced_code.block.language.attributes.markdown"
                }
            },
            "endCaptures": {
                "3": {
                    "name": "punctuation.definition.markdown"
                }
            },
            "patterns": [
                {
                    "begin": "(^|\\G)(\\s*)(.*)",
                    "while": "(^|\\G)(?!\\s*([`~]{3,})\\s*$)",
                    "contentName": "meta.embedded.block.rust",
                    "patterns": [
                        {
                            "include": "source.rust"
                        }
                    ]
                }
            ]
        },
        "fenced_code_block_unknown": {
            "begin": "(^|\\G)(\\s*)(`{3,}|~{3,})\\s*(?=([^`~]+)?$)",
            "beginCaptures": {
                "3": {
                    "name": "punctuation.definition.markdown"
                },
                "4": {
                    "name": "fenced_code.block.language"
                }
            },
            "end": "(^|\\G)(\\2|\\s{0,3})(\\3)\\s*$",
            "endCaptures": {
                "3": {
                    "name": "punctuation.definition.markdown"
                }
            },
            "name": "markup.fenced_code.block.markdown"
        }
    }
}

And then. only rust|not run|not_run and block without description will highlight as rust, other will be ignored.

image
zhaouv commented 8 months ago

And also more things can be done:
when you type a Enter at red O, auto fill /// in next line
(by vscode.languages.setLanguageConfiguration('rustdoc', { onEnterRules: [...] })

image
Veykril commented 8 months ago

Alright, seems like you figured out the problems from the previous attempts. I'm happy for us to try this again in that case!

Veykril commented 8 months ago

One question I have though is whether this clashes with semantic tokens as we do emit semantic tokens for rust code embedded in doc comments (and that should keep working). I expect VSCode to prefer the semantic tokens here.

zhaouv commented 8 months ago
image

Current is D,
if add new textmate rules goto B. (rules+settings+semantic is also B)

Rules + settings + no semantic is A. I think it is the best effective.
but if no settings added, it goes C. I think it is bad because it makes reading coffusing.

so the best way is provide a setting that can close emit semantic tokens. (the default value is emit.)
so it can avoid C.
user can close it and add settings to go A

these following somehow do not work, the highlight is still cover to B

    "rust-analyzer.semanticHighlighting.doc.comment.inject.enable": false,
    "rust-analyzer.semanticHighlighting.nonStandardTokens": false,
    "rust-analyzer.semanticHighlighting.operator.enable": false,
    "rust-analyzer.semanticHighlighting.strings.enable": false

settings means

    "editor.tokenColorCustomizations": {
        "[Default Dark+]": {
            "textMateRules": [
                {
                    "scope": "meta.embedded.block.rustdoc markup.heading, meta.embedded.block.rustdoc markup.bold",
                    "settings": {
                        "foreground": "#61aa71",
                    }
                },
                {
                    "scope": "meta.embedded.block.rustdoc punctuation.definition.list.begin.markdown, meta.embedded.block.rustdoc entity.name.tag",
                    "settings": {
                        "foreground": "#599aa5",
                    }
                },
                {
                    "scope": "meta.embedded.block.rustdoc entity.other.attribute-name",
                    "settings": {
                        "foreground": "#98bdc4",
                    }
                },
                {
                    "scope": "meta.embedded.block.rustdoc markup.inline.raw, meta.embedded.block.rustdoc string",
                    "settings": {
                        "foreground": "#ceca8b",
                    }
                },
                {
                    "scope": "meta.embedded.block.rustdoc, meta.embedded.block.rustdoc meta.embedded",
                    "settings": {
                        "foreground": "#9abb87",
                    }
                }
            ]
        }
    },
Veykril commented 8 months ago

I think it would work out fine with semantic tokens if we didn't need to patch this out https://github.com/rust-lang/rust-analyzer/blob/526bd3bc64890e039e3894f1a3529767f2d2b55c/editors/code/src/client.ts#L422

Then we wouldn't emit semantic tokens for the comments except for the injection part. Unfortunately VSCode's rust grammar is not that great in comparison to full semantic tokens.

zhaouv commented 8 months ago

there are still some details to be ensure. likes the rust not run not_run and <empty>, these four cases now regarded highlight, it seems need to be adjust.
anyway the current stage, at least the ( ) answer) is highlight right from D to B

zhaouv commented 8 months ago

And also more things can be done: when you type a Enter at red O, auto fill /// in next line (by vscode.languages.setLanguageConfiguration('rustdoc', { onEnterRules: [...] }) image

and i have not put this into pr yet. do you want it? (or with a bool put in settings?)

and the emit semantic tokens part is difficult for me to understand and change it correctly in a short time, if it need to be changed, maybe you guys do it yourself it a better choice

dcompoze commented 6 months ago

I'm running v0.4.1861 of the extension in VSCode and I have ran into 2 issues:

Could this be something with my settings or is this not implemented yet?

This image shows the /// comment is formatted, the /** with * on every line also works, while /** */ block without the leading * doesn't work:

zhaouv commented 6 months ago

The reason is this: https://github.com/rust-lang/rust-analyzer/issues/16253#issuecomment-1884212742 . At this point, this area is already considered as rustdoc rather than rust.

Should set vscode.languages.setLanguageConfiguration('rustdoc', { onEnterRules: [...] } for rustdoc rather than rust for this case.
I did ask them twice for it "do you want it?", they did not reply.

Picture C in this comment https://github.com/rust-lang/rust-analyzer/issues/16253#issuecomment-1884762740. I have already said that it won't work without the * and its reason.

And flickering highlighting can be fixed with changging rustdoc grammar.

But I will not contribute for this, because the only thing they are likely to do is so let's roll that back (https://github.com/rust-lang/rust-analyzer/issues/16602#issuecomment-1958818222). They do not learn the reason only said rollback. everything you contribute maybe somehow rollback in someday.

zhaouv commented 6 months ago

Their behavior does not deserve a good outcome.

Veykril commented 6 months ago

The reason why I said that we should probably remove this again is because we tried in a past and it cost me way too much time attempting to debug the issues that arose from it back then, we don't have anyone on the team who knows their way around textmate grammar so maintaining this is generally difficult for us and we don't demand from outside contributors to commit to doing follow up patches to their changes.

I'm sorry that you took my words in a sour manner that was not my intent but the team is just very much overloaded as it stands so from my perspective reverting the changes was the most logical choice at that time.

zhaouv commented 6 months ago

(link this issue without comment in this issue will not trigger a notification.) I check the this issue at the first time, and find it alreadly deside to rollback. allright.

zhaouv commented 6 months ago

The time you cost in debug this should be enough for reading these links.

As for relevant information, it's actually TextMate's documentation. VSCode adopted this design, so no separate references are provided.

https://www.apeth.com/nonblog/stories/textmatebundle.html https://macromates.com/manual/en/all_pages https://macromates.com/manual/en/language_grammars#naming-conventions