streetsidesoftware / vscode-spell-checker

A simple source code spell checker for code
https://streetsidesoftware.github.io/vscode-spell-checker/

Spellcheck only comments #107

Open haraldF opened 7 years ago

haraldF commented 7 years ago

Would be nice to have an option to spell check only comments, independent of the language.

This allows documentation writers to get spell checking without drowning in warnings from badly written code.

Jason3S commented 7 years ago

There is a way to do it, but it has to be configured for each language. What languages are you using? I'll see if I can give you an example.

The idea is to include only the text that matches a regex.

Here is an example that can be added to your user or workspace settings.json file.

    "cSpell.languageSettings": [
        // This one works with python
        {
            "languageId": "python",
            "includeRegExpList": [
                "/#.*/",
                "/('''|\"\"\")[^\\1]+?\\1/g"
            ]
        },
        // this one works with javascript, C, typescript, etc, 
        // but you need to copy it and change the language id.
        {
            "languageId": "javascript",
            "includeRegExpList": [
               "CStyleComment"
            ]
        }
    ]
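To make the include idea concrete, here is a minimal TypeScript sketch of what the two Python patterns above would select from a small file. It is not how the extension is implemented: `toRegExp` and `includedSpans` are hypothetical helpers, and always adding the `g` flag is an assumption made for the illustration. One detail worth knowing: inside a character class, `\1` is not a backreference. Without the `u` flag, JavaScript reads `[^\1]` as "any character except U+0001", so `[^\1]+?` effectively means "anything, lazily, including newlines".

```typescript
// Hypothetical helper: turn a settings-style "/body/flags" string into a RegExp.
// Assumption: matching should be global, so "g" is added when it is missing.
function toRegExp(pattern: string): RegExp {
  const m = /^\/(.*)\/([a-z]*)$/.exec(pattern);
  if (!m) throw new Error("not a /regex/ string: " + pattern);
  const flags = m[2].indexOf("g") >= 0 ? m[2] : m[2] + "g";
  return new RegExp(m[1], flags);
}

// Hypothetical helper: collect every span of `text` matched by any include pattern.
function includedSpans(text: string, patterns: string[]): string[] {
  const spans: string[] = [];
  for (const p of patterns) {
    const re = toRegExp(p);
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      spans.push(m[0]);
      if (m[0].length === 0) re.lastIndex++; // guard against empty matches
    }
  }
  return spans;
}

// Made-up Python source with one misspelling in the docstring and one in a comment.
const pySource = [
  "def add(a, b):",
  '    """Return the summ of a and b."""',
  "    total = a + b  # acumulate",
  "    return total",
].join("\n");

// The two Python patterns from the settings, after JSON unescaping.
const pyPatterns = ["/#.*/", "/('''|\"\"\")[^\\1]+?\\1/g"];
const pySpans = includedSpans(pySource, pyPatterns);
console.log(pySpans);
```

Only the comment and the docstring end up in `pySpans`; the identifiers `total`, `a`, and `b` in the code are never selected.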

This is a version that works in a cSpell file:

    "languageSettings": [
        // This one works with python
        {
            "languageId": "python",
            "includeRegExpList": [
                "/#.*/",
                "/('''|\"\"\")[^\\1]+?\\1/g"
            ]
        },
        // this one works with javascript, C, typescript, etc, 
        // but you need to copy it and change the language id.
        {
            "languageId": "javascript",
            "includeRegExpList": [
               "CStyleComment"
            ]
        }
    ]

haraldF commented 6 years ago

Thanks, Jason, for the quick reply. I'm well aware that I can maintain such a file myself for the languages that I'm using. I just wonder whether it would be smarter to support such a feature out of the box for the convenience of documentation authors. If it's too complex, feel free to close this bug report.

Jason3S commented 6 years ago

I agree, it would be useful.

I don't have an easy way to detect comments in each language. At the moment, the only way is to add include expressions.

I have been playing with the idea of reading TextMate colorizer files and trying to glean the meaning from those. But I don't have a lot of time to work on this.

raffaelespazzoli commented 5 years ago

@Jason3S can you provide a language-specific expression for golang? Thanks. Also, +1 for this feature.

jmcker commented 5 years ago

It might be worth noting that combining the above with the strings expression suggested in #116 and in the documentation worked well. Also, it seems language ids can be listed as a comma-separated list rather than copying the whole block (not in the docs, but seen in #116 and seems to work when testing). All together, this worked well enough to keep me using Code Spell Checker 😄

    "cSpell.languageSettings": [
        // This one works with Python
        {
            "languageId": "python",
            "includeRegExpList": [
                "/#.*/",
                "/('''|\"\"\")[^\\1]+?\\1/g",
                "strings"
            ]
        },
        // This one works with JavaScript, TypeScript, etc.
        {
            "languageId": "javascript,typescript",
            "includeRegExpList": [
                "CStyleComment",
                "strings"
            ]
        },
        // Use with cpp or c files
        {
            "languageId": "cpp,c",
            // Turn off compound words, because it is only checking strings.
            "allowCompoundWords": false,
            // Only check comments and strings
            "includeRegExpList": [
                "CStyleComment",
                "string"
            ],
            // Exclude includes, because they are also strings.
            "ignoreRegExpList": [
                "/#include.*/"
            ]
        }
    ]

Jason3S commented 5 years ago

@jmcker great example.

YannDubs commented 4 years ago

@Jason3S @jmcker thanks for these snippets. I'm really bad at regex and have been trying to remove spell checking in inline code (between backticks) in comments. Any idea how to do that? Thanks :)

memeplex commented 4 years ago

Isn't it possible to use the syntax highlighting category determined by grammar for the current language (see https://code.visualstudio.com/api/language-extensions/syntax-highlight-guide) in order to activate/deactivate spell checking rules in a more general way? This seems more convenient than repeating regular expressions for each language, although it's cool to have the regex rules for corner cases.

Jason3S commented 4 years ago

> Isn't it possible to use the syntax highlighting category determined by grammar for the current language (see https://code.visualstudio.com/api/language-extensions/syntax-highlight-guide) in order to activate/deactivate spell checking rules in a more general way? This seems more convenient than repeating regular expressions for each language, although it's cool to have the regex rules for corner cases.

It isn't possible for an extension to get access to the syntax-highlighting. I do have a long term plan on how to do this. It is more of a time constraint.

Jason3S commented 4 years ago

> @Jason3S @jmcker thanks for these snippets. I'm really bad at regex and have been trying to remove spell checking in inline code (between backticks) in comments. Any idea how to do that? Thanks :)

Please open a new issue with your exact challenge. Include some examples and refer to this issue. I will see if I can help you out.

Jason3S commented 4 years ago

@YannDubs See comment above.

r3code commented 3 years ago

golang

@raffaelespazzoli Here are my cSpell settings for golang (add them to your user settings.json):

    "cSpell.languageSettings": [
        // GoLang
        // Set what strings to check (see https://github.com/streetsidesoftware/vscode-spell-checker/issues/107)
        {
            "languageId": "go",
            // Turn off compound words, because it is only checking strings.
            "allowCompoundWords": false,
            // Only check comments and strings
            "includeRegExpList": [
                "CStyleComment",
                "string"
            ],
            // Exclude imports, because they are also strings.
            "ignoreRegExpList": [
                // ignore multiline imports
                "import\\s*\\((.|[\r\n])*?\\)",
                // ignore single line imports
                "import\\s*.*\".*?\""
            ]
        }
    ]

Also you can fork it here https://gist.github.com/r3code/21d1e9a3f862ad865808f07225b59068

harisont commented 3 years ago

In case you need the same in Haskell, I'm not great at regex but trial and error led me to something that does the job:

    {
        "languageId": "haskell",
        "includeRegExpList": [
            "/--.*/",
            "{-(.|\n)*?-}",
            "string"
        ]
    }

ryanfitzer commented 3 years ago

> It isn't possible for an extension to get access to the syntax-highlighting. I do have a long term plan on how to do this. It is more of a time constraint.

@memeplex I had the same question. Coming from TextMate and having written a lot of custom bundles, I was really surprised that VSCode does not make this available to extension authors. Here's an issue that explains the limitation and tracks the request: https://github.com/microsoft/vscode/issues/580

Diogo-Rossi commented 3 years ago

I got to know how to spellcheck only comments and strings by reading this issue. Thank you very much!

But I think it should be clearer in the README. For example, it doesn't mention the option "includeRegExpList".

Also, the include example in the README explicitly says that only comments and block strings will be checked for spelling, but it does not work very well (see below).

image

Jason3S commented 2 years ago

@Diogo-Rossi, Looks like the regexp was wrong in the example. It was too greedy and also matched the expression itself.

It should be:

    # cSpell:includeRegExp #.*
    # cSpell:includeRegExp /(["]{3}|[']{3})[^\1]*?\1/g
    # only comments and block strings will be checked for spelling.
    def sum_it(self, seq):
        """This is checked for spelling"""
        variabele = 0
        alinea = 'this is not checked'
        for num in seq:
            # The local state of 'value' will be retained between iterations
            variabele += num
            yield variabele

For Python, you can now use:

    # cspell:includeRegExp comments

See: cspell-dicts/cspell-ext.json at main · streetsidesoftware/cspell-dicts

ErezArbell commented 1 year ago

I added this for shellscript and Python. It still needs work. Since my code also has commands as strings, I include only strings that start with a capital letter, which in most cases are English sentences.

    "cSpell.languageSettings": [
      {
        "languageId": "shellscript",
        "includeRegExpList": [
            "/#.*/",
            "/('|\")[A-Z][^\\1]+?\\1/g",
        ]
      },
      {
        "languageId": "python",
        "includeRegExpList": [
          "/#.*/",
          "/('|\")[A-Z][^\\1]+?\\1/g",
          "/('''|\"\"\")[^\\1]+?\\1/g",
        ]
      },
    ],

Jason3S commented 1 year ago

@ErezArbell,

I suggest testing out your regular expressions on https://regex101.com. Use the ECMAScript (JavaScript) flavor.

The spell checker has a feature you can use to see your expressions:

  1. Enable the Experimental Regexp View. [image]
  2. Click on [image].

You should see all the patterns: [image]

Jason3S commented 1 year ago

Please note: this expression will only match ALL-CAPS WORDS.

    "/('|\")[A-Z][^\\1]+?\\1/g"

ErezArbell commented 1 year ago

No. It will match strings whose first letter is a capital. As I wrote above, this is on purpose, since some strings contain commands.

Jason3S commented 1 year ago

> No. It will match strings whose first letter is a capital. As I wrote above, this is on purpose, since some strings contain commands.

My mistake.
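The behavior discussed here can be checked directly. The sketch below runs the same pattern, after JSON unescaping, against a made-up shell line; `capitalFirst`, `shellLine`, and `capitalHits` are illustrative names only.

```typescript
// The pattern from the settings, after JSON unescaping: /('|")[A-Z][^\1]+?\1/g
const capitalFirst = new RegExp("('|\")[A-Z][^\\1]+?\\1", "g");

// Made-up input mixing prose strings and command strings.
const shellLine =
  "echo \"Building the projct\" && run 'ls -la' && say \"ALL CAPS\" \"lower case\"";

// Strings that begin with a capital letter are selected; the rest are skipped.
const capitalHits: string[] = shellLine.match(capitalFirst) || [];
console.log(capitalHits);
```

Both the mixed-case string and the all-caps string are selected, while `'ls -la'` and `"lower case"` are not. The pattern is an approximation: it can also start at a closing quote when the very next character is a capital letter.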

jace commented 7 months ago

Does anyone have a solution for matching only the values in JSON objects? E.g., in {"description": "Catch spellin here"}, the key is typically part of an API contract and will be validated separately, but the value needs spell checking.

jace commented 7 months ago

This works for ignoring JSON object keys:

    {
      "languageId": "json,jsonc",
      "allowCompoundWords": false,
      "ignoreRegExpList": ["/\"[^\"]*\":/"]
    },

The default list of ignores doesn't appear to be overridden by this; they continue to be ignored.
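As a quick check of the key-ignoring pattern, the sketch below applies it to a slightly extended version of the example object from the question. The names are illustrative, and applying the pattern to every occurrence is an assumption about how ignore patterns are used.

```typescript
// The ignore pattern from the settings, after JSON unescaping: /"[^"]*":/
// Assumption: the pattern is applied to every occurrence, hence the "g" flag.
const jsonKeyPattern = new RegExp("\"[^\"]*\":", "g");

// Made-up document: misspellings appear in one key ("titel") and one value ("spellin").
const jsonDoc =
  '{"description": "Catch spellin here", "nested": {"titel": "Some text"}}';

// Text matched here would be skipped by the spell checker.
const ignoredKeys: string[] = jsonDoc.match(jsonKeyPattern) || [];

// What remains is the text that would still be checked.
const remainingText = jsonDoc.replace(jsonKeyPattern, "");

console.log(ignoredKeys);
console.log(remainingText);
```

The misspelled key `titel` is removed from the checked text, while the misspelled value `spellin` is still there to be flagged. A value that contains an escaped quote followed by a colon could be mistaken for a key, so treat the pattern as a heuristic.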

jace commented 7 months ago

After some tweaking, I've found Python settings that work for me:

    {
      "languageId": "python",
      "allowCompoundWords": false,
      "includeRegExpList": ["comments", "string"],
      "ignoreRegExpList": [
        // Ignore single-quoted strings ('symbols' and '''embedded code like SQL''')
        "/'.*?'/g",
        "/'''.+?'''/gm",
        // Ignore code in braces in f-strings: f"...{code} ... {code}", f'...{code}'
        "/(?<=(?:f|rf|fr)(?:\"[^\"]*|'[^']*))\\{.*?\\}/g",
        // Ignore reStructuredText code samples (indented block after line ending with
        // `::`), but don't ignore `.. directive::`. If your documentation uses
        // `.. code-block:: lang`, remove the `(?<!...)`
        "/(?<!\\s*\\.\\..*)::$\\n+(\\s+).*\\n(?:^\\n|^\\1.*\\n)*/gm",
        // Ignore linter directive comments
        "/#\\s*(flake8:|isort:|noqa:|nosec\\s|pragma:|pylint:|pyright:|type:).*/i",
        // Ignore words within `backticks` or ``backticks``, used for references
        "/(`{1,2}).*?\\1/g",
        // Ignore reStructuredText parameter names and types
        "/:(param|type).*?:/"
      ]
    },

I haven't figured out how to ignore only the reStructuredText .. code-block:: <lang> directive.
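Two of the ignore patterns above are easy to try in isolation. This sketch, with made-up input and illustrative names, shows which parts of a line the backtick pattern and the linter-directive pattern would exclude.

```typescript
// Patterns from the settings above, after JSON unescaping.
const backtickPattern = new RegExp("(`{1,2}).*?\\1", "g");
const linterPattern = new RegExp(
  "#\\s*(flake8:|isort:|noqa:|nosec\\s|pragma:|pylint:|pyright:|type:).*",
  "i"
);

// Made-up docstring line: code references in single and double backticks.
const docLine = "Use `varname` or ``some_func()`` to referr to code.";
const withoutBackticks = docLine.replace(backtickPattern, "");

// Made-up code line ending in a linter directive comment.
const codeLine = "import os  # noqa: F401";
const directive = linterPattern.exec(codeLine);

console.log(withoutBackticks);
console.log(directive ? directive[0] : null);
```

The backreference `\1` makes the closing delimiter match the opening one, so a span opened with two backticks is only closed by two backticks.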

konstabark commented 4 months ago

These are my settings: for Python, C++ and C, I spell check only comments and strings, and for .json files I spell check only comments (because, e.g., in settings.json there are a lot of strings that raise plenty of messages in the Problems section).

        {
            "languageId": "python,cpp,c",
            "includeRegExpList": [
                // For Python
                "comments",
                "strings"
                // For C++ and C
                "CStyleComment",
                "string"
            ]
        },
        {
            "languageId": "jsonc", // (.json with comments) because most "commands" here are strings
            "includeRegExpList": [
                "CStyleComment",
            ]
        }
    ],

Notice how you can have just one block for all your languages (add more, separated by commas), and you just add the language-specific way of telling it to check comments and strings (only the JSON case has a different block, because we don't want it to check strings there).

memeplex commented 4 months ago

Not much progress on the VSCode side in providing access to the syntax tree, but there are a couple of developments for parsing from a TextMate or Tree-sitter grammar that may improve performance and simplify development compared with regular expressions:

RahulSisondia commented 3 weeks ago

> These are my settings: for Python, C++ and C, I spell check only comments and strings, and for .json files I spell check only comments (because, e.g., in settings.json there are a lot of strings that raise plenty of messages in the Problems section).
>
>         {
>             "languageId": "python,cpp,c",
>             "includeRegExpList": [
>                 // For Python
>                 "comments",
>                 "strings"
>                 // For C++ and C
>                 "CStyleComment",
>                 "string"
>             ]
>         },
>         {
>             "languageId": "jsonc", // (.json with comments) because most "commands" here are strings
>             "includeRegExpList": [
>                 "CStyleComment",
>             ]
>         }
>     ],
>
> Notice how you can have just one block for all your languages (add more, separated by commas), and you just add the language-specific way of telling it to check comments and strings (only the JSON case has a different block, because we don't want it to check strings there).

Hi, is it a typo that there is no comma after the first occurrence of "strings", or does it have some plugin-specific significance? Also, there is no entry in PredefinedPatterns named "comments".