streetsidesoftware / vscode-spell-checker

A simple source code spell checker for code
https://streetsidesoftware.github.io/vscode-spell-checker/

How to use a different (English) dictionary for specific file types. #676

Open redactedscribe opened 3 years ago

redactedscribe commented 3 years ago

In short, I can't find a way to only have en-GB suggestions when en/en-US is loaded. Ideally, only behaviour and not behavior would be considered correct. Currently, any American English spellings will go unnoticed in a file intended to be written in British English.


I'm attempting to load en-US and en-GB for different languages, so I set the global language/locale to nothing:

"cSpell.language": "",

Unlike en-GB, en and en-US cannot be loaded for a given language:

"cSpell.languageSettings": [
    {
        "languageId": "plaintext",
        "dictionaries": [
            "en",
            "en-us"
        ]
    }
]

At this point, a plaintext file will still show unknown words for many common words such as "are" and "which", and en_us is not shown as an active dictionary. If setting the above to only the value en-GB, all the unknown words become known ones, and en-gb is shown as an active dictionary. Only the latter can be loaded on a per-language basis. This was my attempt to get some isolation between the English variants, but it seems it's not possible, and in fact, wouldn't be the most practical solution.


That leads me on to my feature request: I'd like an option to only honor the en-GB locale when both it and the en_us dictionary (and possibly other future English variants) are loaded. The ability to set the preferred English locale, as well as to allow both to be active (current behavior), on a per-file/workspace basis would be appreciated.

There could be an option "Locales loaded for languageIds override the global locale".

This way, for example, cSpell.language could be set to en_US and the above snippet to ... "dictionaries": [ "en-GB" ]}] for a specific languageId, and if the option above is true, it'd automatically override, else, it'd just be the normal behavior of words from both dictionaries being valid. I'm guessing the reverse should also be possible (en-US overriding en-GB)?

Thanks!

Jason3S commented 3 years ago

Did you try?

"cSpell.language": "en-GB",

The dictionary name and the locale names do not always match. [image]

In any case, you can set the locale on a per-file basis using overrides: [image]

You can also set the locale within a file:

// cspell:locale en-gb

I hope this answers your question.

redactedscribe commented 3 years ago

Yes, I've tried using "en-GB", but that then uses British English by default (which isn't what I'd like) for all files and I cannot override it with languageId to "en-US" (maybe a bug?). Only the opposite is true. Typically, I want to code in one locale, and to write scripts/plaintexts in another. I mostly work with individual non-workspace files.

Your suggestions do help, though they don't completely solve my issue. If I've understood correctly, it seems the only options are to explicitly set up a cspell.json with overrides (a useful solution for workspaces), or to explicitly write a cspell comment per file, which is something I'd like to avoid: this must be manually done every time, and for convenience, the comment must be saved into the file (which I don't particularly want in my scripts/text files).

// cspell:locale en-gb does behave how I want: en/en-us is completely ignored and only en-gb is queried, but it's unfortunate this must be written on a per-file basis. I appreciate it's possible, just not ideal.

Thank you.

Jason3S commented 3 years ago

languageId refers to the file type classification, e.g. python. This is a bit confusing; the terminology was originally based upon VS Code terminology.

languageSettings gives the ability to change the configuration based upon either languageId or local.

overrides give the ability to change the configuration based upon filenames. Overrides happen before languageSettings are applied. That way you can say .yarnrc files should be treated as json.
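As a minimal sketch of the .yarnrc case just mentioned: the snippet below assumes that an override entry accepts a languageId next to filename, and the glob pattern is only illustrative.

```jsonc
{
    "cSpell.overrides": [
        {
            // Illustrative glob: match .yarnrc files anywhere in the tree.
            "filename": "**/.yarnrc",
            // Assumption: treat matching files as json, so that any
            // languageSettings entries for json apply to them afterwards.
            "languageId": "json"
        }
    ]
}
```

Because overrides are resolved first, a languageSettings entry for json would then see these files as json.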

If you do not want to have a cspell.json file in every project, you can tell VS Code to import a global one: [image]
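The image itself is not preserved here. One way to get this effect, assuming the extension's cSpell.import setting is what was shown, might look like the sketch below; the path is a hypothetical placeholder.

```jsonc
{
    // Assumption: cSpell.import takes a list of configuration files to load.
    // The path below is hypothetical; point it at your own shared cspell.json.
    "cSpell.import": [
        "/path/to/shared/cspell.json"
    ]
}
```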

Overrides can also be added to VS Code settings:

    "cSpell.overrides": [
        {
            "filename": "**/*.yaml",
            "language": "en-GB"
        }
    ],

Another option (this one matches more of what you have been asking for):

    "cSpell.languageSettings": [
        {
            "languageId": "plaintext",
            "dictionaries": [
                "!en_us",
                "en-gb"
            ]
        }
    ]

It says: if the file type is plaintext, then disable the en_us dictionary and add the en-gb dictionary.

redactedscribe commented 3 years ago

Thank you very much for your explanations. It was very helpful.

Specifically, seeing !en_us made me realize that I had been attempting to use en-us (with a hyphen), which has no effect (unlike en-gb). I think my confusion came from the fact that cSpell.language is lenient with the language name (en/en-us/en-US/engb/etc), and too, from the dictionaries en_us and en-gb being named inconsistently (they must be specified for "dictionaries": exactly as they are named as seen via the VS Code "Show Spell Checker Configuration Info" command palette command).

Could you give me a small example of cSpell.languageSettings changing the configuration based upon local?

Lastly, is there a difference between cSpell.dictionaries and "languageId": "*", "dictionaries":? For example, it seems I could set either of those, or even cSpell.language, to en_us and I'd have American English in use globally.

Thank you!

redactedscribe commented 3 years ago

A potential bug: I've also noticed that even with "cSpell.language": "en-US" and !en_us set for plaintexts, the Spell Checker Configuration Info still shows en_us (en, en-US) as an active dictionary for the file, and it's present under the Dictionaries tab too.

This may be expected for all I know, as I'm still somewhat confused about how these settings work. However, I'd say my original issue is basically solved.

Jason3S commented 3 years ago

> A potential bug: I've also noticed that even with "cSpell.language": "en-US" and !en_us set for plaintexts, the Spell Checker Configuration Info still shows en_us (en, en-US) as an active dictionary for the file, and it's present under the Dictionaries tab too.
>
> This may be expected for all I know, as I'm still somewhat confused about how these settings work. However, I'd say my original issue is basically solved.

I'll double check the logic.

Jason3S commented 3 years ago

> Specifically, seeing !en_us made me realize that I had been attempting to use en-us (with a hyphen), which has no effect (unlike en-gb). I think my confusion came from the fact that cSpell.language is lenient with the language name (en/en-us/en-US/engb/etc), and too, from the dictionaries en_us and en-gb being named inconsistently (they must be specified for "dictionaries": exactly as they are named as seen via the "Show Spell Checker Configuration Info" command palette command).

The locale detection is lenient because the way locales are expressed across the industry is inconsistent. Dictionary names are an exact match. The inconsistency in the dictionary names is historic and difficult to fix without breaking things.

> Could you give me a small example of cSpell.languageSettings changing the configuration based upon local?

Using the local filter also allows you to include different dictionaries based upon the current locale.

    "cSpell.languageSettings": [
        {
            "local": "en-GB",
            "dictionaries": ["my-company-terms-en-gb"]
        },
        {
            "local": "en-US",
            "dictionaries": ["my-company-terms-en-us"]
        }
    ]

> Lastly, is there a difference between cSpell.dictionaries and "languageId": "*", "dictionaries":? For example, it seems I could set either of those, or even cSpell.language, to en_us and I'd have American English in use globally.

There is. dictionaries listed at the top level enable dictionaries globally. But it is not possible to remove a dictionary at the global level. That is because the dictionaries defined in languageSettings are applied later. languageSettings are applied in the order listed. So it is possible to add a dictionary and then later remove it.
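A rough sketch of that ordering, assuming entries are applied top to bottom and reusing the en-gb dictionary name and the ! prefix from earlier in this thread: a dictionary is added for every file type and then removed again for one of them.

```jsonc
{
    "cSpell.languageSettings": [
        // Applied first: add en-gb for every file type.
        { "languageId": "*", "dictionaries": ["en-gb"] },
        // Applied later: remove it again, for plaintext files only.
        { "languageId": "plaintext", "dictionaries": ["!en-gb"] }
    ]
}
```

If the ordering works as described, a plaintext file should end up without en-gb while other file types keep it.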

redactedscribe commented 3 years ago

Sorry to leave yet another comment, but I cannot seem to figure out how to do what this issue was originally named.

My goal is to isolate both British English and American English. I can disable en_us (via !en_us) and enable en-gb for specific file types, so that only the British dictionary is in effect, yet the problem remains that there are some words in both dictionaries that should not be there / that I don't want, e.g. in en_us: "favour", "savour", "metre", and in en-gb: "meter" and "customize".

My test wordlist:

en-gb       en_us
----------- -----------
behaviour   behavior
favour      favor
favourite   favorite
savour      savor
metre       meter
customise   customize

Is it possible to make cSpell flag everything on the left as incorrect when the right-hand side dictionary is in use, and vice versa? I realize it would be a manual task compiling these incorrect/undesired words since the dictionaries themselves are not perfect.

Configuration excerpt (see comment on how I'd like it to behave):

"cSpell.language": "en,en-US",
"cSpell.languageSettings": [
    { // Bug?: "Missing property languageId" (yet this scope is still effective)
        "local": [ "en", "en-US" ],
        //"flagWords": [
        //    "favour",
        //    "savour",
        //    "metre"
        //]
    },
    {
        "languageId": [ "plaintext" ],
        // if commented, `flagWords` below should be too, else, comment `flagWords` above:
        "dictionaries": [
            "!en_us",
            "en-gb"
        ],
        "flagWords": [
            "meter",
            "customize"
        ]
    }
]

Is this possible any other way than toggling comments?

Thanks.