Ignore words with non-Latin characters

alystair commented 4 years ago

Obviously, Spanish and other languages written purely in Latin characters would have to be ignored manually by the user, but shouldn't Russian and other languages that use non-Latin characters be ignored automatically unless the matching language dictionary is in use?

johnml1135 commented 9 months ago

I was able to get around it with this:

  "cSpell.includeRegExpList": [
    "\\b[a-zA-Z0-9.]+\\b"
  ],
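To see what that include pattern actually selects, here is a small TypeScript sketch that applies it to a made-up line mixing Latin and Cyrillic text with String.prototype.match. It assumes the pattern runs with the g flag; which flags cspell itself adds is not covered here.

```typescript
// Apply the includeRegExpList pattern to a sample line.
// Assumption: the pattern is used with the g flag (cspell's own flag handling is not shown).
// "\\b" in a JSON or TypeScript string is the regex word boundary \b;
// a single "\b" would be a backspace character instead.
const includePattern = new RegExp("\\b[a-zA-Z0-9.]+\\b", "g");

const sample = "const greeting = 'Привет, мир'; // hello world";
const checked = sample.match(includePattern);
console.log(checked); // [ 'const', 'greeting', 'hello', 'world' ]
```

The Cyrillic words are simply never selected. One side effect of the ASCII-only character class: an accented Latin word such as café is cut down to caf, which would then be checked on its own.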

Jason3S commented 9 months ago

It is necessary to explicitly ignore character sets. By default, the spell checker checks all text.

It is possible to tell the spell checker to ignore a character set using ignoreRegExpList, or to check only the text that matches expressions in includeRegExpList.
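The difference between the two lists can be sketched as a simplified model in TypeScript. This is an assumption about the behavior described above, not cspell's actual implementation: include patterns (if any) select the text to check, and ignore patterns then blank out whatever they match. The names textToCheck and markMatches are hypothetical.

```typescript
// Simplified model, NOT cspell's implementation: includeRegExpList narrows the text
// to what its patterns match (everything, if the list is empty), and
// ignoreRegExpList then removes whatever its patterns match.
function textToCheck(text: string, include: RegExp[], ignore: RegExp[]): string {
  // One flag per UTF-16 code unit: true means "this position is spell checked".
  const keep: boolean[] = [];
  for (let i = 0; i < text.length; i++) keep.push(include.length === 0);

  for (const re of include) markMatches(text, re, keep, true);
  for (const re of ignore) markMatches(text, re, keep, false);

  // Blank out unchecked positions so the remaining words keep their offsets.
  let out = "";
  for (let i = 0; i < text.length; i++) out += keep[i] ? text.charAt(i) : " ";
  return out;
}

// Set flags[i] = value for every code unit covered by a match of `re`.
// `re` must carry the g flag so exec() advances through all matches.
function markMatches(text: string, re: RegExp, flags: boolean[], value: boolean): void {
  if (!re.global) throw new Error("pattern needs the g flag: " + re.source);
  re.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    for (let i = m.index; i < m.index + m[0].length; i++) flags[i] = value;
    if (m[0].length === 0) re.lastIndex++; // avoid looping forever on empty matches
  }
}

const line = "Привет world";
const cyrillic = new RegExp("\\p{Script_Extensions=Cyrillic}+", "gu");
// The Cyrillic word is replaced by spaces; only "world" is left to check.
console.log(JSON.stringify(textToCheck(line, [], [cyrillic])));
```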

The spell checker uses JavaScript's built-in RegExp engine. To use Unicode property escapes such as \p{...}, the u flag must be added.

It is also necessary to specify Script_Extensions= (or Script=) when using script names; a bare name such as \p{Cyrillic} is a syntax error. See MDN's "Unicode character class escape: \p{...}, \P{...}" page. It is always best to try out expressions at regex101.
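Both requirements are easy to confirm with JavaScript's RegExp constructor. This TypeScript sketch (the compiles helper is hypothetical) shows that a bare script name is rejected, that Script_Extensions= is accepted, and that without the u flag \p{...} quietly becomes a literal match instead of a property escape.

```typescript
// Hypothetical helper: does this pattern compile with these flags?
function compiles(source: string, flags: string): boolean {
  try {
    new RegExp(source, flags);
    return true;
  } catch {
    return false; // the RegExp constructor threw a SyntaxError
  }
}

// A bare script name is not a valid property name in a JavaScript regexp.
console.log(compiles("\\p{Cyrillic}", "u")); // false
// Naming the property explicitly works.
console.log(compiles("\\p{Script_Extensions=Cyrillic}", "u")); // true

// Without the u flag, \p{...} is not a property escape: the pattern still compiles,
// but it just matches literal characters (a "p", a "{", and so on).
const withU = new RegExp("\\p{Script_Extensions=Cyrillic}+", "u");
const noU = new RegExp("\\p{Script_Extensions=Cyrillic}+");
console.log(withU.test("Привет")); // true
console.log(noU.test("Привет")); // false
```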

Using a directive within a document

// cspell:ignoreRegExp /[\p{Script_Extensions=Cyrillic}]+/gu

VS Code Settings

.vscode/settings.json

  "cSpell.ignoreRegExpList": ["/[\\p{Script_Extensions=Cyrillic}]+/gu"]

Using a CSpell config file

cspell.json

{
  "ignoreRegExpList": ["/[\\p{Script_Extensions=Cyrillic}]+/gu"]
}

cspell.config.yaml

ignoreRegExpList:
  - '/[\p{Script_Extensions=Cyrillic}]+/gu'
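The JSON and YAML examples differ only in escaping: JSON needs the backslash doubled, while a single-quoted YAML string keeps it literal, so both should yield the same /pattern/flags string. The TypeScript sketch below checks that and turns the string into a RegExp. parseRegExpString is a hypothetical helper for illustration, not part of cspell's API, and it only handles the /pattern/flags form.

```typescript
// Hypothetical helper (not cspell's API): turn a "/pattern/flags" string into a RegExp.
// Only the slash form used in the config examples is handled.
function parseRegExpString(value: string): RegExp {
  // The greedy ([\s\S]+) runs to the LAST slash, so slashes inside the pattern are kept.
  const m = /^\/([\s\S]+)\/([a-z]*)$/.exec(value);
  if (!m) throw new Error("expected /pattern/flags, got: " + value);
  return new RegExp(m[1], m[2]);
}

// cspell.json: the backslash is doubled inside the JSON string; JSON.parse undoes that.
const fromJson: string = JSON.parse('"/[\\\\p{Script_Extensions=Cyrillic}]+/gu"');
// cspell.config.yaml: a single-quoted YAML scalar keeps the backslash as is.
const fromYaml = "/[\\p{Script_Extensions=Cyrillic}]+/gu";

console.log(fromJson === fromYaml); // true

const cyrillic = parseRegExpString(fromYaml);
console.log("Привет world".replace(cyrillic, "")); // " world"
```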

List of character sets (script names)

Useful reference: Unicode Scripts


Common
Arabic
Armenian
Bengali
Bopomofo
Braille
Buhid
Canadian_Aboriginal
Cherokee
Cyrillic
Devanagari
Ethiopic
Georgian
Greek
Gujarati
Gurmukhi
Han
Hangul
Hanunoo
Hebrew
Hiragana
Inherited
Kannada
Katakana
Khmer
Lao
Latin
Limbu
Malayalam
Mongolian
Myanmar
Ogham
Oriya
Runic
Sinhala
Syriac
Tagalog
Tagbanwa
Tai_Le
Tamil
Telugu
Thaana
Thai
Tibetan
Yi
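To ignore several of these scripts with a single entry, their escapes can be combined into one character class. The TypeScript sketch below does that with a hypothetical buildIgnorePattern helper, lets the RegExp constructor reject misspelled script names, and uses JSON.stringify to produce the doubled-backslash form that cspell.json or settings.json expects.

```typescript
// Hypothetical helper: build one ignoreRegExpList entry covering several scripts.
function buildIgnorePattern(scripts: string[]): string {
  const escapes = scripts.map((name) => {
    const escape = "\\p{Script_Extensions=" + name + "}";
    // Fail early on a misspelled script name: this throws a SyntaxError.
    new RegExp(escape, "u");
    return escape;
  });
  // One character class holding all scripts, repeated with +, using the g and u flags.
  return "/[" + escapes.join("") + "]+/gu";
}

const entry = buildIgnorePattern(["Cyrillic", "Greek"]);
console.log(entry); // /[\p{Script_Extensions=Cyrillic}\p{Script_Extensions=Greek}]+/gu

// JSON.stringify doubles the backslashes, as required inside a JSON config file.
console.log(JSON.stringify({ ignoreRegExpList: [entry] }));
```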

streetsidesoftware / vscode-spell-checker