minad / jinx

🪄 Enchanted Spell Checker
GNU General Public License v3.0

How to disable checking of Chinese characters #4

Closed Jousimies closed 1 year ago

Jousimies commented 1 year ago

After I enable jinx-mode in org-mode, all Chinese characters are flagged as errors, as the picture below shows.

image

Any help?

minad commented 1 year ago

Multi language support is a notable feature of Jinx. You can set this via (setq jinx-languages '("en" "cn")). Please let me know if this works!

Jousimies commented 1 year ago

Adding "cn" to jinx-languages did not work as expected.

But adding the \cc regexp to jinx-exclude-regexps worked.
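For reference, a minimal sketch of that exclusion. It assumes jinx-exclude-regexps is an alist of (MODE REGEXP...) entries in which the key t applies to every major mode, as in current Jinx releases; the exact form may differ in other versions. In an Emacs regexp, \cc matches any character in the Chinese category, and the backslash has to be doubled inside a Lisp string.

```elisp
;; Sketch only: assumes the (MODE REGEXP...) alist format of
;; `jinx-exclude-regexps', where the key t means "all major modes".
(with-eval-after-load 'jinx
  ;; "\\cc" is the regexp \cc, i.e. one character of category "c" (Chinese).
  (add-to-list 'jinx-exclude-regexps '(t "\\cc")))
```

With this in place, Jinx should skip any word that contains a Chinese character while still checking the surrounding English text.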

Thank you.

minad commented 1 year ago

I see. Could you please enlighten me about spell checkers in Chinese? Do you also have a Chinese dictionary installed which is picked up by Aspell/Hunspell etc? Chinese words are usually short and it seems logographic writing systems are not supported by these packages (http://aspell.net/0.61/man-html/Languages-Which-Aspell-can-Support.html).

Jousimies commented 1 year ago

Spell checking in Chinese is very complex and totally different from spell checking in a language like English. In English, one may misspell a word, but in Chinese every single character displayed by the computer is valid on its own; it cannot be misspelled.

It is a bit like sentence checking (grammar checking) in English, where one may misuse a verb, for example. In Chinese, one may use the wrong character in a sentence, because Chinese has many characters with similar pronunciation that cannot be used interchangeably.

Excluding Chinese from spell checking is enough.

minad commented 1 year ago

@Jousimies Thanks, so for Chinese it boils down to some complex checking like checking the grammar.

minad commented 1 year ago

Instead of adding a regexp to jinx-exclude-regexps one can also configure word characters in jinx--base-syntax-table via modify-syntax-entry.

Ziqi-Yang commented 5 months ago

Here is my configuration using the modify-syntax-entry approach:

;; See issue https://github.com/minad/jinx/issues/4
;; This is the syntax table approach. It changes CJK characters from "w"
;; (word constituent) to "_" (symbol constituent). You can use `describe-char'
;; to view a character's syntax class (from the major mode's syntax table).
;; Emacs 29 supports Unicode 15, whose code charts can be found at
;; http://www.unicode.org/charts/ (hover with the mouse to see each range).
;; Note: `jinx--base-syntax-table' is only defined once jinx has been loaded.
(let ((st jinx--base-syntax-table))
  (modify-syntax-entry '(#x4E00 . #x9FFF) "_" st)   ; CJK Unified Ideographs
  (modify-syntax-entry '(#x3400 . #x4DBF) "_" st)   ; CJK Unified Ideographs Extension A
  (modify-syntax-entry '(#x20000 . #x2A6DF) "_" st) ; CJK Unified Ideographs Extension B
  (modify-syntax-entry '(#x2A700 . #x2B73F) "_" st) ; CJK Unified Ideographs Extension C
  (modify-syntax-entry '(#x2B740 . #x2B81F) "_" st) ; CJK Unified Ideographs Extension D
  (modify-syntax-entry '(#x2B820 . #x2CEAF) "_" st) ; CJK Unified Ideographs Extension E
  (modify-syntax-entry '(#x2CEB0 . #x2EBEF) "_" st) ; CJK Unified Ideographs Extension F
  (modify-syntax-entry '(#x30000 . #x3134F) "_" st) ; CJK Unified Ideographs Extension G
  (modify-syntax-entry '(#x31350 . #x323AF) "_" st) ; CJK Unified Ideographs Extension H
  (modify-syntax-entry '(#x2EBF0 . #x2EE5F) "_" st)) ; CJK Unified Ideographs Extension I

However, this adds ~0.14s to my Emacs startup time, sadly.
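One possible variation, offered as an untested sketch: wrap the same modifications in with-eval-after-load so they only run once jinx is actually loaded, and loop over the ranges with dolist. This assumes part of the startup cost comes from evaluating the form (or loading jinx) eagerly; I have not measured whether it recovers the ~0.14s, and it will not help if jinx is loaded during startup anyway.

```elisp
;; Sketch only: same ranges as above, applied lazily after jinx loads.
;; `jinx--base-syntax-table' is an internal variable and may change
;; between Jinx versions.
(with-eval-after-load 'jinx
  (dolist (range '((#x4E00  . #x9FFF)    ; CJK Unified Ideographs
                   (#x3400  . #x4DBF)    ; Extension A
                   (#x20000 . #x2A6DF)   ; Extension B
                   (#x2A700 . #x2B73F)   ; Extension C
                   (#x2B740 . #x2B81F)   ; Extension D
                   (#x2B820 . #x2CEAF)   ; Extension E
                   (#x2CEB0 . #x2EBEF)   ; Extension F
                   (#x2EBF0 . #x2EE5F)   ; Extension I
                   (#x30000 . #x3134F)   ; Extension G
                   (#x31350 . #x323AF))) ; Extension H
    ;; "_" marks the range as symbol constituents instead of word constituents.
    (modify-syntax-entry range "_" jinx--base-syntax-table)))
```

If jinx-mode is enabled from a hook rather than at startup, the body above is deferred until the first buffer that turns the mode on.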