vim / vim

The official Vim repository
https://www.vim.org
Vim License

Enhancing Support for Sentence and WORD Delimiters in Chinese and Japanese #14943

Open VimWei opened 1 month ago

VimWei commented 1 month ago

Unlike English, which separates words with spaces, Chinese and Japanese mark sentence ends with punctuation such as full stops (。), exclamation marks (!), and question marks (?). Within a sentence, smaller units are set off by commas (,), enumeration commas (、), and semicolons (;).

I propose adding options for sentence delimiters and WORD separators so that commands such as W, E, B, (, and ) work well in Chinese, Japanese, and similar languages.

Is implementing this feature challenging?

Two relevant plugins have been assessed:

  1. fuenor/jpmoveword.vim enhances W, E, B command functionality in Chinese and Japanese scripts. While it doesn't fully manage Chinese and Japanese WORDs, it enables user-defined separators like '、,;:。!?', significantly enhancing Vim's default capabilities.
  2. Preservim/vim-textobj-sentence set out to "Allow users to specify additional sentence terminators," but this has not been delivered after several years.

A comprehensive solution remains elusive. As a layperson without programming expertise, the technical intricacies of this problem are beyond me. I urge the Vim official team to commit resources to address this issue.

Desired Solution

  1. For Sentence Segmentation: Recognizing the challenge of solving all issues given the complexity of real-world scenarios, my aim is to tackle the most frequent cases. A simple method can include using full stops (。), exclamation marks (!), and question marks (?) for straightforward sentence segmentation.
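As a rough illustration of this simple method, here is a minimal sketch in Python (not Vim script, and not how Vim implements its sentence motions). The terminator set, which assumes the fullwidth forms ！ and ？ alongside the ASCII ones, and the name `split_sentences` are hypothetical.

```python
import re

# Assumed terminator set: ideographic full stop plus fullwidth and ASCII
# exclamation and question marks. Illustrative only, not a Vim default.
SENTENCE_TERMINATORS = "。！？!?"


def split_sentences(text, terminators=SENTENCE_TERMINATORS):
    """Split text into sentences, keeping each terminator with its sentence."""
    cls = re.escape(terminators)
    # A run of non-terminator characters followed by any trailing terminators.
    pattern = "[^{0}]+[{0}]*".format(cls)
    pieces = (m.group().strip() for m in re.finditer(pattern, text))
    return [p for p in pieces if p]


if __name__ == "__main__":
    for sentence in split_sentences("今天天气很好。你吃饭了吗？走吧！"):
        print(sentence)
```

Making the terminators a parameter mirrors the request for a user-configurable option: a different set can be passed in without changing the splitting logic.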

Challenging Scenarios:

  2. For WORD Segmentation: A perfect, highly intelligent solution is unrealistic without semantic comprehension. A pragmatic approach, utilizing enumeration commas (、), commas (,), and semicolons (;) as temporary WORD separators, is preferred.
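The pragmatic approach can be sketched the same way. This is a minimal Python sketch, assuming the extra separators are treated like blanks when finding WORD boundaries. The separator set (which assumes the fullwidth ， and ； as well as the ASCII forms) and the names `split_words` and `word_starts` are hypothetical, not part of Vim.

```python
import re

# Assumed extra WORD separators: enumeration comma plus fullwidth and ASCII
# commas and semicolons. Illustrative only, not a Vim option.
WORD_SEPARATORS = "、，；,;"


def _word_pattern(separators):
    # A WORD here is a run of characters that are neither whitespace
    # nor one of the extra separators.
    return r"[^\s{0}]+".format(re.escape(separators))


def split_words(text, separators=WORD_SEPARATORS):
    """Return the WORDs in text, treating the extra separators like blanks."""
    return re.findall(_word_pattern(separators), text)


def word_starts(text, separators=WORD_SEPARATORS):
    """Return the index at which each WORD starts (where a W-like motion could land)."""
    return [m.start() for m in re.finditer(_word_pattern(separators), text)]


if __name__ == "__main__":
    line = "苹果、香蕉，橘子；葡萄"
    print(split_words(line))
    print(word_starts(line))
    # With no extra separators, the whole unspaced line is a single WORD.
    print(split_words(line, separators=""))
```

Passing an empty separator string shows the problem being described: a WORD is any run of non-blank characters, so an unspaced Chinese line counts as one WORD.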

Overall: Aim to provide users with numerous choices to maximize Vim's efficiency in managing Chinese and other multilingual settings.

Shane-XB-Qian commented 1 month ago

this is more like a spec but not a bug, 😄

// You may need to find the right place to discuss it.
// BTW: I've seen some similar requests before, and you should know that a Chinese word is different from an English word.

-- shane.xb.qian

VimWei commented 1 month ago

this is more like a spec but not a bug, 😄

Yes, this is a feature request, not a bug report.

you should know that a Chinese word is different from an English word.

Yes, there are significant differences, so what I'm discussing is the concept of "WORD" in Vim, not "word."

cpplearner commented 1 month ago

AFAIK the lowercase w (as well as e and b) already recognizes these punctuation marks.