mawww / kakoune

mawww's experiment for a better code editor
http://kakoune.org
The Unlicense

[BUG] Sentence end defined incorrectly #5134

Open ftonneau opened 3 months ago

ftonneau commented 3 months ago

Version of Kakoune

Development version or current version on Arch

Reproducer

Write this in an empty buffer:

My name is A.B. Jones. Be my guest.

Position your cursor at line start (on "M"), then select an outer sentence with <a-a>s

Outcome

Kakoune selects "My name is A."

Expectations

Kakoune should select "My name is A.B. Jones. "

Additional information

From selectors.cc, Kakoune defines the end of a sentence as any one of the characters .;!? on its own. This is incorrect for English as well as for other Western languages. The end of a sentence is better defined as one of .;!? followed by one or two horizontal spaces or a line break. (A few corner cases could also be considered in English, such as an ending period followed by a closing quote, but requiring a space after .;!? would at least take care of the most common cases.)
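To make the difference concrete, here is a minimal Python sketch of the two rules, applied to the corrected example from the follow-up comment. It is not Kakoune's code: the names `CURRENT_RULE`, `PROPOSED_RULE` and `first_sentence` are made up for illustration, the "current" pattern only mirrors the behaviour described in this report, and treating the end of the text as a sentence boundary is an assumption. The trailing whitespace that an outer selection would include is not modelled.

```python
import re

# Behaviour described in this report: any of . ; ! ? ends a sentence.
CURRENT_RULE = re.compile(r"[.;!?]")

# Proposed rule: the same characters, but only when followed by a space
# or a line break. Accepting end of text (\Z) is an assumption of this sketch.
PROPOSED_RULE = re.compile(r"[.;!?](?=[ \n]|\Z)")


def first_sentence(text, rule):
    """Return the text from the start up to and including the first sentence end."""
    match = rule.search(text)
    return text if match is None else text[: match.end()]


sample = "My name is A.B.Jones. Be my guest."
print(first_sentence(sample, CURRENT_RULE))   # My name is A.
print(first_sentence(sample, PROPOSED_RULE))  # My name is A.B.Jones.
```

A single space in the lookahead is enough to cover "one or two spaces", since the second space never changes where the sentence ends.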

ftonneau commented 3 months ago

Of course, the example should be:

My name is A.B.Jones. Be my guest.

(facepalm). I mentioned the missing space requirement after .;!? on the Kakoune forum years ago, but never filed a bug report.

ftonneau commented 3 months ago

A better example (for real):

Philosophers (e.g., Fodor, 1975) and linguists (e.g., Chomsky, 1959) disagree.

Placing the cursor on P and extending to sentence end repeatedly results in four false stops (on e., g., e., g.). We cannot even repeat the last object selection directly (with <a-.>), because at each stage we are stuck on the period. Instead, at each stage we need to extend the selection to the right a little before typing <a-.> to get unstuck. Kakoune's support for sentence endings should definitely be improved.

mawww commented 3 months ago

I agree this is something to be fixed, I'll try to dedicate a bit of time to that.

Screwtapello commented 3 months ago

Tools like fmt require two spaces after a sentence, to disambiguate sentence breaks from abbreviations: you wouldn't want Paging Dr. Jones! to break after the "r", nor would you want to write Dr.Jones to make things work.

Unfortunately, in the modern era, when typewriters have fallen out of fashion, most typing is done in proportionally-spaced contexts like this text-box, or Microsoft Word, or other tools that handle the whitespace characters for you, so nobody bothers to put two spaces at the end of a sentence anymore. In practice, there is no longer a good way to detect the end of a sentence, and the most reliable approximation is to bake a bunch of special cases like "Dr." into the code, which is inelegant.
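For illustration, this is roughly what the special-casing approach looks like, as a small Python sketch. Everything in it is hypothetical: the `ABBREVIATIONS` set, the `CANDIDATE` pattern and the `sentence_ends` function are invented for this example and are not taken from Kakoune, fmt or any other tool. It also only looks for a plain space when finding the start of a word.

```python
import re

# Hypothetical, deliberately tiny list of abbreviations to special-case.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Prof."}

# Candidate sentence end: punctuation followed by whitespace or end of text.
CANDIDATE = re.compile(r"[.!?](?=\s|\Z)")


def sentence_ends(text):
    """Return the offsets just past each accepted sentence end."""
    ends = []
    for match in CANDIDATE.finditer(text):
        # The "word" is everything after the previous space, up to the punctuation.
        word_start = text.rfind(" ", 0, match.end()) + 1
        if text[word_start : match.end()] in ABBREVIATIONS:
            continue  # Known abbreviation: not a sentence end.
        ends.append(match.end())
    return ends


text = "Paging Dr. Jones! Come now."
print([text[:end] for end in sentence_ends(text)])

# Anything missing from the list still produces a false stop.
other = "Ask St. Peter."
print(other[: sentence_ends(other)[0]])
```

The second call shows the weakness: "St." is not in the list, so the sketch stops after it, and the list would have to keep growing to cover every such case.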

I don't think Kakoune's "sentence end" selection is a buggy solution to a problem, I think it's a perfectly reasonable solution to a buggy problem.

schragge commented 3 months ago

FYI, this is how a sentence is defined in Vim's help (:h sentence):

A sentence is defined as ending at a '.', '!' or '?' followed by either the end of a line, or by a space or tab. Any number of closing ')', ']', '"' and ''' characters may appear after the '.', '!' or '?' before the spaces, tabs or end of line. A paragraph and section boundary is also a sentence boundary. If the 'J' flag is present in 'cpoptions', at least two spaces have to follow the punctuation mark; <Tab>s are not recognized as white space. The definition of a sentence cannot be changed.

This logic is implemented by Vim in function findsent.
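As a rough approximation of the quoted definition (not a port of findsent), the rule can be written as a regular expression. In this Python sketch the names `VIM_END`, `VIM_END_J` and `split_sentences` are invented for illustration, paragraph and section boundaries are not modelled, and keeping end of line as a boundary under the 'J' flag is an assumption.

```python
import re

# Any number of closing ) ] " ' may follow the punctuation.
CLOSERS = r"[)\]\"']*"

# Default rule: . ! or ?, optional closers, then space, tab or end of line.
VIM_END = re.compile(r"[.!?]" + CLOSERS + r"(?=[ \t]|$)", re.MULTILINE)

# With the 'J' flag: at least two spaces are required, and tabs do not count.
# Keeping end of line as a boundary here is an assumption of this sketch.
VIM_END_J = re.compile(r"[.!?]" + CLOSERS + r"(?= {2}|$)", re.MULTILINE)


def split_sentences(text, two_spaces=False):
    """Split text into sentences using the approximate Vim rule."""
    pattern = VIM_END_J if two_spaces else VIM_END
    sentences, start = [], 0
    for match in pattern.finditer(text):
        sentences.append(text[start : match.end()])
        start = match.end()
        # Skip the whitespace that separates this sentence from the next.
        while start < len(text) and text[start] in " \t\n":
            start += 1
    if start < len(text):
        sentences.append(text[start:])
    return sentences


print(split_sentences('She said "Stop." Then (e.g., here) she left.'))
print(split_sentences("Paging Dr. Jones!  Come now."))
print(split_sentences("Paging Dr. Jones!  Come now.", two_spaces=True))
```

The first call shows the closing quote being absorbed into the sentence and the periods in "e.g." being ignored. The last two calls show the trade-off discussed above: with a single-space rule "Dr." still ends a sentence, while the two-space variant avoids that only if the author actually typed two spaces.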

ftonneau commented 3 months ago

My opening example was completely and stupidly messed up. My latter example is a better one:

Philosophers (e.g., Fodor, 1975) and linguists (e.g., Chomsky, 1959) disagree.

Here Kakoune will detect a sentence end at five different places, the first four being false positives because they involve a period that is not followed by a space.
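The count is easy to check with a few lines of Python. This is only a sketch: `stops` is a made-up helper, the pattern without a lookahead mirrors the behaviour described in this thread rather than Kakoune's actual source, and accepting end of text as a boundary is an assumption.

```python
import re

text = "Philosophers (e.g., Fodor, 1975) and linguists (e.g., Chomsky, 1959) disagree."


def stops(text, require_space):
    """Return the offsets of the characters treated as sentence ends."""
    # With require_space, the punctuation must be followed by whitespace
    # or by the end of the text (the latter is an assumption of this sketch).
    pattern = r"[.;!?](?=\s|\Z)" if require_space else r"[.;!?]"
    return [match.start() for match in re.finditer(pattern, text)]


print(len(stops(text, require_space=False)))  # 5
print(len(stops(text, require_space=True)))   # 1
```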

It is true that no reasonable definition will eliminate all false positives (e.g., the period in Dr. Jones), but a definition such as Vim's is better than Kakoune's because it eliminates more of them.

ftonneau commented 3 months ago

Vim's definition also takes into account false negatives to the space-after-period rule such as a sentence "ending in quotes." IMHO, the best thing for Kakoune would be to follow Vim's (and Emacs') definition. This may involve a lot of effort or complication.

Edit: removed "but pending this, requiring punctuation to be followed by at least one space would already be an improvement on the current definition."

On second thought, the best thing would be either (a) to go all the way to a Vim-like definition, or (b) to leave the current source code as is, given that the sentence-end issue can be improved at the plugin level.

kamurani commented 2 months ago

@ftonneau just an FYI, a colon is a : character, and . is called a "period" or "full stop". I got a bit confused reading your comments.

ftonneau commented 2 months ago

You are right, thanks for the correction. I edited my posts accordingly.