Open ftonneau opened 3 months ago
Of course, the example should be:
My name is A.B.Jones. Be my guest.
(facepalm). I mentioned the lack of space requirement after .;!? on the Kakoune forum years ago, but never filed the bug report.
A better example (for real):
Philosophers (e.g., Fodor, 1975) and linguists (e.g., Chomsky, 1959) disagree.
Placing the cursor on "P" and extending to sentence end repeatedly results in 4 false stops (on "e.", "g.", "e.", "g."). And we cannot even repeat the last object selection directly (with <a-.>), because at each stage we are stuck on the period. Instead, at each stage we need to extend the selection to the right a little bit before typing <a-.> and getting unstuck. Kakoune's support for sentence ending should definitely be improved.
I agree this is something to be fixed, I'll try to dedicate a bit of time to that.
Tools like fmt require two spaces after a sentence, to disambiguate sentence breaks from abbreviations: you wouldn't want "Paging Dr. Jones!" to break after the "r", nor would you want to write "Dr.Jones" to make things work.
Unfortunately, in the modern era when typewriters have fallen out of fashion, most typing is done in proportionally-spaced contexts like this text-box, or Microsoft Word, or other tools that handle the whitespace characters for you, so nobody bothers to put two spaces at the end of a sentence anymore. In practice, there is no longer a good way to detect the end of a sentence, and the most reliable approximation is to bake a bunch of special cases like "Dr." into the code, which is inelegant.
I don't think Kakoune's "sentence end" selection is a buggy solution to a problem, I think it's a perfectly reasonable solution to a buggy problem.
FYI, this is how a sentence is defined in Vim's help (:h sentence):
A sentence is defined as ending at a '.', '!' or '?' followed by either the end of a line, or by a space or tab. Any number of closing ')', ']', '"' and ''' characters may appear after the '.', '!' or '?' before the spaces, tabs or end of line. A paragraph and section boundary is also a sentence boundary. If the 'J' flag is present in 'cpoptions', at least two spaces have to follow the punctuation mark; <Tab>s are not recognized as white space. The definition of a sentence cannot be changed.
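To make the quoted definition concrete, here is a rough Python sketch of it. It is only a model of the help text, not Vim's actual code: the name `vim_like_sentence_ends` is made up for illustration, the 'J' flag is assumed to be unset, and paragraph and section boundaries are ignored.

```python
import re

# Assumption: a regex model of the quoted help text with the 'J' flag unset.
# A '.', '!' or '?', then any number of closing ) ] " ' characters,
# then a space, a tab, or the end of a line.
SENTENCE_END = re.compile(r"""[.!?][)\]"']*(?=[ \t]|$)""", re.MULTILINE)


def vim_like_sentence_ends(text):
    """Return the index of each '.', '!' or '?' that ends a sentence."""
    return [m.start() for m in SENTENCE_END.finditer(text)]


example = "Philosophers (e.g., Fodor, 1975) and linguists (e.g., Chomsky, 1959) disagree."
print(vim_like_sentence_ends(example) == [len(example) - 1])  # only the final period
```

Under this model the periods in "e.g." are skipped because none of them is followed by whitespace, while a period followed by a closing quote and then a space still counts. An abbreviation such as "Dr. Jones" is still reported as a sentence end.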
This logic is implemented by Vim in function findsent.
My opening example was completely and stupidly messed up. My latter example is a better one:
Philosophers (e.g., Fodor, 1975) and linguists (e.g., Chomsky, 1959) disagree.
Here Kakoune will detect a sentence end at five different places, the first four being false positives because they involve a period not followed by a space.
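The count can be checked with a small Python sketch. It assumes the rule exactly as described in this thread (any of the .;!? characters ends a sentence); `naive_stops` is a hypothetical helper, not Kakoune code.

```python
# Assumption: the rule as described in this thread, where any of . ; ! ?
# ends a sentence regardless of what follows.
END_CHARS = ".;!?"


def naive_stops(text):
    """Indices of every character treated as a sentence end under that rule."""
    return [i for i, ch in enumerate(text) if ch in END_CHARS]


text = "Philosophers (e.g., Fodor, 1975) and linguists (e.g., Chomsky, 1959) disagree."
stops = naive_stops(text)
# The character right after each stop ('' means end of text).
followers = [text[i + 1:i + 2] for i in stops]
print(len(stops))  # 5
print(followers)   # ['g', ',', 'g', ',', '']
```

Only the last stop is followed by the end of the text; the other four are followed by a letter or a comma.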
It is true that no reasonable definition will eliminate all false positives (e.g., the period in "Dr. Jones"), but a definition such as Vim's is better than Kakoune's because it eliminates more of them.
Vim's definition also handles false negatives of the space-after-period rule, such as a sentence "ending in quotes."
IMHO, the best thing for Kakoune would be to follow Vim's (and Emacs') definition. This may involve a lot of effort or complication.
Edit: removed "but pending this, requiring punctuation to be followed by at least one space would already be an improvement on the current definition."
Thinking twice, the best thing would be either (a) to go all the way to a Vim-like definition, or (b) to leave the current source code as is, given that the sentence-end issue can be improved at the plugin level.
@ftonneau just an FYI, a colon is a ":" character, and "." is called a "period" or "full-stop".
Got a bit confused reading your comments.
You are right, thanks for correcting. I edited my posts accordingly.
Version of Kakoune
Development version or current version on Arch
Reproducer
Write this in an empty buffer:
My name is A.B. Jones. Be my guest.
Position your cursor at line start (on "M"), then select an outer sentence with <a-a>s
Outcome
Kakoune selects "My name is A."
Expectations
Kakoune should select "My name is A.B. Jones. "
Additional information
From selectors.cc, Kakoune defines the end of a sentence as one of the .;!? characters. This is incorrect in English as well as other Western languages. The end of a sentence is better defined as one of the .;!? characters followed by one or two horizontal spaces or a line return. (A few corner cases could also be considered in English, as when the ending period is followed by a closing quote, but requiring a space after .;!? would at least take care of the most common cases.)
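A minimal Python sketch of the proposed definition, for illustration only. It assumes that "horizontal space" means a space or a tab, that "line return" means "\n", and that the end of the buffer also counts; `proposed_ends` is a hypothetical name, not something from selectors.cc.

```python
import re

# Assumption: one of . ; ! ? counts as a sentence end only when followed by
# a space or tab, a newline, or the end of the text. Checking for one
# following space also covers the two-space case.
PROPOSED_END = re.compile(r"[.;!?](?=[ \t]|\n|\Z)")


def proposed_ends(text):
    """Indices of the punctuation marks that end a sentence under this rule."""
    return [m.start() for m in PROPOSED_END.finditer(text)]


line = "My name is A.B.Jones. Be my guest."
for i in proposed_ends(line):
    print(repr(line[:i + 1]))
```

With "A.B.Jones" written without spaces, only the period after "Jones" and the final period qualify. With "A.B. Jones", the period after "B" is followed by a space and would still count as a stop under this rule.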