Closed kadircet closed 4 years ago
I think more info is needed. What server are you running? What are the values of the variables eglot-current-column-function
and eglot-move-to-column-function
? What happens if you flip the values of those variables after reading their documentations?
I don't think behavior should be dependent on server or values of these variables, because it is clear from the spec:
The current protocol is tailored for textual documents whose content can be represented as a string. There is currently no support for binary documents. A position inside a document (see Position definition below) is expressed as a zero-based line and character offset. The offsets are based on a UTF-16 string representation. So a string of the form aπb the character offset of the character a is 0, the character offset of π is 1 and the character offset of b is 3 since π is represented using two code units in UTF-16. To ensure that both client and server split the string into the same line representation the protocol specifies the following end-of-line sequences: β\nβ, β\r\nβ and β\rβ.
here is the info you asked though:
Workaround for clangd specifically: it sounds like eglot is specifying codepoint offsets, so you can pass clangd -offset-encoding=utf-32
to get that nonstandard behavior.
This still needs to be fixed, as eglot doesn't pass that flag to clangd by default, and other servers will rely on the standard utf-16 encoding.
I don't think behavior should be dependent on server or values of these variables, because it is clear from the spec:
I agree, but many servers (and clients) don't follow the spec so Eglot has been pragmatic about this in the past. So @nemethf is asking you what values do you have for the variables eglot-move-to-column-function
and eglot-current-column-function
. If you set them to #'eglot-move-to-lsp-abiding-column
and #'eglot-lsp-abiding-column
,respectively, you should get the behaviour you want.
If you don't get it, then it's a bug.
You may also argue for using those functions as the default. Eventually we will have to. But they make things run slower, and last time I checked it was intrinsic to the protocol (@MaskRay had some insight on this, I think).
(Wouldn't it be good if the server and the client could agree to deviate from the protocol specification and and use a different encoding?)
(Wouldn't it be good if the server and the client could agree to deviate from the protocol specification and and use a different encoding?)
No. The protocol seems to suck in this regard, but it's the protocol. But you can change the protocol.
Anyway, do we have any idea of which servers/clients actually abide by the LSP protocol by default. So far I only count clangd
, but surely there are more (perhaps more than last time we counted...).
So our perspective is as clangd implementers. We'd like clangd to work robustly with eglot by default, as we can't get all our users to tinker with their config.
We follow the spec, and also support other options, there's not much more we can do. If you really want to keep codepoints as the default, maybe you could tell clangd to use them? (Flag or negotiation)
Anyway, do we have any idea of which servers/clients actually abide by the LSP protocol by default.
https://docs.google.com/spreadsheets/d/168jSz68po0R09lO0xFK4OmDsQukLzSPCXqB6-728PXQ/edit#gid=0 (From https://github.com/Microsoft/language-server-protocol/issues/376)
(Wouldn't it be good if the server and the client could agree to deviate from the protocol specification and and use a different encoding?)
Clangd supports that as an LSP extension: https://clangd.github.io/extensions.html#utf-8-offsets It's backwards compatible (safe to send to any server). You can request any encoding, clangd supports utf-8, utf-16, utf-32 (eglot default, I think)
@sam-mccall, that's a nicely designed extension! Have you proposed for inclusion in the standard? (That issue is way too long to read every comment carefully.)
@joaotavora, I think I can easily add support for this extension in eglot-x. It would automatically set the variables eglot-move-to-column-function
and eglot-current-column-function
.
On the other hand, the default server for c-mode and c++-mode is ccls, so eglot does not start automatically clangd. As a result, currently users need to set eglot-server-program
manually, so they could set it to clangd -offset-encoding=utf-32
.
@joaotavora, I think I can easily add support for this extension in eglot-x. It would automatically set the variables eglot-move-to-column-function and eglot-current-column-function.
Or maybe we can use a special class for eglot-server-programs
that sets these things as the default for clangd
sessions. Or just change the global defaults and let other servers complain, after all they're the ones in the wrong, in principle.
https://docs.google.com/spreadsheets/d/168jSz68po0R09lO0xFK4OmDsQukLzSPCXqB6-728PXQ/edit#gid=0 (From microsoft/language-server-protocol#376)
There a few missing there, but it seems the tide is turning... @sam-mccall @kadircet can you test the latest commit and confirm Eglot is working correctly out-of-the-box with clangd?
@kadircet @sam-mccall ping? Can you test 9fa4561?
@kadircet @sam-mccall ping again? Or ping anyone that can confirm out-of-the-box Eglot works correctly with out-of-the-box clangd using that Eglot commit 9fa4561.
I can confirm that when using C-u M-x eglot
Using clangd-8
and these functions set accordingly eglot reports code offsets correctly. #397 adds tests for this.
No need to set βcode-offsets
sorry for the late reply, I've also tested 9fa45614220858ade17f75be2ec2758a6bbd433e and it seems to be working with clangd, without any extra command line flags.
Then, because I'm very busy right now, I invite anyone with push access to bring in 9fa4561 into master.
It's now landed on master. I've added an entry to NEWS.md. Please, check it when you have time.
added an entry to NEWS.md. Please, check it when you have time.
Good call. I'm sure it's fine. We can do NEWS reviews from time to time as well...
I reverted the change on the master branch and reopened this issue, because the tests fail. I think eglot-move-to-lsp-abiding-column
is buggy. If the column number received from the server is grater than the line length then the function should jump to the end of line. I think it does not. (Also it uses move-to-column
, which turned out to have issues in case of whitespace-mode
. But that might not be a problem here.)
From the spec: "if the character value is greater than the line length it defaults back to the line length."
I wrote a potential fix and made some measurements. The script is in a new branch. Results are at below, they show that eglot-move-to-column consequently fast and does not depend on the circumstances. It is even faster than move-to-column. The fix makes the slow eglot-move-to-lsp-abiding-column even slower.
I haven't tested the fix profoundly, but if it's correct, I expect it to keep Eglot responsive. However, I hope someone can come up with a faster solution. Please, try to test it.
This is the new function:
(defun fix-1 (column)
"Move to COLUMN abiding by the LSP spec."
(save-restriction
(narrow-to-region (line-beginning-position) (line-end-position))
(goto-char (point-max))
(setq column (min column
(/ (- (length (encode-coding-region 1 (point) 'utf-16 t))
2)
2)))
(cl-loop
initially (move-to-column column)
for diff = (- column
(/ (- (length (encode-coding-region 1 (point) 'utf-16 t))
2)
2))
until (zerop diff)
do (condition-case nil
(forward-char (/ (if (> diff 0) (1+ diff) (1- diff)) 2))
(end-of-buffer)))))
And these are the results (total elapsed time for execution, the number of garbage collections that ran, and the time taken by garbage collection):
test case | fn | result |
---|---|---|
short ascii | eglot-move-to-column | 0.042223 0 0 |
short ascii | move-to-column | 0.090825 4 0 |
short ascii | eglot-move-to-lsp-abiding-column | 0.203311 13 0 |
short ascii | fix-1 | 0.354881 24 0 |
short non-ascii | eglot-move-to-column | 0.042645 0 0 |
short non-ascii | move-to-column | 0.414419 4 0 |
short non-ascii | eglot-move-to-lsp-abiding-column | 1.251451 82 0 |
short non-ascii | fix-1 | 1.597782 108 0 |
long ascii front | eglot-move-to-column | 0.042150 0 0 |
long ascii front | move-to-column | 0.317773 4 0 |
long ascii front | eglot-move-to-lsp-abiding-column | 0.463316 17 0 |
long ascii front | fix-1 | 1.381337 125 0 |
long ascii end | eglot-move-to-column | 0.042232 0 0 |
long ascii end | move-to-column | 1.949140 4 0 |
long ascii end | eglot-move-to-lsp-abiding-column | 2.803400 106 0 |
long ascii end | fix-1 | 3.726439 214 1 |
long non-ascii front | eglot-move-to-column | 0.044116 0 0 |
long non-ascii front | move-to-column | 0.531376 4 0 |
long non-ascii front | eglot-move-to-lsp-abiding-column | 0.691837 17 0 |
long non-ascii front | fix-1 | 2.078963 152 0 |
long non-ascii end | eglot-move-to-column | 0.047536 0 0 |
long non-ascii end | move-to-column | 2.290028 4 0 |
long non-ascii end | eglot-move-to-lsp-abiding-column | 3.396392 106 0 |
long non-ascii end | fix-1 | 4.389540 241 1 |
Disclaimer: I'm not familiar with emacs lisp, neither the language nor the performance characteristics. I worked on the encoding support in clangd.
It looks like that function loops over prefixes of the line and encodes the prefix inside the loop. That's going to be quadratic time which explains why it's slow on long lines.
You're going to want to iterate character-by-character instead, in pseudopython (sorry, I really don't know any lisp):
def charOffset(line, u16target):
u16pos = 0
for i in [0, len(line)):
if u16pos >= u16target
return i
u16pos = u16pos + u16len(line[i])
return len(line)
For u16len you could use encode-coding-region
, though if it's slow even on short strings there may be something faster. If a "column" is a unicode codepoint in emacs then it's trivial (1 if codepoint < 0x10000, else 2). If a "column" may a cluster of codepoints but you can get them cheaply, then you can implement u16len easily without actually decoding and allocating strings.
@nemethf the functions that I and @mkcms wrote were analysed extensively at the time. I'm not saying there can't be bugs: something is indeed wrong, but since it is such a localized function, I'm pretty sure it's possible to come up with a short serverless snippet that demonstrates beyond doubt the bug. By skimming your https://github.com/joaotavora/eglot/issues/361#issuecomment-572166286 I can't be sure. Please try to spend the time to make a recipe like "In the most recent master, within a buffer with these contents, calling foo returns x and provokes z instead of returning y and provoking w" if you can.
Or, if you prefer, write a unit test for this.
Are we absolutely sure that the reason that we're getting failed tests is not that the servers in the Travis testing environment are not fully LSP-UTF-16 compliant? That would explain the failures.
EDIT:
We are making the informed decision here of switching to full LSP compliance. We didn't make that decision before because it is incompatible with some servers -- that's why we had those options in the first place. So if my suspicions are correct (I haven't analysed deeply) then we just need to locally use the old versions in those tests, until Travis CI gets a new pyls
or something.
Are we absolutely sure that the reason that we're getting failed tests is not that the servers in the Travis testing environment are not fully LSP-UTF-16 compliant? That would explain the failures.
My guess is this is whatβs happening. However, I have one case where the function is not working properly, and was mentioned briefly by me in #397 and by @nemethf here:
Use a buffer with contents described on the first post of this issue
(eglot-move-to-lsp-abiding-column 75)
Emacs reports end-of-buffer
Eglot moves cleanly to end-of-line
I think this is what causes my test to choke on clangd-7
, since it is reporting a column number much higher.
However, I have one case where the function is not working
But are you testing with clangd-7
in this one case? is clangd-7
fully LSP-UTF-16 compliant? Didn't you tell me it only started being so in version 8?
But are you testing with clangd-7 in this one case? is clangd-7 fully LSP-UTF-16 compliant? Didn't you tell me it only started being so in version 8?
This particular issue with the function doesn't care whether things are utf-16 or not. What I am describing here is the fact that it functions incorrectly when server would give the function a value higher than length of line. The example errors with end-of-buffer
and actually, if you add a second line with some text this function moves to next line. Following steps above, but changing text to:
const char write_data[] = u8"ππππ
πππππππππππππ π‘π€π²";
mor|e code here
would leave point where i put "|"
,
It should end on:
const char write_data[] = u8"ππππ
πππππππππππππ π‘π€π²";|
more code here
I would believe @sam-mccall would know more intimately whether or not clangd-7 is compliant
What I am describing here is the fact that it functions incorrectly when server would give the function a value higher than length of line.
I am not saying that shouldn't be fixed. It may need to be fixed. I will certainly consider the rest of your analysis, but first I would like you to answer my question, so I can make sense of this. In which versions of clangd do you see the behaviour? Do you see this behaviour on clangd-8 at all?
EDIT: previously I wrote that "I may need to be fixed" instead of "it may need to be fixed". It may look like a typo, but both things are obviously true :-)
Do you see this behaviour on clangd-8 at all?
I haven't encountered this being a problem on clangd-8
, no, but I have on clangd-7
. I am not totally confident that this proves the function is safe on newer servers. Not because I am skeptic, but don't feel my analysis on this matter has been thorough enough.
By not seeing this behaviour I mean the server hasn't given me the end-of-buffer
error. The possibility to enter a high number into the function and getting this error does not depend on any server at all
but both things are obviously true :-)
haha!
I haven't encountered this being a problem on clangd-8, no, but I have on clangd-7.
Thank you!
am not totally confident that this proves the function is safe on newer servers.
Nor am I, but the at least the mystery that was baffling me is cleared, or less opaque at least.
By not seeing this behaviour I mean the server hasn't given me the end-of-buffer error.
and has it been placing diagnostics on the correct column, for example? Or rather, have you witnessed, first hand, 1 or more cases where it placed them on the wrong columns?
Or rather, have you witnessed, first hand, 1 or more cases where it placed them on the wrong columns?
I have not witnessed this yet, no
This, however, is a sad small attempt to implement @sam-mccall pseudocode. It stops at line end and moves to correct column, just decrementing as we go. Not sure if faster or anything, but wanted to have a go at this :)
(defun code-point ()
(interactive)
(or (get-char-property (point) 'untranslated-utf-8)
(encode-char (char-after) 'ucs)))
(defun move (column)
(interactive)
(beginning-of-line)
(let ((u16pos column))
(cl-loop for i below column do
(if (equalp (line-end-position) (point)) (return))
(and (> u16pos 0)
(if (> (code-point) 65536)
(progn (setf u16pos (- u16pos 2))
(forward-char 1))
(progn (setf u16pos (1- u16pos))
(forward-char 1)))))
u16pos))
@theothornhill why just not make the existing eglot-move-to-lsp-abiding-column
be a little more resilient when passed bogus columns by non-UTF16-compliant LSP servers? I.e. why not this 100% untested thing?
diff --git a/eglot.el b/eglot.el
index e952b91..7799191 100644
--- a/eglot.el
+++ b/eglot.el
@@ -1050,15 +1050,20 @@ be set to `eglot-move-to-lsp-abiding-column', and
(defun eglot-move-to-lsp-abiding-column (column)
"Move to COLUMN abiding by the LSP spec."
- (cl-loop
- initially (move-to-column column)
- with lbp = (line-beginning-position)
- for diff = (- column
- (/ (- (length (encode-coding-region lbp (point) 'utf-16 t))
- 2)
- 2))
- until (zerop diff)
- do (forward-char (/ (if (> diff 0) (1+ diff) (1- diff)) 2))))
+ (save-restriction
+ (cl-loop
+ with lbp = (line-beginning-position)
+ initially
+ (narrow-to-region lbp (line-end-position))
+ (move-to-column column)
+ for diff = (- column
+ (/ (- (length (encode-coding-region lbp (point) 'utf-16 t))
+ 2)
+ 2))
+ until (zerop diff)
+ do (condition-case eob-err
+ (forward-char (/ (if (> diff 0) (1+ diff) (1- diff)) 2))
+ (end-of-buffer (cl-return eob-err))))))
(defun eglot--lsp-position-to-point (pos-plist &optional marker)
"Convert LSP position POS-PLIST to Emacs point.
There are of course other ways to implement this resiliency, namely checking whether forward-char
argument would overflow the line end.
If I recall correctly, the relative speed advantage we got from the version of eglot-move-to-lsp-abiding-column
that I programmed some time ago had to do with the fact that it does a binary search, and so checks fewer buffer positions: it shouldn't have to check every position from 0 upto column: it takes a guess and then adjusts until it finds it is in the correct position. See #124 and #125, particularly this https://github.com/joaotavora/eglot/pull/125#issuecomment-438046601 and the following ones.
EDIT: For clarity, this means I have reasons to disagree with @sam-mccall's assertion that we "want to iterate character-by-character". We can do it like that, and it was attempted but bisecting was found to be faster.
See #124 and #125, particularly this #125 (comment) and the following ones.
I did read these, and they were why I tried to have a go at this. it seems like every iteration uses encode-coding-region
. This function, as far as i can read from its source, has a running time of at least O(n)
(It has to read the string), and this happen from n
times to log n
times. (Depending on version - newest is fastest of course). I believe this makes the current function O(nlogn)
. Please correct me if I'm wrong here.
Sams version is O(n)
, because it only scans once, at the most to EOL
.
An added benefit is that we don't create log n
utf-16
-strings, take their length, which also seems expensive, and not related to moving point.
If my point here isn't moot - it may very well be . maybe @nemethf can try some variant of this function in his benchmarks?
(defun code-point ()
(interactive)
(encode-char (char-after) 'ucs))
(defun move (column)
(interactive)
(beginning-of-line)
(let ((u16pos column))
(cl-loop for i below column do
(if (equalp (line-end-position) (point)) (return))
(and (> u16pos 0)
(if (> (code-point) 65536)
(progn (setf u16pos (- u16pos 2))
(forward-char 1))
(progn (setf u16pos (1- u16pos))
(forward-char 1)))))
u16pos))
And please forgive me my layman's complexity analysis :-)
The basic-diagnostics test fails because pyls sends the following notification to a buffer of 23 characters:
server-notification Sat Jan 11 12:30:22 2020:
(:params
(:uri "file:///home/nemethf/src/eglot/py-basic-diagnostics/main.py" :diagnostics
[(:source "pyflakes" :range
(:start
(:line 0 :character 13)
:end
(:line 0 :character 37))
:message "invalid syntax" :severity 1)])
:jsonrpc "2.0" :method "textDocument/publishDiagnostics")
This notification is perfectly in line with the specification.
The new benchmarks show that the patch JoΓ£o proposed doesn't really increase the execution time, whereas Theodor's solution is quite slow. I guess this is because encode-coding-region is in C, so it's cheap to run. However, isn't it possible for JoΓ£o's patch to "overshoot" the end of the line during the binary search and stop at the end of the line when the target column is, say, (1- (line-end-position))?
I also see a potential problem with the usage of forward-char
, according to the help: "Depending on the bidirectional context, the movement may be to the right or to the left on the screen." Can an Arabic comment in a source code have a right-to-left context?
test case | fn | result |
---|---|---|
short ascii | eglot-move-to-column | 0.042893 0 0 |
short ascii | move-to-column | 0.091771 4 0 |
short ascii | eglot-move-to-lsp-abiding-column | 0.220460 13 0 |
short ascii | fix-1 | 0.380529 24 0 |
short ascii | joaos-fix | 0.246880 13 0 |
short ascii | theodors-move | 0.228392 0 0 |
short non-ascii | eglot-move-to-column | 0.043326 0 0 |
short non-ascii | move-to-column | 0.414419 4 0 |
short non-ascii | eglot-move-to-lsp-abiding-column | 1.286369 82 0 |
short non-ascii | fix-1 | 1.607467 108 0 |
short non-ascii | joaos-fix | 1.382975 82 0 |
short non-ascii | theodors-move | 2.960463 0 0 |
long ascii front | eglot-move-to-column | 0.044885 0 0 |
long ascii front | move-to-column | 0.317910 4 0 |
long ascii front | eglot-move-to-lsp-abiding-column | 0.478651 17 0 |
long ascii front | fix-1 | 1.458539 125 0 |
long ascii front | joaos-fix | 0.504161 17 0 |
long ascii front | theodors-move | 0.840904 0 0 |
long ascii end | eglot-move-to-column | 0.041956 0 0 |
long ascii end | move-to-column | 1.965042 4 0 |
long ascii end | eglot-move-to-lsp-abiding-column | 2.849772 106 0 |
long ascii end | fix-1 | 3.776413 214 1 |
long ascii end | joaos-fix | 2.878306 106 0 |
long ascii end | theodors-move | 15.581334 0 0 |
long non-ascii front | eglot-move-to-column | 0.043761 0 0 |
long non-ascii front | move-to-column | 0.513725 4 0 |
long non-ascii front | eglot-move-to-lsp-abiding-column | 0.672835 17 0 |
long non-ascii front | fix-1 | 1.937640 152 0 |
long non-ascii front | joaos-fix | 0.707602 17 0 |
long non-ascii front | theodors-move | 0.889215 0 0 |
long non-ascii end | eglot-move-to-column | 0.043063 0 0 |
long non-ascii end | move-to-column | 1.987880 4 0 |
long non-ascii end | eglot-move-to-lsp-abiding-column | 2.924919 106 0 |
long non-ascii end | fix-1 | 4.167515 241 1 |
long non-ascii end | joaos-fix | 2.956717 106 0 |
long non-ascii end | theodors-move | 15.707058 0 0 |
I did read these, and they were why I tried to have a go at this. it seems like every iteration uses encode-coding-region. This function, as far as i can read from its source, has a running time of at least O(n) (It has to read the string), and this happen from n times to log n times. (Depending on version - newest is fastest of course). I believe this makes the current function O(nlogn). Please correct me if I'm wrong here.
I don't have any reason to believe you're wrong, and n log(n)
seems to ring a bell here. It's been more than a year and I dont' remember all the details, but I remember having being convinced that we had to use encode-coding-region
because that's the only function in Emacs that "knows" how to encode in all circumstances, circumstances that I'm not 100% familiar with and which will certainly include some corner cases. So, given that assumption, which I'm happy to see successfully disputed and defeated (probably depending on your knowledge of UTF-16 things, which is surely greater than mine), the cheapest version I came up with algorithmically was the binary search.
I don't understand @nemethf's benchmarks super-well, but they seem to indicate your O(n) version is slower for some reason.
There are a lot of tests there in #125, though it was so specific that me and @mkcms put them in a branch. If you can come up with a more correct and faster version, please do!
In the meantime, @nemeth, if I understand correctly, my fixed version worked and has identical performance to the version in master, right?
If so, I request that you either fix it to avoid the condition-case (if with it you can make it simpler, that is), or let it be as it is. Then commit it, and push it to a side branch. Let Travis run on that branch. If there's a mistake, it is probably because of non-compliant pyls, so let-bind the eglot.*column
pair to the old version for those tests only. Do this is two pretty commits so we can keep track of the history. Post the resulting side-branch here and we'll take a final decision.
If at any moment you discover another problem, post here.
I also see a potential problem with the usage of forward-char, according to the help: "Depending on the bidirectional context, the movement may be to the right or to the left on the screen." Can an Arabic comment in a source code have a right-to-left context?
I can sort of see the right to left thing, but it could be a question of later using backward-char
instead of forward-char
. Or maybe forward-char
already has that idea built in (who says "forward" is left-to-right?). Anyway, let's forget about that for now, I don't think it changes the fundamental nature of the problem, it's just a symmetry quirk or something. Again, happy to be contradicted, but let's not let it hold this up while the contradicting happens :-)
And thanks a lot, both of you, for all this work.
However, isn't it possible for JoΓ£o's patch to "overshoot" the end of the line during the binary search and stop at the end of the line when the target column is, say, (1- (line-end-position))?
I just read this: It is possible, of course, when the server is non-compliant and is sending a bogus column for those line contents. That's precisely the reason I made the function more resilient to these "attacks".
your O(n) version is slower for some reason.
Complexity analysis is useful, but for small n an O(n) algorithm can be much slower than an O(n^n) algorithm. I think the binary search method is faster for the practical cases because encode-coding-region
is in C, and calculating the size of one character in lisp is approximately as fast as calculating it for a region in C.
my fixed version worked and has identical performance to the version in master, right?
Yes, the difference was marginal.
If so, I request that you either fix it to avoid the condition-case (if with it you can make it simpler, that is), or let it be as it is. Then commit it, and push it to a side branch.
Without further optimization, I pushed it to fix-361-attempt-2.
However, isn't it possible for JoΓ£o's patch to "overshoot" the end of the line during the binary search and stop at the end of the line when the target column is, say, (1- (line-end-position))?
I just read this: It is possible, of course, when the server is non-compliant and is sending a bogus column for those line contents. That's precisely the reason I made the function more resilient to these "attacks".
My question was more like it is possible to "overshoot" under normal circumstances, when the server is LSP conformant, but there are strange non-ascii characters, prettify-symbols-mode is active, etc?
Complexity analysis is useful, but for small n an O(n) algorithm can be much slower than an O(n^n) algorithm. I think the binary search method is faster for the practical cases because encode-coding-region is in C, and calculating the size of one character in lisp is approximately as fast as calculating it for a region in C.
Notice that from your "short" to your "long" version, the difference between the samples you took for the two algorithms increased, not decreased. However, regarding complexity, what you say holds true in general, and if that reasoning applies here, it's easy to test: just benchmark both algorithms with a ridiculously long line. If your algorithm indeed holds an asymptotic advantage as you propose, then it should become evident sooner or later. Otherwise you made an error somewhere in your complexity analysis.
Regardless, we are dealing with real code and real systems, so while I am of course curious to know the results of the above experiment, I still think that in the meantime we have a clear winner, at least for "normal" length lines (that the function was originally tested against).
My question was more like it is possible to "overshoot" under normal circumstances, when the server is LSP conformant, but there are strange non-ascii characters, prettify-symbols-mode is active, etc?
That is very difficult to know, i.e. it is very difficult to prove a negative. It is much easier for you or anyone to show a single case of misbehaviour (perhaps writing tests), than for me to prove that no possible unforeseen interaction can ever make it misbehave. All I can say is that I designed the function to play along with compliant LSP servers: if a server mentions a column in a line that indeed has that column, it should be safe to travel less than that column's width and re-measure the distance travelled. If the server is lying, then indeed we might overshoot (and that's why I added the condition-case
). But if the server is lying then it isn't a compliant LSP server.
When modifying a line like this:
if you try to append some text after closing quote ("), eglot tells LSP servers that there was a change at offset 52, which is wrong as those emojis are encoded with 4 utf-8 bytes, therefore each results in 2 codepoints. the correct offset should've been 72.
This results in synchronisation problems regarding the file contents between editor and lsp servers.