Open sam-mccall opened 4 years ago
(Just noticed that the highlighting in the second screenshot shows the problem nicely.
VSCode's built-in identifier highlighting marks object
on lines 2 and 3, while clangd's textDocument/highlight
marks operator
on line 2 and ->
on line 3. I think not including the ->
on line 2 is a clangd bug...)
As mentioned in microsoft/vscode-languageserver-node#593 we need to change the protocol if we need to let servers know about this. Adding a selection to the params might not help since VS Code doesn't have the information right now in the API and other editors might behave the same. So having a capability looks like a good approach to have more control over this.
In VSCode, the cursor lies between characters, so ambiguity: the thing on the left or the thing on the right?
That's a VSCode issue, not a protocol issue, though right ? The protocol is explicit about which "character" is being requested (it's a row/column position, not a position bettween two columns). Why would we need to change the protocol for this? Surely this change can be made in VSCode only.
Or am i missing something important?
Adding a selection to the params might not help since VS Code doesn't have the information right now in the API and other editors might behave the same
Makes sense to me. A capability seems less complex overall, with most of the complexity lying in the server rather than client or protocol.
The protocol is explicit about which "character" is being requested (it's a row/column position, not a position bettween two columns).
I think it's the opposite:
A position is between two characters like an ‘insert’ cursor in a editor
Whatever the spec text currently says, both "line-cursor" and "block-cursor" editors exist, and it would be nice to specify something here to improve interop.
I think it's the opposite:
Quite right, ugh.
value represents the gap between the
character
andcharacter + 1
.
Oh boy, i had not noticed that. So, how do you specify the start of the line ? 0
would mean the gap between 0
(the first character) and 1
(the next character), where it's really the gap between the "theoretical" start-of-line and the 0
th (first) character.
On reflection, this model seems kinda wonky. For completion requests the character position is usually requested as the position immediately following the last char on the line, but this would indicate that the server should interpret is therefore as to between the "one-past-the-last-char-in-line" and the "two-past-the-last-char-in-line". Fortunately, either we (and most other clients) are not actually not sending the 'one-past-the-last-char-in-line" position or servers are already interpreting the completion position as if it were an actual position (not a virtual thing between 2 positions). I suspect the later.
Whatever the spec text currently says, both "line-cursor" and "block-cursor" editors exist, and it would be nice to specify something here to improve interop.
I see your point. The "between characters" model works for an insert-mode-like situation, but doesn't exist in an equivalent of "normal-mode", or where the cursor has an explicit location.
So, how do you specify the start of the line ?
@puremourning Index 0 is the left of the first character
But offset 0 means the first character in any 0-based
offset system i've ever seen.
But offset 0 means the first character in any
0-based
offset system i've ever seen.
@puremourning In this case the character does not matter because Position
is referring to the space between two characters (or in the extreme case, the start of the line or the end of the line excluding newline characters).
Or perhaps I'm misinterpreting something here?
The Position
in the spec is defined as a 0-based
offset of characters in the line:
0123456789
This is a line
The 0th
character is T
. If servers interpret offset of 0
as the space between character 0
and character 1
then offset 0
is marked here:
0 123456789
T his is a line
^
Whereas you're saying it should be interpreted as here:
0123456789
This is a line
^
Which i would largely agree is the only reasonable way this could work. But it makes the specification itself ambiguous.
Agree with @sam-mccall that editors where the cursor position is an integral actual-character offset don't have any ambiguity here. in the above cases, position 0
is clearly:
0123456789
This is a line
^
Ah, that might be other experience leaking. I've only used line cursors, so think of it in terms of cursor position. The character and the cursor position to the left of it are interchangeable with this interpretation, giving a right bias on the cursor -> character transform. I agree, there doesn't seem to be any issue here then; Position's are specified as being character offsets, so it's up to the client to decide which character (as a Position) it wants to request for. It also makes sense for the client to decide this, to make the experience consistent across servers.
We're drifting a little bit from the original issue :-)
I believe the intention is that character
is the number of characters to the left of the cursor. So the start of the line is 0 and the end of the line is len(line). AFAIK all editors (of both types) follow this, so while the spec text could be tighter, I don't think there's an actual interop problem here. I'll send a PR for the text.
@sam-mccall I'm not sure what's off topic here: the spec says that a Position is the character offset, i.e., the character itself, and that the location for Goto Definition, etc., is given in terms of such a Position. Your original question about ambiguity is resolved with this understanding.
I think the question of how to resolve a given kind of cursor to a character is a concern for the LSP client, not the protocol or the server. IMO the protocol should be view / edit model agnostic as much as possible.
@sam-mccall you're almost certainly right.
i do think this interpretation is material to this issue though.
The reason i say that is that if the specification states that a position lies between characters, even for ad-hoc requests like go-to, etc. then it's material to the discussion how that position should be interpreted. I think this issue is about helping servers 'guess' what the user's intention is in the model between-characters model, whereas actually if the model is applied consistently, then it should be possible for all clients and servers to interop without new capabilities or protocol changes. I guess i'm exploring if that's possible.
I realise that in the model where you specify the actual character position, there is no ambiguity. i'm thinking that it's the clients responsibility to remove the ambiguity from the user's perspective and the specification's responsibility to remove it from the client/server interpretation perspective.
@Aerijo
the spec says that a Position is the character offset
No, as previously quoted, the spec says: "A position is between two characters like an ‘insert’ cursor in a editor". This is in the section defining the Position structure specifically.
Your original question about ambiguity is resolved with this understanding
That's a perfectly self-consistent theory, but it doesn't match VSCode's selection behavior (see the OP). In practice this basically means the interpretation is wrong - the spec ~always follows VSCode's model, which is also perfectly self-consistent.
@puremourning
if the model is applied consistently, then it should be possible for all clients and servers to interop without new capabilities or protocol changes. I guess i'm exploring if that's possible.
Concretely this might look like "Position now represents a character" and VSCode has to sometimes subtract 1 compared to what it currently sends (assuming no dramatic changes to its internal model, which I think are unlikely). Apart from the question of whether VSCode is willing to make such a potentially-breaking behavior change, this necessarily involves the editor guessing about the significance of the characters on the left vs right of the cursor. This is heuristic, and I've had nothing but trouble where editors have needed to do this (can dig up examples if needed). So my preference is to avoid trying to patch this up.
Perhaps I'm missing something, but can't this be solved in a general way by having the protocol take a range rather than a position? A between-characters cursor can be modelled as a zero-length range, and an on-a-character cursor as a length-one range.
Using ranges instead of positions also makes the protocol richer, e.g. see the existing discussion in #377 about hovers showing the type of a selected expression that's larger than just an identifier.
OK, to be clear, I was basing my reasoning on the spec. I searched "position" in the specification, and the first result was as follows:
A position inside a document (see Position definition below) is expressed as a zero-based line and character offset.
followed by an example explicitly using "character offsets" to refer to characters
So a string of the form a𐐀b the character offset of the character a is 0, the character offset of 𐐀 is 1 and the character offset of b is 3 since 𐐀 is represented using two code units in UTF-16.
The leading sentence of Position confirms this
Position in a text document expressed as zero-based line and zero-based character offset
... and then the second sentence throws everything established so far out completely by saying they measure the space between characters
A position is between two characters like an ‘insert’ cursor in a editor.
...but wait, there's more: in the Range specification:
A range in a text document expressed as (zero-based) start and end positions. A range is comparable to a selection in an editor. Therefore the end position is exclusive.
it says that an end position (represented with a Position) is exclusive. Consider the range [[0, 0], [0, 1]]; do we agree that this is the first character of the first line? If so, then it is exclusive under the character model (doesn't touch the index 1 character), but not exclusive under the between-character model (does touch the index 1 space between characters).
So I hope you understand why I was quite confident about Positions being specified as character offsets, and not spaces between characters :slightly_smiling_face:. Though I do feel better about my original comment now.
It seems then that it is the spec itself that is inconsistent then, probably because it is natural to think about Positions in terms of between characters (to represent an I cursor location / inclusive selection range), and also because they are used to represent characters for things like Goto Definition. I feel the spec is still overwhelmingly in favour of them being character offsets, so I feel my conclusion above is still relevant, but there is definitely inconsistency there.
However your point about needing a heuristic in the editor to decide which position to ask for is interesting; a simpler example might be
f o o ) b a r
^ ^
If using an I beam cursor, the editor would need to pick which way to bias it to a character offset. The user would expect a particular one to be resolved, but there is no way to communicate which to the server without an editor heuristic (e.g., favour word characters over punctuation). Otherwise, if the editor always biases the I beam cursor in a particular direction, it may ask for the )
character (which is probably not useful), despite the user thinking it was part of the end of foo
/ start of bar
.
Any update here? Seems like we could update the protocol to optionally include a range for selections?
b56a40f8667f54dfda8a180dab52495da4dfb52f seems to imply it's up to the client to decide what to send back if there's a selection.
b56a40f seems to imply it's up to the client to decide what to send back if there's a selection.
Right, but if the protocol only specifies a point, then the information channel is lossy: the client can decide to send the starting point of the selection or the ending point, but not both.
For some use cases, it would be useful if the protocol allowed the client to send both endpoints of the selection.
https://github.com/microsoft/language-server-protocol/commit/b56a40f8667f54dfda8a180dab52495da4dfb52f is to clarify the current situation. That doesn't mean that we can't extend the protocol with the selection if we think that is necessary.
Go-to-definition (and others) is triggered on a cursor position. In VSCode, the cursor lies between characters, so ambiguity: the thing on the left or the thing on the right?
A couple of factors make this worse:
To illustrate, these screenshots produce the same
textDocument/definition
request, but want different results.In vim. This should go to
Foo::operator->
.In VSCode. This should go to
object
.It'd be nice to have some more information so language servers can make better decisions. Some ideas:
I guess this is mostly an issue with languages that have overloaded operators. constructor and functor calls are common examples in C++.