Closed dancojocaru2000 closed 2 years ago
Here are some use cases and corner cases around nested language handling with a stalking horse to stimulate discussion. I'm sure there are others.
Maybe the following flow would work:
"nestedLanguages": {
"someTokenType": { "languageId": "nested-language-id" },
...
}
{ "method": "decode-nested-language-text",
"text": "foo &= bar"
}
{
"text": "foo &= bar",
"decodePositionMap": { ... },
"languageId": "different-nested-language-id"
}
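The message fragments above could be given types along the following lines. This is only a sketch: the interface names and the rule that a languageId in the response overrides the one declared under nestedLanguages are assumptions made for illustration, not part of LSP or of the proposal as written.

```typescript
// Hypothetical shapes for the messages sketched above; none of these exist in LSP.
interface NestedLanguageInfo {
  languageId: string;
}

// Server capability: semantic token type -> default nested language.
type NestedLanguages = Record<string, NestedLanguageInfo>;

interface DecodeNestedLanguageTextResponse {
  text: string;                // decoded text of the nested token
  decodePositionMap: number[]; // packed position map (format left open here)
  languageId?: string;         // assumed: optional override of the declared language
}

// Pick the language for a decoded token.
// Assumption: the response wins over the capability default.
function resolveLanguageId(
  tokenType: string,
  declared: NestedLanguages,
  response: DecodeNestedLanguageTextResponse,
): string | undefined {
  return response.languageId ?? declared[tokenType]?.languageId;
}
```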
If a set of documents contains multiple nested language tokens that decode to the same textual content,
A client may cache the results of unwrap-nested-language-text requests so that edits to a nesting document that do not affect the textual content of a nested language token need not trigger new unwrap requests.
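A minimal sketch of such a cache, assuming results can be keyed by token type plus the encoded token text. DecodeCache and its synchronous decode callback are hypothetical; a real client would issue an asynchronous request to the server instead.

```typescript
// Hypothetical client-side cache for decode/unwrap results.
interface DecodeResult {
  text: string;
  decodePositionMap: number[];
}

type DecodeFn = (tokenType: string, encoded: string) => DecodeResult;

class DecodeCache {
  private readonly entries = new Map<string, DecodeResult>();
  private readonly decode: DecodeFn;
  requestCount = 0; // how many times the server was actually asked

  constructor(decode: DecodeFn) {
    this.decode = decode;
  }

  get(tokenType: string, encoded: string): DecodeResult {
    // JSON.stringify of the pair gives an unambiguous composite key.
    const key = JSON.stringify([tokenType, encoded]);
    let hit = this.entries.get(key);
    if (hit === undefined) {
      this.requestCount++;
      hit = this.decode(tokenType, encoded);
      this.entries.set(key, hit);
    }
    return hit;
  }
}
```

With this keying, an edit elsewhere in the nesting document leaves the key of an unchanged token intact, so no new request is made for it.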
Actions that highlight text or move the cursor may need to work through nested languages.
<span onclick="// line comment&#10;console.log(&quot;Hello, World!&quot;)">
The text of the onclick attribute above might decode to
// line comment
console.log("Hello, World!")
Simple operations like pairing parentheses require mapping token positions in a nested code document to actual positions in the nesting document.
One way to handle this is for decode-nested-language-text responses to include, at a minimum, a mapping from the Positions of character sequences that do not decode to exactly one character to the number of characters they decode to.
These could be packed into int[] using a similar scheme to the semantic tokens data.
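As a rough illustration of such a packed map, the sketch below decodes a small subset of HTML character references and emits four integers per reference: deltaLine, deltaStart (both relative to the previous entry, as in semantic tokens), source length, and decoded length. The function names, the entry layout, and the zero-based offsets measured from the start of the attribute value are all assumptions; the numbers it produces depend on those conventions.

```typescript
// Small, deliberately incomplete table of named references (assumption for the sketch).
const ENTITIES = new Map<string, string>([
  ["quot", '"'], ["amp", "&"], ["lt", "<"], ["gt", ">"], ["apos", "'"],
]);

// Zero-based line and column of `offset` within `s`.
function positionAt(s: string, offset: number): { line: number; col: number } {
  let line = 0;
  let lineStart = 0;
  for (let i = s.indexOf("\n"); i !== -1 && i < offset; i = s.indexOf("\n", i + 1)) {
    line++;
    lineStart = i + 1;
  }
  return { line, col: offset - lineStart };
}

// Decode `encoded` and record every reference as
// [deltaLine, deltaStart, sourceLength, decodedLength].
function decodeWithMap(encoded: string): { text: string; map: number[] } {
  const ref = /&(?:#(\d{1,4})|([a-z]+));/g;
  const map: number[] = [];
  let text = "";
  let last = 0;     // end of the previous decoded reference in `encoded`
  let prevLine = 0; // position of the previous map entry
  let prevCol = 0;
  for (let m = ref.exec(encoded); m !== null; m = ref.exec(encoded)) {
    const decoded = m[1] !== undefined
      ? String.fromCodePoint(Number(m[1]))
      : ENTITIES.get(m[2] ?? "");
    if (decoded === undefined) continue; // unknown name: keep it as literal text
    text += encoded.slice(last, m.index) + decoded;
    last = m.index + m[0].length;
    const { line, col } = positionAt(encoded, m.index);
    map.push(line - prevLine, line === prevLine ? col - prevCol : col, m[0].length, decoded.length);
    prevLine = line;
    prevCol = col;
  }
  return { text: text + encoded.slice(last), map };
}
```

A client could walk the same array in the other direction to translate a position in the decoded text, such as a matched parenthesis, back to a position in the nesting document.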
// line comment&#10;console.log(&quot;Hello, World!&quot;)
The example above might decode to [0, 14, 5, 1, 0, 15, 6, 1, 0, 19, 6, 1]
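since
the newline reference (5 characters, presumably &#10;) at position [0, 14] decodes to 1 character,
the first quote reference (6 characters, presumably &quot;) decodes to 1 character, and
the second quote reference (6 characters, presumably &quot;) decodes to 1 character.
Actions that edit nested text, like refactoring, may need to re-encode text.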
For example, in
<button onclick='console.log("Hello")'>
a change that applies lint rules to normalize quotes in console.log("Hello") to console.log('Hello')
might need to re-encode so that the HTML becomes
<button onclick='console.log(&#39;Hello&#39;)'>
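A re-encoding step for this case might look like the sketch below. encodeAttributeValue is a hypothetical helper, and the choice of &#39;, &quot;, and &#10; as the escaped forms is an assumption; a server could equally pick other equivalent references.

```typescript
// Encode text for use inside an HTML attribute delimited by `quote`.
// Only the characters that matter for this example are handled.
function encodeAttributeValue(text: string, quote: '"' | "'"): string {
  return text.replace(/[&"'\n]/g, (ch) => {
    if (ch === "&") return "&amp;";
    if (ch === "\n") return "&#10;";
    if (ch !== quote) return ch; // the other kind of quote is safe unescaped
    return quote === '"' ? "&quot;" : "&#39;";
  });
}
```

Calling it with the normalized script text and the single-quote delimiter escapes only the quotes that would otherwise end the attribute.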
It is not always straightforward to re-encode program text, so re-encoding requests may fail, as when removing the parentheses around the array access below, which would cause the ]] to merge with > and be interpreted as ]]>, an end to the CDATA section in the embedding document, instead of as tokens in the embedded document.
<svg><script>//<![CDATA[
if ((arr[arr[i]])>0) { ... }
//]]></script></svg>
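The failure case above can be made concrete with a small check. This is a sketch only: ReencodeResult and reencodeForCdata are hypothetical names, and a real server might try to rewrite the text (for example by inserting whitespace) before giving up.

```typescript
// Result of a re-encoding attempt: either the encoded text or a failure reason.
type ReencodeResult = { ok: true; text: string } | { ok: false; reason: string };

// CDATA content cannot contain the terminator, so such text cannot be
// embedded verbatim; report failure instead of corrupting the document.
function reencodeForCdata(scriptText: string): ReencodeResult {
  if (scriptText.indexOf("]]>") !== -1) {
    return { ok: false, reason: "text contains ]]> which would end the CDATA section" };
  }
  return { ok: true, text: scriptText };
}
```

The parenthesized form if ((arr[arr[i]])>0) passes the check, while the form with the parentheses removed is rejected.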
It is probably not possible to re-encode with minimal changes in all cases, as in data:image/svg+xml;base64,... where the nested content is textual but nevertheless includes a transform such that most of the characters after a re-encoding that changes decoded text length will change.
How do chunks of nested language text that use macros like cpp's __FILE__ and __LINE__ and Swift's #line, which depend on file name and position information, interact with untitled documents (step 5 above)?
Should something allow attaching the position of the nested language token to the untitled document?
Does this require lots of re-parsing on inserts into the embedding document before the nested language token?
FWIW, in the Eclipse IDE, one can define a derived ProjectionDocument from a master document. This allows mapping subparts of the document. This is typically used for folding, but IIRC some tools used to leverage it for, e.g., SQL assistance in .java files.
One possibility, instead of sending annotations, would be that the LSP specifies that a server can send "projected" documents that consist of subparts of the master document + a mapping + some info about the language; and that the client processes such projected documents with the appropriate language server.
One benefit is that the LS could decide whether to treat blocks independently or together (e.g. if a declaration in one block can be used by other blocks), basically doing a 1-1 or 1-N mapping between documents and blocks.
One difficulty would be how to express derived/projected documents as URIs, since that is the only thing LSP understands. I imagine it could be some extension to the existing TextDocumentItem.
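To make the idea concrete, here is a sketch of what a projected document and its mapping could look like. Everything in it is an assumption: the Segment and ProjectedDocument shapes, the use of character offsets rather than LSP Positions, the newline separator between blocks, and the fragment-based URI are illustrative only.

```typescript
// Half-open character offsets into the master document's text.
interface Segment {
  start: number;
  end: number;
}

interface ProjectedDocument {
  uri: string;         // hypothetical URI form derived from the master URI
  languageId: string;
  text: string;        // the segments joined with "\n"
  segments: Segment[]; // where each piece of `text` came from
}

// Build one projected document from N blocks of the master (the 1-N case).
function project(masterUri: string, masterText: string, languageId: string, segments: Segment[]): ProjectedDocument {
  const text = segments.map((s) => masterText.slice(s.start, s.end)).join("\n");
  return { uri: `${masterUri}#projection=${languageId}`, languageId, text, segments };
}

// Map an offset in the projected text back to the master document.
// Returns undefined for separators and out-of-range offsets.
function toMasterOffset(doc: ProjectedDocument, projectedOffset: number): number | undefined {
  let base = 0; // projected offset at which the current segment starts
  for (const s of doc.segments) {
    if (projectedOffset < base) return undefined; // landed on a separator
    const len = s.end - s.start;
    if (projectedOffset < base + len) return s.start + (projectedOffset - base);
    base += len + 1; // skip the "\n" separator between segments
  }
  return undefined;
}
```

Passing a single segment per call gives the 1-1 mapping instead; the mapping function is the same in both cases.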
@mickaelistria
One possibility, instead of sending annotations
I think the reference to annotations in the original was an example of syntactic cues an embedding language might use to indicate an embedding. I don't think it was a suggestion about how different LSP agents communicate.
@mickaelistria
I think your larger point that the LS for the embedding document is the source of the embedding relationship is a good one.
It seems like the kind of thing that might not be realized until later stages. For example, a DSL might only be recognized as such after imports are resolved as in
// javascript
import { someDomainSpecificLanguage } from '...';
let x = someDomainSpecificLanguage`
source in domain specific language here
`;
The embedding relationship may only be apparent after the langserver has some information about the imported identifier someDomainSpecificLanguage.
workspace/semanticTokens/refresh should suffice to cause a re-request of token information where a run of whole tokens corresponds to a block.
I will close the issue (but feel free to continue the discussion). I think LSP should not promote one model for how to do this. I think both forwarding and embedded services are valid solutions.
What we are working on in LSP is to
This will allow servers to implement a forwarding model for embedded languages. However, it will not be a general solution for embedded languages.
Hello! I want to propose the ability for a language server to instruct a client to handle part of a file as if it's another language.
The biggest use case would be embedding a language like HTML in a constant string in another language. That way, as part of the string, code completion and the like will be provided for the other language. (See example at the end).
The way the user informs the language server that this is desired would be language specific. In a dynamic language, a comment could be used. In a static language, something like annotations could be used.
The client could potentially handle this by treating the part of the file with different language as a separate "virtual" file, and allowing the other language server to operate on the "virtual" file.
Optionally, arguments could be specified in order to provide useful info to the other language server. An example would be where to find the JSON Schema for a JSON string.
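One way a client might build such a "virtual" file is to blank out everything outside the embedded region while keeping line breaks, so that positions reported by the other language server line up with the host file without extra mapping. The sketch below assumes this masking approach; EmbeddedRegion, VirtualDocument, the embedded:// URI form, and the shape of the arguments are all hypothetical. The uri, languageId, version, and text fields mirror LSP's TextDocumentItem.

```typescript
// Hypothetical instruction from the host language server.
interface EmbeddedRegion {
  start: number; // half-open character offsets into the host text
  end: number;
  languageId: string;
  arguments?: Record<string, string>; // e.g. where to find a JSON Schema
}

interface VirtualDocument {
  uri: string;
  languageId: string;
  version: number;
  text: string;
  arguments: Record<string, string>;
}

// Replace every character outside the region with a space, keeping line breaks,
// so line/character positions are identical in the host and virtual documents.
function maskOutside(hostText: string, region: EmbeddedRegion): string {
  let out = "";
  for (let i = 0; i < hostText.length; i++) {
    const ch = hostText.charAt(i);
    const inside = i >= region.start && i < region.end;
    out += (inside || ch === "\n" || ch === "\r") ? ch : " ";
  }
  return out;
}

function toVirtualDocument(hostUri: string, hostText: string, hostVersion: number, region: EmbeddedRegion): VirtualDocument {
  return {
    uri: `embedded://${region.languageId}/${encodeURIComponent(hostUri)}`, // assumed URI form
    languageId: region.languageId,
    version: hostVersion,
    text: maskOutside(hostText, region),
    arguments: region.arguments ?? {},
  };
}
```

Masking trades a larger virtual document for trivially correct positions; a client could instead extract only the region and keep an offset mapping.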
Example in pseudo-C#:
Example in JavaScript:
Example in Ruby: