microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License

Document guidelines to enable language features for embedded languages #47288

Closed aeschli closed 4 years ago

aeschli commented 6 years ago

In VS Code, each file is associated with a language. Language support such as code completion and hovers is contributed to that language, ideally through a language server. When a language allows embedding snippets of another language (e.g. CSS in HTML, or HTML in PHP), there are various techniques that a language server can use:

The first approach has the following advantages

In either case, the embedded content needs to be escaped according to the owner language. E.g. > needs to be &gt;.
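To make the escaping point concrete, here is a minimal sketch for the HTML case, assuming the embedded text is placed inside HTML element content. The name `escapeForHtml` is made up for illustration, and other owner languages have their own escaping rules.

```typescript
// Hypothetical helper: escapes text so it can sit inside HTML element content.
// Only the three characters that matter in element content are handled;
// attribute values would also need their quotes escaped.
function escapeForHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}
```

A server that forwards embedded content would likely need the reverse mapping as well, so that positions and text reported by the embedded language can be translated back.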

aeschli commented 6 years ago

@aeschli Do you think an extension can make the HTML part of a PHP file behave the same as a standalone HTML file, including letting all the plugins for HTML work in that part? The extension system is really important, as we all know, but not every feature is suitable for a plugin. What we actually need is to parse one file into multiple languages and treat them separately. This should be a built-in ability of a code editor. Also, all PHP files are actually HTML with a special HTML tag. Only the content of this tag should not be regarded as HTML. It is just a language embedded in HTML and can't work without it (except on the CLI). So the question is not about HTML in PHP; HTML is the main container. Lots of other languages can be embedded in HTML too, such as CSS, JavaScript, VBScript, SVG, Perl and Java. I don't think any extension can do this without official support.

@popcorner PHP files do not actually adhere to HTML syntax: <?php is not valid in pure HTML. Although it seems like PHP is embedded in HTML, the syntax is defined by PHP, so it's actually 'HTML' inside PHP. The same goes for templating languages like Smarty:

<div>
{escape} 
This is some text I want <> escaped. 
{/escape}
</div>

(Snippet from here. Valid Smarty, not HTML. Try it in the HTML validator.) Every templating language has its own way of escaping embedded content, which is why the HTML language server can't simply be reused. It lacks knowledge of the embedding language.

jens1o commented 6 years ago

The language server also implements support for the embedded language. It can do that by including libraries that provide that support. For example, there are easy-to-use node modules for css, less, scss, html and json, or more basic language support for typescript.

The problem is, how can I declare myself the master and restrict the other servers to specific parts of a file without maintaining a fork? Why can't the language servers I want to control say what they are able to handle, so that I don't need to worry about it and the end user can install any extension they like, and I don't need to upload an extension dozens of megabytes big to support every language?

aeschli commented 6 years ago

You don't need to maintain a fork. You can forward the requests. But your Smarty server needs to transform the document to valid HTML and then ask the HTML server for the result on this document. That's approach number 2. You can also tell the HTML language support to handle Smarty files by associating smarty files to HTML. But it will struggle if the file is not valid HTML as in the example above.
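A rough sketch of the "transform the document to valid HTML" step, assuming Smarty tags use the default { } delimiters and sit on a single line. Each tag, and the whole body of an {escape} block, is overwritten with spaces so that offsets in the transformed text still line up with the original file. `toVirtualHtml` is a hypothetical name, and a real Smarty server would rely on its parser rather than on regular expressions.

```typescript
// Replace every character except line breaks with a space, so the text keeps
// its length and its line/column positions.
function blank(text: string): string {
  return text.replace(/[^\r\n]/g, " ");
}

// Hypothetical transform: Smarty source in, HTML-only text of equal length out.
function toVirtualHtml(source: string): string {
  return source
    // {escape}...{/escape} bodies may hold raw < and >, so blank the whole block.
    .replace(/\{escape\}[\s\S]*?\{\/escape\}/g, blank)
    // Blank any remaining single-line {tag}. Assumption: a "{" followed by
    // whitespace (as in many CSS or JavaScript blocks) is not a Smarty tag.
    .replace(/\{[^\s{}][^{}\r\n]*\}/g, blank);
}
```

Because nothing is inserted or removed, a position returned by the HTML server for the transformed text can be used unchanged in the Smarty file.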

jens1o commented 6 years ago

You can forward the requests.

I'm looking forward to some example code. ;) And somehow you didn't answer my second question: when I want users to install extensions without any further configuration, how can I know which parts can be handled by which server?

aeschli commented 6 years ago

I don't understand your last question. Can you rephrase?

jens1o commented 6 years ago

You can use Smarty for everything; it's not tied to HTML, CSS, JS, Java, XML, JSON... So in the current version I need to detect which language it actually is. That's the downside, because I want the language servers that the user installed to give me examples of what they are able to handle, so that I do not need to worry about it and it's completely modular (so I would not need to detect whether it's PHP or Smarty code).

  1. User installs an extension which provides language support (e.g. PHP).
  2. User opens a Smarty file, and the Smarty extension asks VS Code for a list of language servers (content providers) and their samples (so what does a PHP code pattern look like?).
  3. Once the user types in something that (for example) PHP can handle, I know I can delegate the request to the PHP language server and it can handle the request, so I simply wrap around it.

That would be sooo easy and soo extensible.

I hope you understand why that concept would be so awesome and the next step.

bmewburn commented 6 years ago

@aeschli the html language server went with option 2 first for embedded languages then changed to option 1 later. Were there other advantages, in addition to those listed above, that prompted the change in direction?

@jens1o In that situation one solution could be to provide a config setting so the user can declare what the embedded language is. Then your extension can create virtual documents with the appropriate language id and forward them on.
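One way this could look, sketched under a few assumptions: the setting value is a VS Code language id such as html or css, the extension has registered a text document content provider for a made-up `embedded-content` scheme, and the file extension on the virtual URI is what lets VS Code pick the language. None of these names come from this thread.

```typescript
// Assumed mapping from a user-declared language id to a file extension.
const EXTENSION_BY_LANGUAGE: { [languageId: string]: string } = {
  html: "html",
  css: "css",
  javascript: "js",
  php: "php",
};

// Hypothetical helper: builds the URI string of a virtual document that holds
// the embedded content of `originalUri` in the declared language.
function virtualDocumentUri(originalUri: string, languageId: string): string {
  const extension = EXTENSION_BY_LANGUAGE[languageId];
  if (extension === undefined) {
    throw new Error("No virtual document mapping for language: " + languageId);
  }
  return (
    "embedded-content://" + languageId + "/" +
    encodeURIComponent(originalUri) + "." + extension
  );
}
```

Encoding the original URI into the virtual one lets the content provider find the source document again when VS Code asks for the virtual document's text.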

jens1o commented 6 years ago

In that situation one solution could be to provide a config setting so the user can declare what the embedded language is.

I want to make it as simple as possible. Prompting the user to declare this is very likely to introduce errors or miss an exception.

aeschli commented 6 years ago

@bmewburn The main reason was really to be able to control the user experience. For example in the case of JavaScript embedded in HTML, we want to preconfigure the JavaScript language server with the dom definition files.

octref commented 6 years ago

@jens1o

User opens a smarty file, and the smarty-extension asks vscode for a list of language servers(content providers) and their samples(so what does a php code pattern look like?).

Can you clarify "samples" with examples?

Currently if you want to get HTML completions in your extension, all you need to do in your extensions are three things:

  1. Figure out if the completion position is in HTML or your-lang
  2. If it's HTML, create a virtual HTML document with the embedded content
  3. Call the command vscode.executeCompletionItemProvider.

If you are writing a language server that could handle embedded language X, things such as "getting range and content of embedded regions" (for 1 and 2) should be the language server parser's responsibility.
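The three steps could be wired together roughly as follows. This is a sketch, not the extension API itself: the regions are assumed to come from your language's own parser, and `executeCommand` is injected as a stand-in for `vscode.commands.executeCommand` so the snippet stays self-contained. Only the command name `vscode.executeCompletionItemProvider` is taken from the steps above; every other name is hypothetical.

```typescript
// Range of host-language (non-HTML) text, as character offsets [start, end).
interface Region {
  start: number;
  end: number;
}

// Stand-in for vscode.commands.executeCommand (assumption: passed in by the caller).
type ExecuteCommand = (command: string, ...args: unknown[]) => PromiseLike<unknown>;

// Step 1: the cursor is in HTML when it is not strictly inside any host region.
function isInHtml(hostRegions: Region[], offset: number): boolean {
  return !hostRegions.some((r) => offset > r.start && offset < r.end);
}

// Steps 2 and 3: given the URI and position of an already created virtual HTML
// document, forward the completion request. Returns undefined when the host
// language should answer the request itself.
function forwardHtmlCompletion(
  hostRegions: Region[],
  offset: number,
  virtualHtmlUri: unknown,
  position: unknown,
  executeCommand: ExecuteCommand,
): PromiseLike<unknown> | undefined {
  if (!isInHtml(hostRegions, offset)) {
    return undefined;
  }
  return executeCommand("vscode.executeCompletionItemProvider", virtualHtmlUri, position);
}
```

Keeping the region check separate from the forwarding call means the same check can gate other requests, such as hovers, in the same way.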

jens1o commented 6 years ago

Can you clarify "samples" with examples?

I mostly mean regular expressions, to keep it as universal as possible.

I'm not exactly a regex expert, but for detecting PHP sections this could be used:

/(<\?(php)?)(.|(\r)?\n){1,}(\?>)?/gim

These (more complex) patterns would be given to vscode and can be polled by each language server. Then, a simple pattern matching is used to determine the matching language server where the request will be passed to by the master one(responsible for a specific file extension).

The problem is: Languages do not need to have some kind of start- and endpoint(e.g. javascript). Thus, a fallback language determined by the master language server would be required.
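For illustration, here is one way such a pattern could be applied, including the fallback language mentioned above. It is a sketch with assumptions: the pattern is a lazy variant of the one above (as written, the greedy group there tends to run to the end of the file instead of stopping at the first ?>), and `phpRanges`, `languageAt` and the `fallback` argument are hypothetical names.

```typescript
// Returns [start, end) offsets of every PHP block in the text.
function phpRanges(text: string): Array<[number, number]> {
  // Lazy match: stops at the first "?>", or at end of input for an unclosed block.
  // The literal is evaluated on every call, so lastIndex always starts at 0.
  const pattern = /<\?(?:php)?[\s\S]*?(?:\?>|$)/gi;
  const ranges: Array<[number, number]> = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    ranges.push([match.index, match.index + match[0].length]);
  }
  return ranges;
}

// A cursor exactly on a block boundary is treated as outside the block.
function languageAt(text: string, offset: number, fallback: string): string {
  const inPhp = phpRanges(text).some(([start, end]) => offset > start && offset < end);
  return inPhp ? "php" : fallback;
}
```

Everything outside a matched range goes to the fallback, which covers languages such as JavaScript that have no start or end marker of their own.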

octref commented 6 years ago

@jens1o There would be many problems with that approach, just off the top of my head:

jens1o commented 6 years ago

So, I do not have a better solution, yet. Do you have some better ideas?

PHP server wouldn't be able to control when the request is going to HTML and when it's going to PHP. Also passing data between LS could be very tricky, if it's not controlled by the LS.

That's supposed to be like this, because it's the job of the master language server.

LSP is chatty. You might have multiple requests going back/forth for each character entered. VS Code can't run complex regexes on the same file again and again on each document change.

Perhaps we can include sub-languages while only checking whether a specific keystroke is within the range.

:: NOTHING, FALLBACK LANGUAGE (LAYER == 0)
<html> :: HTML LANGUAGE SERVER DETECTS START OF HTML LANGUAGE (LAYER == 1)
<body>
| :: CURSOR IS WITHIN THE RANGE OF THE HTML LANGUAGE SERVER; SO HTML IS DOING THE JOB
<p>Hello World!</p>
<?php :: PHP LANGUAGE SERVER DETECTS START OF PHP LANGUAGE (LAYER == 2)
| :: CURSOR IS WITHIN THE RANGE OF THE PHP LANGUAGE SERVER; SO ITS HANDLING THE REQUEST
?> :: PHP LANGUAGE SERVER DETECTS END OF PHP LANGUAGE; VSCODE SHIFTS DOWN AND LASTLY RECOGNIZED THE HTML LANGUAGE; THUS ASSUMES HTML (LAYER == 1)
</body>
</html> :: HTML LANGUAGE SERVER DETECTS END OF HTML LANGUAGE; ASSUMES FALLBACK LANGUAGE (LAYER == 0)

Would that decrease the cost and pressure on VS Code? The only question is how we determine and separate the languages. Is there any other solution that's not based on complex regexes?
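The layering in the diagram can be sketched as a stack, assuming each language server has already reported where its language starts and ends. The marker format and the function name are hypothetical; the point is only that answering "which language owns this offset?" becomes a single pass over known markers rather than a fresh regex run.

```typescript
// A boundary reported by a language server (hypothetical format).
interface Marker {
  offset: number;
  kind: "start" | "end";
  language: string;
}

// Walks the markers in document order, pushing on "start" and popping on "end",
// and returns the language on top of the stack at the given offset.
function languageAtOffset(markers: Marker[], offset: number, fallback: string): string {
  const stack: string[] = [fallback]; // layer 0
  const sorted = markers.slice().sort((a, b) => a.offset - b.offset);
  for (const marker of sorted) {
    if (marker.offset > offset) {
      break; // remaining markers lie after the cursor
    }
    if (marker.kind === "start") {
      stack.push(marker.language);
    } else if (stack.length > 1) {
      stack.pop(); // never pop the fallback layer
    }
  }
  return stack[stack.length - 1];
}
```

How the markers are produced and kept up to date on every edit is left open here, and that is where most of the cost would sit.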

octref commented 6 years ago

@jens1o You are also assuming that language servers always want to give full control of sub-regions to other language servers. For example, the Vue Language Server suggests v-if at <div v|>. The HTML LS wouldn't know about this v-if.
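A small sketch of what keeping control can mean in practice, assuming the host server obtains the HTML items somehow and then adds its own on top. The item shape and the function name are simplified, hypothetical stand-ins rather than real LSP types.

```typescript
// Simplified stand-in for an LSP completion item.
interface Item {
  label: string;
  source: string;
}

// Host items (e.g. Vue directives) come first; embedded-language items with a
// label the host already provides are dropped, so the host's version wins.
function mergeCompletions(hostItems: Item[], embeddedItems: Item[]): Item[] {
  const merged: Item[] = [];
  for (const item of hostItems.concat(embeddedItems)) {
    const alreadyPresent = merged.some((m) => m.label === item.label);
    if (!alreadyPresent) {
      merged.push(item);
    }
  }
  return merged;
}
```

A generic dispatcher that hands the whole region to the HTML server would have no place to perform this kind of merge.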

If you want to find a generic solution that should be put in VS Code (@aeschli thinks it's impossible and so do I), try to find one that can handle at least both PHP and Vue. The solution shouldn't make VS Code slow or limit any LS in its language capabilities. If you do find such a solution I'm all ears, and the issue for that is #1751.

Up till then, this issue's scope is providing documentation, guidelines and possibly a starter template for making language servers that support embedded languages.

jens1o commented 6 years ago

@octref @aeschli May I ask at what position this backlog entry is?

octref commented 6 years ago

@jens1o I've just reworked the Language Server Guide. This is something I have in mind. I can't promise anything, but probably sometime later this year.

jens1o commented 6 years ago

Okay, I'm working on implementing this. I've got one problem though. When I pass vscode the request, it apparently passes it back to me, then I pass it to vscode... Infinite Loop. How can I get around this and declare the specific code-part as another language, so (in my example) the Emmet provider rules it all?

jens1o commented 6 years ago

That's the code currently: https://github.com/jens1o/vscode-smarty/blob/a558eb202f91c7657904e4c39a2844529e06fb39/src/completionProvider.ts#L28-L41

aeschli commented 6 years ago

@jens1o Can you create a separate request? Thanks!

jens1o commented 6 years ago

@aeschli Where? Here? At the community slack?

aeschli commented 6 years ago

Just a new GitHub issue.

axelson commented 4 years ago

@octref I see that you closed this issue. Is there a link to the created guideline? I tried following some of the referenced issues, but was unable to find a related guide.

KamasamaK commented 4 years ago

@axelson See https://code.visualstudio.com/api/language-extensions/embedded-languages