microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License

Same extension in different language contributions #135965

Closed maziac closed 3 years ago

maziac commented 3 years ago

vscode 1.60

Hi,

I think there is a design flaw on the language id contribution from extensions. It is possible to declare a list of files that belong to a certain language in package.json.

Now suppose that extension A defines a languageIdA with e.g. the extension "*.asm" and extension B defines another languageIdB with the same extension. To what language will a file.asm belong if both extensions are installed? Is it that the last installed extension wins?

Is the "extensions" list in "contributions" only an initial list?
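The scenario above can be sketched as a package.json fragment. This is only an illustration for a hypothetical extension A: `languageIdA` and the alias are placeholder names, while `contributes.languages` with `id` and `extensions` is the contribution point being discussed. Extension B would ship the same fragment with `"id": "languageIdB"`, so both would claim `.asm`.

```json
{
  "contributes": {
    "languages": [
      {
        "id": "languageIdA",
        "aliases": ["Language A (hypothetical)"],
        "extensions": [".asm"]
      }
    ]
  }
}
```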

alexdima commented 3 years ago

@maziac This is intentional. VS Code does not wish to become a standards body with a committee that gets to decide by vote or by other democratic means which file extension belongs to which language id. That is why we allow extensions to contribute the mapping from file extension to language id.

It is absolutely possible for two extensions to conflict. We currently leave it up to our users to uninstall one of the conflicting extensions, or leave it up to the extension authors to discuss among themselves and find agreement.

Ultimately, the file extension -> language id mapping is a setting that can be configured by each user in their user settings or by each project in its workspace settings.

maziac commented 3 years ago

I understand. But what is the algorithm that decides which extension wins? Is it really the last installed one?

alexdima commented 3 years ago

Currently, extensions are sorted in the following order:

For two user-installed extensions, I think that the extension whose identifier comes later alphabetically would end up winning, but I have to be honest, I didn't test this.

maziac commented 3 years ago

OK. One more thing that came to my mind. What if the user would like to install and use two extensions, both for the same file extension, but with different language IDs?

From what I understood this is not possible.

What is the advice for the extension developer?

Here is my real-world problem: My extension maziac.dezog defines the language id "asm-collection", which includes ".asm". Another extension, e.g. z80-macroasm, defines the language id "z80-macroasm", which also includes ".asm". Both extensions could be used at the same time. How should an extension developer solve this problem?

alexdima commented 3 years ago

Users can use the language picker (from the status bar) to change the language id for each file individually or can decide if .asm should point to asm-collection or z80-macroasm using the "files.associations" setting at the user level or at the workspace/folder level.
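As a rough sketch of the workspace-level option, a .vscode/settings.json along the following lines could route .asm files by folder. The `dezog` and `z80` folder names are hypothetical. The sketch relies on `files.associations` keys being glob patterns, where a pattern containing a path separator is matched against the file path rather than only the file name. Treat it as something to verify in your own setup, not as a confirmed recipe.

```json
{
  "files.associations": {
    "**/dezog/**/*.asm": "asm-collection",
    "**/z80/**/*.asm": "z80-macroasm"
  }
}
```

A single entry such as `"*.asm": "z80-macroasm"` is enough when every .asm file in the project targets the same language.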

maziac commented 3 years ago

Yes, but the user cannot decide to use both extensions in the same workspace.

alexdima commented 3 years ago

@maziac I have to say I am now confused and no longer understand what this issue is about.

Are asm-collection and z80-macroasm two distinct languages or the same language? My assumption has been so far that these are two distinct languages and that this is a problem of file extension -> language id association. Is that not the case? Are asm-collection and z80-macroasm the same language with the same specification? If that is the case, then please communicate directly with the other extension author and arrive at a common identifier that both extensions can use.

maziac commented 3 years ago

Seriously? *.asm is a common file extension for assembler files. I.e. a lot of extensions for assembler files will use that. It seems quite unrealistic to me that the extension developers can handle that. But OK, I understand, vscode doesn't care.

But please, one last question here regarding the handling of the extension and the language id.

The situation: User has assigned .asm to language id A. Extension B defines language id B which also includes .asm but is not installed yet.

A few cases:

  1. User installs extension B: what happens to *.asm? Is it still assigned to language id A?
  2. User installs extension B and also assigns .asm to language id B. Now the user uninstalls extension B. To what language id will .asm belong now? Will it fall back to language id A?

alexdima commented 3 years ago

Please let me try to explain things from a different angle. .java is a file extension which is not really ambiguous. (Almost) everyone agrees that .java files contain text meant to follow the Java language syntax as defined in the Java Language Specification. However, even for .java files, it is ambiguous which version the .java file is meant to follow. Java SE 17 is similar to, but still quite a different language from, Java SE 6, and not all programs that are valid Java SE 6 programs are valid Java SE 17 programs. VS Code opts to sacrifice correctness in pursuit of a sane solution. Instead of defining 17 java languages (which would be the truly correct way to approach these distinct, incompatible language versions), we simplify things and agree there is a single language id, java. The grammar definition for java is written loosely and will colorize in a decent way source code targeting any java version. We leave it up to the extensions to read project files (like pom.xml files) from the folder/workspace and decide which exact Java version the source code is targeting and then validate things accordingly.


We want all our users to be empowered to configure VS Code to their liking. So if there is a small minority of people who really dislike this approach, we offer the setting called files.associations. So it is possible for project A to define in its .vscode/settings.json that .java files are java17 and for another project B to define in its .vscode/settings.json that .java files are java6, e.g. "files.associations": { "*.java": "java17" }. That would normally be accompanied by the user also writing an extension that defines the strict colorization grammar to use for java17, or writing another extension to hook in the eclipse LSP support for java17, etc. Furthermore, it is also possible for people to have projects where .java files are actually C++ files, e.g. "files.associations": { "*.java": "cpp" }. AFAIK the C++ standard does not forbid using #include directives with arbitrary file names and file extensions, as I do for example here.


.asm is a file extension which is ambiguous. This is unrelated to VS Code. The ambiguity stems from everyone using the .asm file extension, regardless of the targeted architecture's machine code instructions. We don't want to ship out-of-the-box with a mapping from .asm -> ${languageId} because we do not want to claim authority over a domain which is not ours. We do not want to start arguing that .asm should point to x86-assembly because most .asm files on the internet are targeting that architecture's machine code instructions. This is not because we don't care, it is simply not our place to decide this. Similarly to how we don't decide whether .java files are Java SE 17 or Java SE 6, we push the decision out to extensions or to the individual project. Each project can define in its .vscode/settings.json what language its .asm files point to, e.g. "files.associations": { "*.asm": "x86-assembly" }.


To sum up, if a project uses .asm files to write programs targeting the Zilog Z80, then the project can define that in its .vscode/settings.json: "files.associations": { "*.asm": "z80-macroasm" }. Your extension, z80-macroasm can then be recommended using .vscode/extensions.json: "recommendations": ["mborik.z80-macroasm"]. Your extension can then define the language grammar used to parse the z80-macroasm language and fine-tune that to be as correct as possible.
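Written out as a complete file, the recommendation mentioned above might look like the following. It is only a sketch of .vscode/extensions.json using the extension identifier quoted in the comment. Nothing beyond that identifier is assumed.

```json
{
  "recommendations": ["mborik.z80-macroasm"]
}
```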


Finally, on to the file associations contributed by extensions. These act as fallbacks. In case a project does not define what .asm files are, then extensions can give hints to VS Code as to what the language is. These are just hints, because nobody (neither extensions nor VS Code) can actually say what .asm files are. It is only the project itself or the humans working on the project that would know what architecture/machine is being targeted.

maziac commented 3 years ago

Thanks for the comprehensive answer. But would it still be possible to answer the 2 cases in my question? This was not a rhetorical question, but a real one.

alexdima commented 3 years ago

  1. Yes, the language id will be the one defined by the user using files.associations. The user configuration takes precedence over the associations contributed by extensions. There is one case where extension B could "take over", and that is if extension B uses the API to change the language of a text document (vscode.languages.setTextDocumentLanguage).
  2. If the files.associations setting points to an unknown language id, then the files will be opened as plain text, e.g. "files.associations": { "*.html": "bar" } will open .html files as plain text.

maziac commented 3 years ago

Thanks.