microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License

[grammars] provide alternative to TextMate grammars #216

Open tomq42 opened 8 years ago

tomq42 commented 8 years ago

TextMate isn't sufficient for many languages.

We have been integrating at the lower level, in the src/vs/languages directory, using Modes.IState and supports.TokenisationSupport. There needs to be a way of writing an extension that can do this, and at least currently there doesn't seem to be one.

Thanks.

Tyriar commented 8 years ago

@aeschli I'm interested in the thinking behind moving away from our own tokenization in favor of tmLanguages.

aeschli commented 8 years ago

@Tyriar For performance reasons we want the tokenizers to run in the render process. As we don't want user code to run in the render process, we went for declarative tokenizers. At first we had Monarch support as well, but before the API deadline we decided to go fully for TextMate to keep the API simple. Also, in order for our theming support to work well, we want all tokenizers to emit TextMate tokens.

We are aware of the limitations and problems of TextMate, and we are open to allow other types of tokenizers, but no work is planned in this area at the moment.

tomq42 commented 8 years ago

That's a shame. It's also slightly "unfair", in the sense that Microsoft can write language modes that do things other people can't.

There would be zero chance of a pull request to the core of vscode being accepted for a new language mode, so we are left unable to write a sensible language mode for vscode. We have no such problem in Eclipse: writing similar things there is straightforward, and they have no issue with things running in the render thread.

At least in Monarch there was a "pop state" facility, which as far as I know has no equivalent in TextMate. In Monarch you could shift to an explicit state and then "pop" back, so you could write "subroutines". That facility made it at least possible to write our language mode in Monarch, even if it was much harder work than doing it the low-level way, which is what we ended up doing.
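
To illustrate the "pop state" idea, here is a minimal sketch of a tokenizer driven by an explicit state stack. It is not Monarch's actual API: the rule table format, the state names, and the `tokenize_line` helper are all hypothetical. A rule can push a named state or pop back to whichever state pushed it, which is what lets the `string` state act like a subroutine shared by `root` and `parens`.

```python
import re

# Hypothetical rule table: state name -> list of (regex, token type, action).
# action is None (stay in the state), "@pop" (return to the pushing state),
# or the name of a state to push.
RULES = {
    "root": [
        (r'"', "string.quote", "string"),
        (r"\(", "paren.open", "parens"),
        (r"[A-Za-z_]\w*", "identifier", None),
        (r"\s+", "whitespace", None),
    ],
    "string": [
        (r'[^"\\]+', "string", None),
        (r"\\.", "string.escape", None),
        (r'"', "string.quote", "@pop"),
    ],
    "parens": [
        (r"\)", "paren.close", "@pop"),
        (r'"', "string.quote", "string"),  # "call" the string state, then return here
        (r"\(", "paren.open", "parens"),
        (r"[A-Za-z_]\w*", "identifier", None),
        (r"\s+", "whitespace", None),
    ],
}


def tokenize_line(line, stack):
    """Tokenize one line. `stack` is mutated and carried over to the next line."""
    tokens = []
    pos = 0
    while pos < len(line):
        for pattern, token_type, action in RULES[stack[-1]]:
            match = re.compile(pattern).match(line, pos)
            if match:
                tokens.append((token_type, match.group()))
                pos = match.end()
                if action == "@pop":
                    stack.pop()
                elif action is not None:
                    stack.append(action)
                break
        else:
            # No rule matched in the current state: emit one invalid character.
            tokens.append(("invalid", line[pos]))
            pos += 1
    return tokens
```

Calling `tokenize_line('f("a" (b))', ["root"])` ends with the stack back at `["root"]`, while an unterminated string leaves `"string"` on top of the stack, so the next line resumes inside the string.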

tambry commented 7 years ago

Tokenizing languages where a single token might be split onto multiple lines is near impossible (without very complicated workarounds) using TextMate. (see https://github.com/Microsoft/vscode-textmate/issues/32)
Monarch would be a huge improvement, allowing better language support for more complicated/nuanced languages. Though being able to write a tokenizer using an API would be even better.
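
A small sketch of why this is awkward, assuming the grammar's regexes only ever see one line at a time (which is how TextMate-style tokenizers are generally driven). The `function` construct and the pattern are made up for illustration; the point is only that a pattern which matches the whole text finds nothing once the text is fed in line by line.

```python
import re

# Hypothetical construct: the keyword "function" followed by a name,
# where the name is allowed to sit on the following line.
PATTERN = re.compile(r"function\s+(\w+)")

source = "function\n  compute"

# Matching against the whole text succeeds, because \s+ can cross the newline.
whole_text_match = PATTERN.search(source)

# Matching line by line fails on every line: no single line holds the full construct.
per_line_matches = [PATTERN.search(line) for line in source.split("\n")]
```

A line-based grammar has to split such a construct into separate begin and end rules and carry state between lines, which is where the workarounds come from.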

EvgeniyPeshkov commented 5 years ago

Hello everyone. I've developed and published a syntax highlighting extension based on Tree-sitter. It provides a universal syntax coloring engine for almost any programming language (currently, C and C++ are supported out of the box). Because it constructs the entire syntax tree, Tree-sitter efficiently overcomes the limitations of the built-in TextMate grammars. It's very easy to add support for a new language. I'm planning to write a how-to in the next couple of days, but you can figure it out from the source code, which is simple and straightforward. Contributions are welcome. I've been using it myself for a month, so I suppose it's ready for public use. At the least, the extension can be useful until the VS Code core provides a stronger syntax parser.

You can install it from the VS Code Marketplace, or download the .vsix package from the GitHub page and install it manually. Please note that the extension published in the VS Code Marketplace will only work on Windows x64. For other operating systems, please download a pre-compiled .vsix package. This will be fixed in one of the next updates. Alternatively, you can build the extension from source.

texastoland commented 2 years ago

[@aeschli] We are aware of the limitations and problems of TextMate, and we are open to allow other types of tokenizers, but no work is planned in this area at the moment.

@aeschli This was 5½ years ago 👆🏼 I understand TextMate is probably as much a thorn in your side as it is for extension authors, judging by some of @mjbvz's logged issues. I'm happy to document its pain points, but I imagine you already have a query in your GitHub Issues Notebooks somewhere.

Here's my understanding of the current state of things:

After a week of struggling with microsoft/vscode-textmate#32 and practically nonexistent documentation apart from a blog post from 2014 ... could we pretty please with a cherry on top have an update on this issue?

texastoland commented 2 years ago

Continued from https://github.com/microsoft/vscode-textmate/issues/117#issuecomment-920925771:

[@jeff-hykin] I spent years on a library (which I finally published just last week) to make it way less painful.

If it works for you that's great. For me most of your use case is solved by using YAML instead of JSON (like Sublime but it's frustrating that there's a compile step for Code) and the metaprogramming facilities of embedding match content in scope names (using them like CSS classes to inject other grammars) or YAML 1.1 merge keys.

How YAML looks (syntax highlighting available for embedded regexes):

scopeName: inline.template-fsharp-highlight.reinjection
injectionSelector: "L:meta.embedded"
patterns:
  - name: string.quoted.triple.fsharp.template.fsharp.substitution
    contentName: meta.template.expression.fsharp
    begin: |
      (?x)    # Ignore whitespace
      (?<!\{) # Not after brace
      \{      # Literal brace
      (?!\{)  # Not before brace
    end: |
      (?x)
      (?<!})
      }
      (?!})
    captures:
      0: { name: keyword.symbol.fsharp }
    patterns:
      - include: source.fsharp

I've seen at least 2 projects that rolled their own grammar generators (the original Reason syntax and your own Better Shell Syntax). There's even a more interesting compiler (currently with documentation, online REPL, and CLI but no extension yet) to transpile an entirely new syntax with a Sublime-like stacking context into TextMate.

But to me it's all infuriating. Code is progressive in so many ways but not only regressive in a core component of literally any text editor but now unresponsive about it. The https://github.com/microsoft/vscode-textmate project is on one hand daunting and on the other janky, and it's impossible to tell whether a given problem is due to TextMate's unspecified behavior or is actually a bug.

Semantic tokens were a foundational step but not a solution. Most implementations connect them to their LSP. That's less performant than using Tree-sitter (#50140), leaves the burden on extension authors to provide a TextMate grammar for when the LSP isn't available (like for a file outside a .NET project), and creates an inconsistent experience for end users in terms of coloring (whitespace-significant languages like F# are most drastically affected) as well as responsiveness.

In conclusion being silent about this hurts:

  1. performance of arguably 1 of the primary functionalities of Code.
  2. extension authors who waste weeks (🙋🏼‍♂️) reinventing the wheel because of an undocumented, outdated, insufficient, and slightly buggy (although well-tested) tool.
  3. end user experience (see previous paragraph).

@bpasero @egamma Sorry to spam you, but would a separate PR proposal for #50140 be more productive? 🙏🏼

jasonwilliams commented 2 years ago

@aeschli do you know if there’s any current exploration into something like Tree-sitter as a replacement for the TextMate grammars we have today? It can’t just be left as it is indefinitely, as the problems are noticeable.

[@texastoland] But to me it's all infuriating. Code is progressive in so many ways but not only regressive in a core component of literally any text editor but now unresponsive about it.

[@texastoland] performance of arguably 1 of the primary functionalities of Code.

This is very true and it’s actually quite sad to see too.

It’s great VSCode has all of these fancy bells and whistles and more features than you can possibly need, but it seems to get the basics wrong when it comes to rendering the source code onto the screen. On TypeScript projects I see the syntax highlighting kick in a few seconds after the code shows up; this is a known issue but was given lower priority. I’d probably go as far as to say I’d happily waive any new feature for a few months if it meant time was spent on this.

I understand there’s also a desire to fully rely on LSP for code colouring, but this just adds extra latency like @texastoland mentioned above; you would definitely need some level of caching or stale-while-revalidate before falling back to LSP otherwise it’s no better than what we have today.
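
A rough sketch of what stale-while-revalidate could look like for tokens, under stated assumptions: the class name, the `fetch` callback, and the `(uri, version)` keys are all hypothetical and are not part of VS Code or LSP. The cache hands back the last known tokens immediately, flags them as stale when the document version has moved on, and refreshes them in a separate step.

```python
class StaleWhileRevalidateCache:
    """Serve the last known tokens immediately and refresh them afterwards."""

    def __init__(self, fetch):
        self._fetch = fetch    # slow source of truth, e.g. a language server request
        self._entries = {}     # uri -> (version, tokens)
        self._pending = {}     # uri -> newest version that still needs fetching

    def get(self, uri, version):
        """Return (tokens, is_stale) without waiting on the slow path if possible."""
        entry = self._entries.get(uri)
        if entry is None:
            # Nothing cached yet: the caller has to wait for the slow path once.
            tokens = self._fetch(uri, version)
            self._entries[uri] = (version, tokens)
            return tokens, False
        cached_version, tokens = entry
        stale = cached_version != version
        if stale:
            self._pending[uri] = version
        return tokens, stale

    def revalidate(self):
        """Refresh every stale entry; an editor would run this off the UI path."""
        for uri, version in list(self._pending.items()):
            self._entries[uri] = (version, self._fetch(uri, version))
        self._pending.clear()
```

The first `get` for a file pays the full cost once; later calls with a newer version return the old tokens right away, and the fresh ones appear after `revalidate` has run.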

I understand anything around tokenisation requires a refactor and that’s most likely why no one wants to go near it but how long can that last really? Until competition begins to narrow?

Any additional tokenizer is waiting on @alexdima's https://github.com/microsoft/vscode/issues/77140.

His last response to that thread was almost 3 years ago, so I think it’s a dead end. It’s yet another thread where the maintainers have gone silent on the issue.

I did include tree-sitter in my post around VSCode performance as a whole https://jason-williams.co.uk/speeding-up-vscode-extensions-in-2022

jasonwilliams commented 2 years ago

I've had a go at integrating a different service (alongside the TextMate one) which supports Tree-sitter. So far it loads up fine, but there are some issues getting it to instantiate Tree-sitter properly. I think this is to do with the security policies in place.

I think it's possible to have a Tree-sitter service which can emit tokens (similar to the TextMate service) and have higher-up services use that instead. Or have them use the Tree-sitter API wrapped in a service (for queries etc.).

If anyone is interested in helping, there's a PR here: https://github.com/microsoft/vscode/pull/147648

texastoland commented 4 months ago

[@jasonwilliams] Until competition begins to narrow?

Switching to Zed today 🤦🏼‍♂️

Note: not a single reply from MS here and only 1 dismissive response in https://github.com/microsoft/vscode/issues/50140#issuecomment-426084826

heartacker commented 4 months ago

https://github.com/microsoft/vscode/pull/161479 SAD