sublime-treesitter / TreeSitter

Sublime Text Tree-sitter configuration and abstraction layer
MIT License

Syntaxes from `tree_sitter_languages` seem to be ignored #9

Open deathaxe opened 5 months ago

deathaxe commented 5 months ago

It appears that languages from tree_sitter_languages are not taken into account when using functions from this plugin.

While everything works as expected for Python, it does not for other languages such as Markdown.

Steps to reproduce

  1. Install TreeSitter
  2. Restart Sublime Text
  3. Open a Markdown file
  4. Open the console
  5. Run `from sublime_tree_sitter import get_tree_dict`
  6. Run `get_tree_dict(view.buffer_id())`

Expected behavior

A JSON string representing the Markdown content is printed to the console.

Actual behavior

Nothing is printed to console.

Additional Info

Adding "markdown" to the list of "installed_languages" fixes this issue; e.g. get_tree_dict(view.buffer_id()) then returns content as expected.

Maybe "installed_languages": [] should only contain manually installed languages, while those from the tree_sitter_languages package should always be available without being listed.

kaste commented 5 months ago

Yes that's a bit sad. Unfortunately, tree_sitter_languages does not ship any information about which languages are included. And there are PRs that add more and more.

When does the dependency get updated, btw? There is already a newer version than what I now have in ST (although it is broken, so we are actually lucky here).

We could hard-code which languages are supported; it is more or less a copy-paste from their build.py, iirc. But if the dependency gets updated regularly, we should maybe automate that or (ideally) have a PR over there to add an API that exposes that information.

deathaxe commented 5 months ago

Well, the crawler runs daily. It appears v1.10.0 has been released for Linux only.

kaste commented 5 months ago

Yeah, their release pipeline is broken. Very good, defensive programming that the crawler does not stumble here. 👏👏👏

kylebebak commented 5 months ago

Hey @deathaxe, the only languages that are installed by default are "python" and "json".

Screenshot 2024-02-03 at 5 11 33 PM

To install more languages, you can look for TreeSitter: Install Language in the command palette, to e.g. install the markdown language

Before:

Screenshot 2024-02-03 at 5 06 01 PM

After:

Screenshot 2024-02-03 at 5 06 11 PM

Once you've done this, you can go to a Markdown file, go through the repro steps you mentioned above, and you won't hit any errors.

Screenshot 2024-02-03 at 5 08 17 PM

kylebebak commented 5 months ago

I think the documentation can and should be improved to explain this. I'll do that now.

deathaxe commented 5 months ago

I am aware of the intended workflow for installing or removing languages, and I understand the restriction to installing only a basic set of languages in order to keep the initial package size small.

However, what's the reason for not enabling all syntaxes that are already shipped with tree_sitter_languages?

I see the difficulties with regard to enumerating the languages it ships, but until https://github.com/grantjenks/py-tree-sitter-languages/issues/21 is resolved, a list of initially available syntaxes could be maintained manually, no?

kaste commented 5 months ago

My two cents: I think you should check for language updates when ST starts, just like now. But you should take the always-installed languages that we get from tree_sitter_languages for granted. I also don't think it is very friendly to not use any of them just by setting the python setting. Typically you're happy with the shipped languages and want to add another one manually. So to speak: the available languages are the ones from tree_sitter_languages plus the ones built locally. It's a union set, no?
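The union described above could be sketched roughly as follows. This is only an illustration: `BUNDLED_LANGUAGES` and `available_languages` are hypothetical names, and the set contents are a made-up subset rather than what tree_sitter_languages really ships.

```python
# Hypothetical, hand-maintained list of languages bundled with
# tree_sitter_languages. Illustrative subset only, not the real contents.
BUNDLED_LANGUAGES = frozenset({"python", "json", "markdown"})


def available_languages(installed_languages):
    """Return the bundled languages plus those listed in the user's settings."""
    return BUNDLED_LANGUAGES | set(installed_languages)


# An empty "installed_languages" setting still leaves bundled languages usable.
print(sorted(available_languages([])))
# A locally built language is simply added on top of the bundled ones.
print(sorted(available_languages(["toml"])))
```

With this shape, "installed_languages" only has to list what the user built locally, and duplicates between the two sources collapse naturally.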

Now there is actually no need to enumerate all these languages, I think. And you also don't need to instantiate all of them at startup. That would be a waste, as we may have tens and tens of them. Even if it's fast, it would be a waste. Typically you instantiate on first usage: first try a locally available language file, then one from tree_sitter_languages. Otherwise we fail.

I think you store everything in global maps/dicts. But you can also always do

from functools import lru_cache

@lru_cache()
def get_language(scope: str) -> Language | None:
    ...

(or a similar get_parser(); I actually don't know what the deepest useful primitive/abstraction here is)

to have a dynamic mapping/cache.
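A slightly fuller sketch of that idea, under stated assumptions: the two loader functions below are hypothetical stand-ins (the real ones would load a locally built language file and call into tree_sitter_languages respectively), languages are looked up by name rather than by scope to keep the example short, and plain strings stand in for real language objects.

```python
from functools import lru_cache
from typing import Optional


# Hypothetical stand-in: load a language that was built locally.
def load_local_language(name: str) -> object:
    raise LookupError(name)  # pretend nothing has been built locally


# Hypothetical stand-in: load a language bundled with the dependency.
def load_bundled_language(name: str) -> object:
    bundled = {"python", "markdown"}  # illustrative subset, not the real list
    if name not in bundled:
        raise LookupError(name)
    return "<bundled language: %s>" % name


# Lookup order: local build first, then the bundled package.
LOADERS = (load_local_language, load_bundled_language)


@lru_cache(maxsize=None)
def get_language(name: str) -> Optional[object]:
    """Resolve a language on first use and cache the result."""
    for loader in LOADERS:
        try:
            return loader(name)
        except LookupError:
            continue
    return None  # neither source knows this language


print(get_language("markdown"))
print(get_language("cobol"))
```

One thing to keep in mind with this approach: `lru_cache` also remembers the `None` result, so after installing a new language the cache would have to be reset with `get_language.cache_clear()`.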

(Have to leave now, kids are coming 😁)