tree-sitter / tree-sitter-typescript

TypeScript grammar for tree-sitter
MIT License
361 stars 108 forks source link

Absorb/eliminate tree-sitter-javascript? #191

Closed mjambon closed 6 months ago

mjambon commented 2 years ago

Maintaining tree-sitter-typescript as an extension of tree-sitter-javascript tends to involve extra work compared to a grammar written from scratch (#189, #190 + personal experience with semgrep).

Since typescript (tsx mode) is compatible with javascript and jsx, what's the value of maintaining a parser that supports only javascript? Do we have users who would be bothered if we only provided the tsx dialect as a javascript/jsx parser?

/cc @ruricolist

mjambon commented 2 years ago

@dcreager @maxbrunsfeld @patrickt thoughts on eliminating the javascript-only grammar?

maxbrunsfeld commented 2 years ago

I still think it's valuable to have the JavaScript language explicitly implemented as its own smaller, simpler grammar. JavaScript is relatively stable, and has very precisely documented structure, whereas TypeScript is still constantly changing and has a much less thorough spec.

Are there other ways that we could make it easier to bump the JavaScript dependency the TypeScript grammar? I'm open to replacing the package.json dependency with something that's not tied to nom, like a git submodule.

mjambon commented 2 years ago

I think we need to ensure, at the time of committing a javascript fix, that the fix also works for typescript.

If we used tree-sitter-javascript as git submodule of tree-sitter-typescript, we'd need the javascript CI to run the javascript tests using the typescript parsers. The flow would be something like:

  1. contributor fixes something in tree-sitter-javascript, adds a test
  2. CI tells them that the test didn't pass with the typescript or tsx parser
  3. they clone tree-sitter-typescript, update the tree-sitter-javascript submodule, and fix things until local tests pass

Maybe the git submodule solution is fine if (2) succeeds most of the time. I think step (3) is a bit too much to ask unless it's exceptional.

Another solution would be to merge the javascript and typescript repos but keep the grammars separate as they are today. This would be simpler for contributors. Running tests locally would automatically run the javascript tests with the typescript parser, which is more direct than having to rely on CI for it.

hendrikvanantwerpen commented 2 years ago

I recently mentioned this idea in our team, and one of things that was mentioned was that there are more "variants" of JS. Flow was mentioned as an example, which adds its own syntax (unfortunately even without changing the extension 😦).

In light of that, I think having a separate JS grammar is good, because it allows deriving other variant next to TS. If there would be more variants next to TS, having them all in the same repo might make developing the JS grammar harder, as issues with the derivatives need to be fixed immediately as part of the JS change.

mjambon commented 2 years ago

@hendrikvanantwerpen I'm optimistic about a repo merge because the technical expertise needed to work on any of these projects is basically the same. As a result, we'd get a bigger contributor pool and happier contributors - well, at least the TypeScript and Flow contributors would be happier, and I'm sure they would be glad to assist in case of any difficulty with a new JavaScript feature.

maxbrunsfeld commented 2 years ago

I think we need to ensure, at the time of committing a javascript fix, that the fix also works for typescript. If we used tree-sitter-javascript as git submodule of tree-sitter-typescript, we'd need the javascript CI to run the javascript tests using the typescript parsers.

I would rather treat tree-sitter-javascript as an independent project, and have the TypeScript grammar depend on a specific version of it, similar to the normal workflow for dependencies. Generally, it's fine to commit changes to tree-sitter-javascript as long as the tree-sitter-javascript tests pass.

The TypeScript grammar doesn't necessarily have to always immediately pull in the most recent commit of tree-sitter-javascript. We can choose when to bump the version of tree-sitter-javascript that tree-sitter-typescript uses. And at the time of bumping the dependency, we need to make sure that the TypeScript tests pass.

dcreager commented 2 years ago

I agree with Max on this — I think this issue is showing that we need to do a better job of assigning version numbers to the JS grammar, and to have more precise version ranges (or even specific pinned commit SHAs) on the TypeScript side. If we're making changes to the JS grammar that would break the TS grammar, those changes could also presumably break other consumers of the JS grammar, and need to communicated via appropriate version numbers.

mjambon commented 2 years ago

I think this issue is showing that we need to do a better job of assigning version numbers to the JS grammar, and to have more precise version ranges

Who would do this work and when?

maxbrunsfeld commented 2 years ago

I personally think the git sha is the best way of locking the version down, either in the package.json, as we have it now, or just via a git sub module.

I don’t think there’s a meaningful way to encode grammar changes in semver. Strictly speaking, almost any change would be a major version bump. Managing the version adds a bunch of ceremony to the development process, and I don’t think it adds much value as compared to just using git revisions directly.