Consider making node names exports rather than strings

marijnh commented 3 years ago

This would make using them less error-prone and allow autocompletion and such.

It does make some things that are currently allowed, such as multiple productions mapping to the same node name, problematic.

Also interfaces like NodeType.match, which map from node names (and optionally space-separated lists of them) to other values become a lot more awkward, because you'd need to import each node and put it between square brackets in the object literal, and can't list multiple types in a single property in a straightforward way anymore. ({[`${Identifier} ${Number}`]: x} doesn't really roll of the tongue).

Group names would then probably also need to become numbers, for consistency.

Timmmm commented 3 years ago

I agree, however I think it would be easier/better to use Typescript types instead of making them exports. I.e.

type NodeType = "Expression" | "Declaration" | ...;

I think the best way to do this is:

Parameterise the parser over the node type, i.e. instead of:

export interface SyntaxNode {
  /// The name of the node (`.type.name`).
  name: string

Do

export interface SyntaxNode<T extends string> {
  /// The name of the node (`.type.name`).
  name: T

Admittedly you have to add it in quite a few places but it would be worth it.

Change nodeNames from a string to an array of strings. I'm not sure why it uses a string in the first place really. I doubt that it is faster since the only thing you do with node names is compare them so surely it is easier for V8 to deal with separate string constants than arbitrary substrings of a bigger string? In other words, instead of:

nodeNames: "⚠ JsonText True False Null Number String } { Object Property PropertyName ] [ Array",

Do

  nodeNames: ["⚠", "JsonText", "True", "False", "Null", "Number", "String", "}",  "{", "Object", "Property", "PropertyName",  "]",  "[",  "Array"],

Then either have it generate a Typescript .d.ts file, or better yet add support for proper Typescript output. If you do then you can do this little trick:

const nodeNames = ["⚠", "JsonText", "True", "False", "Null", "Number", "String", "}",  "{", "Object", "Property", "PropertyName",  "]",  "[",  "Array"] as const;
type NodeName = typeof nodeNames[number];

It infers NodeName as "⚠" | "JsonText" | "True" | "False".

Timmmm commented 3 years ago

Here's a branch where I added the node names type as a generic parameter: https://github.com/Timmmm/lezer-tree/blob/parameterised_node_names/src/tree.ts

It does add a ton of <NN extends string = string> everywhere which is a bit verbose. I'm not sure of any simpler way that is still correct though. Also I had to use any in a couple of places but nothing too bad.

marijnh commented 3 years ago

That's definitely not what I had in mind, though. I don't really see many ways in which this would be usable or helpful. What's the idea behind it?

Timmmm commented 3 years ago

It would do exactly what you said:

This would make using them less error-prone and allow autocompletion and such.

Is that not what this issue is about?

marijnh commented 3 years ago

Not really. I was considering exporting the node IDs as bindings. I have no intention to type-parameterize half the library on node name strings.

It's rarely a good idea to jump in and start a big refactor of someone else's library without first discussing the direction you're going in.

marijnh commented 4 months ago

In the end, the extra complexity needed to do this properly, the fact that the current API is stable now, along with the convenience we get from strings (code being able to stuff multiple node names in a property name in stuff like match or styleTags) mean that this ship has sailed and we're going to stick to strings. You can get the node IDs from a grammar's term files and work with those if you prefer, but name strings will remain the primary way to identify node types.

lezer-parser / lezer

Consider making node names exports rather than strings #8