ohmjs / ohm

A library and language for building parsers, interpreters, compilers, etc.
MIT License

Add ability to type operation parameters & return values #450

Open TheOnlyTails opened 1 year ago

TheOnlyTails commented 1 year ago

Currently, using any semantic operations/attributes results in an any, which introduces a major risk of mistakes and typos into the codebase.

I suggest moving the semantic operation/attribute creation process into the createSemantics function, and deriving the Node type from that, instead of using [index: string]: any.
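To make the risk concrete, here is a minimal sketch using a simplified stand-in for a node type with a catch-all index signature. The names LooseNode and node are hypothetical and this is not Ohm's actual Node declaration; it only shows how a misspelled operation name slips past the compiler.

```typescript
// Hypothetical, simplified stand-in for a node type with a catch-all index signature.
interface LooseNode {
  sourceString: string;
  [index: string]: any; // any property name type-checks
}

const node: LooseNode = {
  sourceString: "1 + 2",
  eval: () => 3,
};

// Correct spelling: works at runtime, but the result is typed `any`.
const ok = node.eval();

// Misspelled name: still compiles, because the index signature accepts it.
// The mistake only surfaces at runtime, as a TypeError.
let failure = "";
try {
  node.evla();
} catch (e) {
  failure = e instanceof TypeError ? "TypeError" : "other";
}
```

Without the index signature, the call to evla would be rejected at compile time.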

I'd be happy to help create a prototype/work alongside someone on this, since this seems like quite a big pain point for such an important feature.

pdubroy commented 1 year ago

@TheOnlyTails This is already supported — please see the documentation on Using Ohm with TypeScript.

If there's something missing from that, or you find any problems, please let me know!

TheOnlyTails commented 1 year ago

I read through the docs again, and I think you misunderstood what I meant: the type-checking when declaring semantic operations is great - the problem arises when it comes to accessing them - they're all typed as any. I'd like something like this:

grammar.createSemantics({
  // this object allows declaring semantic actions while type-checking their params and return type when called
  eval: (semanticParam: string) => ({
    Expression: (content) => //etc
  })
})

pdubroy commented 1 year ago

Sorry for the confusion, guess I was a bit distracted when I read this and didn't fully understand.

To confirm I understand what you're asking about, let's take this example:

const s = g.createSemantics().addOperation('myOp(a, b)', {
  ...
})

You want to be able to specify both (1) the return type of the myOp operation, and (2) the type(s) of the parameters (a, b)?

It's already possible to specify the return type, see arithmetic.ts in the TypeScript example.

You're right that we're missing an ability to type the parameters — I'll repurpose this issue for that.

And thanks for the offer to help and for your suggested fix. I'll have to think a bit more about this and how we can best solve it.

TheOnlyTails commented 1 year ago

While parameters are one big missing thing about the types, I'm talking mostly about the typing of actions being called. For example:

semantics.addOperation<Expression>("eval()", {
  Expression_parens: (_, expr, _1) => expr.eval() // currently returns any, should return Expression
})

While this still compiles because of the any, it's still vulnerable to misspellings and confusing one operation with another.

My proposed solution would allow typescript to infer types of declared operations/attributes directly from their declarations, allowing for safer, less error-prone code.
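As a rough sketch of what inferring operation types from their declarations could look like, the snippet below derives a node type from an object of operations. The helper makeNode and the types OpDecls and TypedNode are hypothetical and not part of Ohm; real semantic actions also receive child nodes, which this sketch leaves out.

```typescript
// Hypothetical helper types; none of this is Ohm's API.
type OpDecls = Record<string, (...args: any[]) => unknown>;

// The node type is derived from the declarations: one method per declared operation.
type TypedNode<Ops extends OpDecls> = { sourceString: string } & Ops;

function makeNode<Ops extends OpDecls>(sourceString: string, ops: Ops): TypedNode<Ops> {
  return { sourceString, ...ops };
}

const n = makeNode("42", {
  eval: (scale: number) => 42 * scale,
  show: () => "forty-two",
});

const value: number = n.eval(2); // inferred as number, not any
const text: string = n.show();   // inferred as string
// n.evla(2);   // compile error: property 'evla' does not exist
// n.eval("2"); // compile error: a string is not assignable to a number
```

Because the operations are declared once and their types flow into the node type, misspellings and mixed-up operations are caught by the compiler instead of at runtime.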

pdubroy commented 1 year ago

Oh, I see! Yes, that's a very good point. I was missing the distinction between the return values of the semantic actions themselves (which ultimately get used as the result of the operation) and the result of the operation.

TheOnlyTails commented 1 year ago

I've started working on this issue myself, diving head-first into the codebase, and so far these are my plans: The most important goal is to avoid breaking the old way of defining semantic ops/attrs, so this will mostly make changes to generated types.

Here are some prototypes I've come up with:

// a utility type - gets rid of the [index: string]: any on the original Node type
type NoIndexNode = {
  [K in keyof Node as string extends K ? never : K]: Node[K];
};

// a custom node type that adds all of the attributes and operations to the original Node, given action dicts for each.
export type TestNode<Ops, Attrs> = NoIndexNode & {
  [op in keyof Ops]: Ops[op] extends (
    ...args: infer Args
  ) => TestActionDict<infer Ret>
    ? (...args: Args) => Ret
    : never;
} & {
  [attr in keyof Attrs]: Attrs[attr] extends TestActionDict<infer AttrType>
    ? AttrType
    : never;
};

export interface TestGrammar<
  Ops = { [index: string]: (...args: any[]) => TestActionDict<any> },
  Attrs = { [index: string]: TestActionDict<any> }
> extends Grammar<Ops, Attrs> {
  createSemantics(operations?: Ops, attributes?: Attrs): TestSemantics;
  extendSemantics(
    superSemantics: TestSemantics,
    operations?: Ops,
    attributes?: Attrs
  ): TestSemantics;
}

The main concept here is passing around the ops and attrs objects from createSemantics, then constructing a Node type from it and re-using it in the action dictionary.
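The two building blocks of that idea can be tried in isolation. The sketch below uses stand-in types (LooseNode, ActionDict, NodeWithOps are hypothetical simplifications, not Ohm's declarations): a key-remapping mapped type strips the index signature, and a conditional type with infer turns each operation factory into a typed method.

```typescript
// Hypothetical stand-in for a node type with a catch-all index signature.
interface LooseNode {
  sourceString: string;
  numChildren: number;
  [index: string]: any;
}

// Key remapping drops the `string` index signature but keeps the named members.
type NoIndex<T> = {
  [K in keyof T as string extends K ? never : K]: T[K];
};

// Simplified action dictionary: rule name -> action returning R.
type ActionDict<R> = Record<string, (...children: unknown[]) => R>;

// For each operation factory `(...args) => ActionDict<R>`, expose `(...args) => R`.
type NodeWithOps<Ops> = NoIndex<LooseNode> & {
  [K in keyof Ops]: Ops[K] extends (...args: infer A) => ActionDict<infer R>
    ? (...args: A) => R
    : never;
};

const ops = {
  eval: (scale: number) => ({ Number: () => 21 * scale }),
};

type EvalNode = NodeWithOps<typeof ops>;

// A hand-built node, only to show that the derived type is usable.
const evalNode: EvalNode = {
  sourceString: "21",
  numChildren: 0,
  eval: (scale) => ops.eval(scale).Number(),
};

const result: number = evalNode.eval(2); // typed as number
// evalNode.evla(2);                     // compile error: no index signature left
```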

To preserve back-compat, we could check if both the attrs and ops objects passed to createSemantics are undefined, and then use the regular Node to allow for the old system to still be used.
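One way to express that fallback at the type level is a conditional type keyed on whether an operations object was supplied. This is only a sketch under that assumption: makeNode stands in for createSemantics, and LegacyNode, StrictNode and NodeFor are hypothetical names.

```typescript
// Legacy shape: the catch-all index signature makes every access `any`.
type LegacyNode = { sourceString: string; [index: string]: any };

// Strict shape: only the declared operations exist.
type StrictNode<Ops> = { sourceString: string } & Ops;

// Non-distributive check: no operations object means "use the legacy node".
type NodeFor<Ops> = [Ops] extends [undefined] ? LegacyNode : StrictNode<Ops>;

type OpDecls = Record<string, (...args: any[]) => unknown>;

// Hypothetical factory standing in for createSemantics; not Ohm's API.
function makeNode<Ops extends OpDecls | undefined = undefined>(
  sourceString: string,
  ops?: Ops,
): NodeFor<Ops> {
  // The conditional type cannot be verified inside the generic body, hence the cast.
  return Object.assign({ sourceString }, ops) as unknown as NodeFor<Ops>;
}

const legacy = makeNode("1");                    // LegacyNode
const strict = makeNode("1", { eval: () => 1 }); // StrictNode with `eval` only

const loose = legacy.anything;           // compiles; typed any, undefined at runtime
const evaluated: number = strict.eval(); // typed as number
// strict.evla();                        // compile error
```

The cast inside the factory hints at the maintenance cost of carrying both modes side by side.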

TheOnlyTails commented 1 year ago

I've been working on and off on this for a while, and unfortunately, I think there are only three options for this:

  1. Keeping things the way they are now, which is not ideal;
  2. We could provide back-compat for the old system, but that would make the DX much worse for users of both the new and old systems;
  3. Completely remove the old system, and only allow using the new one, which would be a major breaking change.

IMHO, if this is still something that's worth doing, only option no. 3 is worth it. Keeping compatibility with the current way of doing things would mean too much work spent maintaining a flawed system that essentially opts out of type checking.

pdubroy commented 1 year ago

@TheOnlyTails Sorry for the late reply — thanks very much for looking into this! I'd be happy to try to fix this for the next major release. Before committing to anything, I need to find time to think about this more deeply, and would like to get some other eyes on it too.

mrshll commented 11 months ago

I'm interested in this as well. Defining the types on the operation's parameters seems necessary in order to use them, unless there's some other workaround to shim the type into this.args.
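For illustration, such a shim could take the form of an explicit this parameter on each action. The sketch below uses stand-in types only (ThisWithArgs, FormatArgs and LineActions are hypothetical); whether Ohm's generated action types accept an annotation like this is an assumption, not something confirmed here.

```typescript
// Hypothetical stand-in for the `this` value of a semantic action.
interface ThisWithArgs<Args> {
  sourceString: string;
  args: Args;
}

interface FormatArgs {
  indent: number;
}

// Simplified action dictionary with a single rule, `Line`.
interface LineActions<Args, R> {
  Line(this: ThisWithArgs<Args>): R;
}

const actions: LineActions<FormatArgs, string> = {
  Line() {
    // `this.args.indent` is checked as a number; a typo here would not compile.
    return " ".repeat(this.args.indent) + this.sourceString;
  },
};

// Invoke the action with an explicit `this`, as a semantics runtime would.
const formatted = actions.Line.call({ sourceString: "x", args: { indent: 2 } });
```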

rrthomas commented 11 months ago

I've done some work on this without looking at either this issue (oops!) or the Ohm source code, but just patching its auto-generated types in my project. I think it might be useful in any case as an example of how far one can get without dramatic surgery to Ohm itself, and some modest changes to how it's used in TypeScript.

In outline, I made the following changes:

  1. Node, IterationNode and NonterminalNode are made generic on a type Operations, which is the type of the operations offered by the semantics.
  2. I added a type ThisNode<Args, Operations>, which is used specifically for the this argument to semantic actions, capturing the fact that only this arguments have the args member. The Args type parameter is of course the type of the arguments object. I then changed the type of each action's this argument to ThisNode.
  3. I added type parameters to the generated FooSemantics type/interface, one for each new Node type, and one for the Operations type. If Ohm's type declarations were changed, only Operations would be needed.
  4. I added similar type parameters to the functions defined in the exported FooGrammar interface.

Here's an example of the resulting type declaration file (with most of the actions elided):

// AUTOGENERATED FILE
// This file was generated from ursa.ohm by `ohm generateBundles`.

import {
  BaseActionDict,
  Grammar,
  MatchResult,
  Node as NodeBase,
  NonterminalNode as NonterminalNodeBase,
  Semantics,
  TerminalNode
} from 'ohm-js';

interface NodeI<Operations> extends NodeBase {
  child(idx: number): Node<Operations>;
  children: Node<Operations>[];
  asIteration(): IterationNode<Operations>;
}

export type Node<Operations> = NodeI<Operations> & Operations;

export type IterationNode<Operations> = Node<Operations>;

export type NonterminalNode<Operations> = Node<Operations>;

export type ThisNode<Args, Operations> = Node<Operations> & {
  // Only the `this` of semantics action routines has this member.
  args: Args;
};

export interface UrsaActionDict<T, Node, NonterminalNode, IterationNode, ThisNode> extends BaseActionDict<T> {
  _terminal?: (this: ThisNode) => T;
  _nonterminal?: (this: ThisNode, ...children: NonterminalNode[]) => T;
  _iter?: (this: ThisNode, ...children: NonterminalNode[]) => T;
  Sequence?: (this: ThisNode, arg0: NonterminalNode, arg1: NonterminalNode) => T;
  …
}

interface UrsaSemanticsI<Node, NonterminalNode, IterationNode, ThisNode, Operations> extends Semantics {
  (match: MatchResult): Operations;
  addOperation<T>(name: string, actionDict: UrsaActionDict<T, Node, NonterminalNode, IterationNode, ThisNode>): this;
  extendOperation<T>(name: string, actionDict: UrsaActionDict<T, Node, NonterminalNode, IterationNode, ThisNode>): this;
  addAttribute<T>(name: string, actionDict: UrsaActionDict<T, Node, NonterminalNode, IterationNode, ThisNode>): this;
  extendAttribute<T>(name: string, actionDict: UrsaActionDict<T, Node, NonterminalNode, IterationNode, ThisNode>): this;
}
export type UrsaSemantics<Node, NonterminalNode, IterationNode, ThisNode, Operations> = UrsaSemanticsI<Node, NonterminalNode, IterationNode, ThisNode, Operations> & Operations;

export interface UrsaGrammar extends Grammar {
  createSemantics<Node, NonterminalNode, IterationNode, ThisNode, Operations>(): UrsaSemantics<Node, NonterminalNode, IterationNode, ThisNode, Operations>;
  extendSemantics<Node, NonterminalNode, IterationNode, ThisNode, Operations>(superSemantics: UrsaSemantics<Node, NonterminalNode, IterationNode, ThisNode, Operations>): UrsaSemantics<Node, NonterminalNode, IterationNode, ThisNode, Operations>;
}

declare const grammar: UrsaGrammar;
export default grammar;

With these types, I was able to remove all the type assertions in my code. A typical usage looks like this:

import grammar, {
  Node, NonterminalNode, IterationNode, ThisNode,
} from '../grammar/ursa.ohm-bundle.js'

type FormatterOperations = {
  fmt(a: FormatterArgs): Span
  hfmt(a: FormatterArgs): Span
}

type FormatterArgs = {
  maxWidth: number
  indentString: string
  simpleExpDepth: number
}

type FormatterNode = Node<FormatterOperations>
type FormatterNonterminalNode = NonterminalNode<FormatterOperations>
type FormatterIterationNode = IterationNode<FormatterOperations>
type FormatterThisNode = ThisNode<{a: FormatterArgs}, FormatterOperations>

export const semantics = grammar.createSemantics<FormatterNode, FormatterNonterminalNode, FormatterIterationNode, FormatterThisNode, FormatterOperations>()

One obviously debatable choice I've made here is to group my semantic operations in families which share the same arguments. That seems implicit in the way that Ohm works, and makes sense for me: in my project, I have a group of compiler actions and a group of formatter actions.

The examples above are taken from https://github.com/ursalang/ursa/tree/main/src/

Thanks very much for Ohm, it's amazing! It has been fun and quick to use. Nevertheless, I look forward to a new version with more thorough typing: with the above changes I was able to remove hundreds of type assertions, non-null assertions, and comments to myself where types were too lax. The Ursa compiler is now about ⅓ shorter, and much easier to read, so even with this rather heavy-handed low-tech intervention of patching Ohm's output, it feels worth it.

rrthomas commented 11 months ago

P.S., a corollary to my willingness to hack around with Ohm's output is that I'd be very happy with a breaking change to the API. Ohm is nice and stable as-is, so there's no pressure on me to upgrade at any particular moment, while a new API would be a great improvement on my current hack, and I wouldn't anticipate moving to it involving much more than rewriting "impedance matching" code of the sort I've exhibited above.