kach / nearley

📜🔜🌲 Simple, fast, powerful parser toolkit for JavaScript.
https://nearley.js.org
MIT License

Typescript types as strings after `@preprocessor typescript` directive #558

Closed bandaloo closed 3 years ago

bandaloo commented 3 years ago

resolves #527

Hi, this was a little experiment: I mostly wanted to see whether I could figure out how to change the grammar files, bootstrap the compiler, and change the generator. Constructive criticism is very welcome, and please check that I didn't do something dumb here! I added a new test, and all the old ones pass. I also tested this by linking it into my own project, where it simplified both the grammar file and the generated output quite a bit. This also opens us up to passing extra data to directives, in the style of some C pragmas.

Pasted in from the docs that I changed:

Using preprocessors

By default, nearleyc compiles your grammar to JavaScript. However, you can instead choose to compile to CoffeeScript or TypeScript by adding @preprocessor coffee or @preprocessor typescript at the top of your grammar file. This can be useful to write your postprocessors in a different language, and to get type annotations if you wish to use nearley in a statically typed dialect of JavaScript.

For TypeScript only, you can specify the type of the lexer with the following syntax:

@preprocessor typescript lexer "Lexer"

@{%
import { compile } from 'moo'
import type { Lexer } from 'moo'

const lexer = compile({
    // lexer rules ...
});
%}

@lexer lexer

It is possible to specify the token type in the same way (e.g. @preprocessor typescript token "Token"). This is particularly useful when using moo with TypeScript, as it allows you to generate TypeScript code that uses moo's built-in types.

kach commented 3 years ago

What's the syntax for specifying both types?

bandaloo commented 3 years ago

So you can do it: @preprocessor typescript lexer "Lexer" token "Token", but it's actually redundant: if you specify the lexer, you don't have to import the token type. I originally thought you might have to in order to get it all to typecheck, but I was mistaken. If you look in generator.js you can see the #ifndef-style metaprogramming that goes on; a NearleyToken member only appears in the NearleyLexer interface.

kach commented 3 years ago

Why would you ever need to specify a custom lexer? Wouldn't only specifying the custom token type be enough? (Of course you'd substitute in that custom token type name in place of NearleyToken in the formatError signature.)

kach commented 3 years ago

e.g. this (smaller) change seemed to work for someone else https://github.com/kach/nearley/issues/527#issuecomment-671650091

bandaloo commented 3 years ago

> e.g. this (smaller) change seemed to work for someone else #527 (comment)

That solution doesn't fully work in my case. My problem was that the lexer that @lexer lexer places into the generated code was not of type NearleyLexer but of moo's Lexer type, so it wouldn't compile without an "unsafe" cast or hand-editing the generated code. A "custom lexer" was more convenient than a custom token because moo's lexer already has formatError and everything else. Instead of creating a new NearleyLexer interface to mirror moo's Lexer, you can just replace every NearleyLexer with Lexer; the code gets a lot simpler and the type checker is happy:

[Image: Screen Shot 2020-11-28 at 12 51 22 AM]

If anything, maybe this new syntax should only allow for a custom lexer type and not a token type.

I was basically hacking away at nearley this afternoon in service of my own typescript-related OCD so if you think this is overkill for a PR then I totally understand. But maybe it's a general solution that will work for other people?

kach commented 3 years ago

I see what you're saying. I should clarify that theoretically the point of the NearleyLexer interface is to enforce that whatever lexer you pass in is compatible with nearley (in particular, imagine passing in a non-moo lexer). For that reason having an "interface" makes the most sense to me.

From what I can tell, the problem is specifically the formatError part of the interface, which is typed as a function that takes a NearleyToken as input. moo's formatError takes a Token, which is a subtype of NearleyToken, and that's a problem for co-/contravariance reasons (the interface promises that formatError works for any NearleyToken, but moo only promises that it works for moo tokens; a reasonable complaint, then!). With this in mind, the "minimally invasive" solution seems to me to be replacing NearleyToken with the more specific token type. I'm still not sure why that wouldn't solve this problem.
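The variance complaint can be reproduced in isolation. The following is only a minimal sketch: the interface shapes approximate what the generator emitted at the time (an assumption, trimmed to the relevant member), and SpecificToken and specificLexer are hypothetical stand-ins for moo's Token and Lexer, not moo's real types.

```typescript
// Assumed, trimmed approximation of the generated interfaces.
interface NearleyToken {
  value: any;
  [key: string]: any;
}

interface NearleyLexer {
  formatError: (token: NearleyToken) => string;
}

// Hypothetical stand-in for a more specific, moo-style token.
type SpecificToken = { value: string; line: number; col: number };

// Hypothetical lexer whose formatError only handles SpecificToken.
const specificLexer = {
  formatError: (token: SpecificToken): string =>
    `error at line ${token.line} col ${token.col}`,
};

// With strictFunctionTypes enabled, the compiler is expected to reject
// the plain assignment, because function parameters are checked
// contravariantly:
// const lexer: NearleyLexer = specificLexer;

// Forcing the assignment with a cast shows what the check guards against:
// the interface allows any NearleyToken, including one without line/col.
const forced = specificLexer as unknown as NearleyLexer;
const forcedMessage = forced.formatError({ value: "x" });
console.log(forcedMessage); // line and col come out as undefined
```

The cast compiles, but the formatter then receives a token it was never written to handle, which is exactly the promise mismatch described above.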

Actually, here's an even shorter solution: instead of typing formatError as NearleyToken => string make it never => string (and there's no need to pass in lexer or token type). Could you real quick check to see if this fixes things?
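As a rough illustration of why never helps: never is assignable to every type, so a formatError that accepts any concrete token type satisfies the contravariant parameter check. The sketch below again uses an assumed, trimmed interface shape and hypothetical names (SpecificToken, specificLexer, pending) rather than moo's actual types.

```typescript
// Assumed, trimmed approximation of the generated interfaces, with
// formatError's parameter typed as never.
interface NearleyToken {
  value: any;
  [key: string]: any;
}

interface NearleyLexer {
  next: () => NearleyToken | undefined;
  formatError: (token: never) => string;
}

// Hypothetical stand-in for a moo-style token and lexer.
type SpecificToken = { value: string; line: number; col: number };

const pending: SpecificToken[] = [{ value: "a", line: 1, col: 1 }];

const specificLexer = {
  next: (): SpecificToken | undefined => pending.shift(),
  formatError: (token: SpecificToken): string =>
    `error at line ${token.line} col ${token.col}`,
};

// Accepted without a cast: never is assignable to SpecificToken, so the
// parameter check passes for any concrete token type.
const lexer: NearleyLexer = specificLexer;

// Code that knows the concrete lexer type can still call formatError.
const firstToken = specificLexer.next();
const formatted = firstToken ? specificLexer.formatError(firstToken) : "";

// The queue is now empty, so the interface-typed view yields undefined.
const rest = lexer.next();
```

One trade-off of this design is that formatError cannot be called through the NearleyLexer-typed reference without a cast, since no value has type never; callers that need it would go through the concrete lexer type instead.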

> I was basically hacking away at nearley this afternoon in service of my own typescript-related OCD so if you think this is overkill for a PR then I totally understand. But maybe it's a general solution that will work for other people?

I agree that this is a problem and am definitely invested in getting a PR merged to fix it. Thank you for working on this! :)

bandaloo commented 3 years ago

@kach Yep, I see the only incompatible type is formatError, and good thinking with never because that does the trick; I tested it. And you're right; just passing in a custom token also will make NearleyLexer and moo.Lexer fully compatible. But, I guess passing in a custom token doesn't serve a particular advantage over the never solution, especially since it's generated code you don't really need to look at or edit.

bandaloo commented 3 years ago

see my last comment:

> Yep, I see the only incompatible type is formatError, and good thinking with never because that does the trick; I tested it. And you're right; just passing in a custom token also will make NearleyLexer and moo.Lexer fully compatible. But, I guess passing in a custom token doesn't serve a particular advantage over the never solution, especially since it's generated code you don't really need to look at or edit.

kach commented 3 years ago

Great! Would you like to submit a (fresh) PR that changes the generator to emit the never type in the correct place? :)

kach commented 3 years ago

Oh, I see you've already done so in #560! Yay!