rdking / chevrotain-ebnf

A Chevrotain Lexer/Parser to convert EBNF into a Chevrotain Lexer/Parser

Feedback #2

Closed bd82 closed 5 years ago

bd82 commented 5 years ago

These are all suggestions :)

bd82 commented 5 years ago

let lexer = new Lexer(Object.values(parser.tokensMap));

The order of tokens is meaningful in some cases.

So:

let lexer = new Lexer(Object.values(parser.tokensMap));

May be a bit naive. 😢
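To illustrate why the order matters, here is a minimal sketch of a first-match tokenizer in plain JavaScript. It is not Chevrotain's implementation; `tokenize`, `While`, and `Identifier` are names made up for this example, and the only assumption is that the first definition that matches at the current offset wins.

```javascript
// Illustrative first-match tokenizer (hypothetical, not Chevrotain's code).
function tokenize(text, tokenDefs) {
  const tokens = [];
  let pos = 0;
  while (pos < text.length) {
    let matched = false;
    for (const { name, pattern } of tokenDefs) {
      // The sticky flag anchors the match at exactly `pos`.
      const re = new RegExp(pattern.source, "y");
      re.lastIndex = pos;
      const m = re.exec(text);
      if (m !== null && m[0].length > 0) {
        tokens.push({ name, image: m[0] });
        pos += m[0].length;
        matched = true;
        break; // the first matching definition wins
      }
    }
    if (!matched) throw new Error(`Unexpected character at offset ${pos}`);
  }
  return tokens;
}

const While = { name: "While", pattern: /while/ };
const Identifier = { name: "Identifier", pattern: /[a-z]+/ };

// Same input, different definition order, different result.
const keywordFirst = tokenize("while", [While, Identifier]);
const identifierFirst = tokenize("while", [Identifier, While]);

console.log(keywordFirst[0].name);    // "While"
console.log(identifierFirst[0].name); // "Identifier"
```

With the identifier pattern listed first, the keyword definition is never reached for this input.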

rdking commented 5 years ago

let lexer = new Lexer(Object.values(parser.tokensMap));

The order of tokens is meaningful in some cases. ... May be a bit naive. 😢

It's painfully naive. The next version will probably ask for an ordered token map. That way I can give the developer a chance to both order and name the tokens. The current naming pattern is also fairly unfortunate. 😓
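One way this could look, purely as a sketch: let the developer pass a list of token names that should come first, and keep the remaining tokens in their original order. `orderTokens` and the sample `tokensMap` below are hypothetical and are not part of chevrotain-ebnf.

```javascript
// Hypothetical helper: put the tokens named in `priority` first, in that order,
// then append every other token in the map's own order.
function orderTokens(tokensMap, priority) {
  const seen = new Set();
  const ordered = [];
  for (const name of priority) {
    if (!Object.prototype.hasOwnProperty.call(tokensMap, name)) {
      throw new Error(`Unknown token name: ${name}`);
    }
    if (!seen.has(name)) {
      seen.add(name);
      ordered.push(tokensMap[name]);
    }
  }
  for (const [name, token] of Object.entries(tokensMap)) {
    if (!seen.has(name)) ordered.push(token);
  }
  return ordered;
}

// Stand-in for a generated token map (the shape is an assumption).
const sampleTokensMap = {
  Identifier: { name: "Identifier", pattern: /[a-z]+/ },
  While: { name: "While", pattern: /while/ },
  Equals: { name: "Equals", pattern: /=/ },
};

const orderedTokens = orderTokens(sampleTokensMap, ["While"]);
console.log(orderedTokens.map((t) => t.name)); // [ 'While', 'Identifier', 'Equals' ]
```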

rdking commented 5 years ago

Evaluate exposing a source generation API as well to avoid eval, and allow easier debugging.

I wrote that feature in already. I just haven't enabled it yet because I was busy testing to get the core logic working. I wanted to wait until I solved the token order issue before I released that feature.
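For readers unfamiliar with the idea, source generation means emitting JavaScript text that can be saved to a file, read, and stepped through in a debugger, instead of building the lexer/parser at runtime with `eval`. The sketch below shows the general shape only; `generateTokenSource` is a hypothetical name and says nothing about how chevrotain-ebnf actually does it.

```javascript
// Hypothetical sketch: turn token definitions into JavaScript source text.
function generateTokenSource(tokenDefs) {
  const lines = tokenDefs.map(
    ({ name, pattern }) =>
      `  { name: ${JSON.stringify(name)}, pattern: ${pattern.toString()} },`
  );
  return ["const tokens = [", ...lines, "];"].join("\n");
}

const generatedSource = generateTokenSource([
  { name: "While", pattern: /while/ },
  { name: "Identifier", pattern: /[a-z]+/ },
]);

// The result is plain text that could be written to a .js file.
console.log(generatedSource);
```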

bd82 commented 5 years ago

I wrote that feature in already. I just haven't enabled it yet because I was busy testing to get the core logic working. I wanted to wait until I solved the token order issue before I released that feature.

No rush. 👍

You may be able to perform some analysis on the regular expressions. I do the same in Chevrotain for optimization purposes using this library

I think there may be other libraries that convert a regexp to an AST for analysis too... Also, it is generally safe to put longer tokens before shorter ones. If you can solve the ordering problem, we could use that logic to perform validations on (hand-written) Chevrotain lexers.

I think this would be related to the equivalence of two finite state automata.
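For tokens that are plain string literals, the "longer before shorter" rule and a simple validation can be sketched without any regexp analysis. Both functions below are hypothetical helpers, and they assume a first-match lexer; general regular expressions would need the kind of automaton-level analysis mentioned above.

```javascript
// Sort literal tokens so that longer ones come first (stable for equal lengths).
function sortLongestFirst(literals) {
  return [...literals].sort((a, b) => b.length - a.length);
}

// Report literals that can never match because an earlier literal is a prefix of them.
function findShadowedLiterals(orderedLiterals) {
  const problems = [];
  for (let i = 0; i < orderedLiterals.length; i++) {
    for (let j = i + 1; j < orderedLiterals.length; j++) {
      if (orderedLiterals[j].startsWith(orderedLiterals[i])) {
        problems.push({ earlier: orderedLiterals[i], shadowed: orderedLiterals[j] });
      }
    }
  }
  return problems;
}

const naiveOrder = ["=", "==", "=>", "+"];
const shadowedInNaive = findShadowedLiterals(naiveOrder);
const sortedOrder = sortLongestFirst(naiveOrder);
const shadowedInSorted = findShadowedLiterals(sortedOrder);

console.log(shadowedInNaive.length);  // 2 ("==" and "=>" are shadowed by "=")
console.log(sortedOrder);             // [ '==', '=>', '=', '+' ]
console.log(shadowedInSorted.length); // 0
```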

rdking commented 5 years ago

I noticed something I thought a bit peculiar. With Chevrotain no token string is ever allowed to match against more than 1 token, even if the grammar makes the use cases unambiguous. Is there any particular reason for that?

bd82 commented 5 years ago

I noticed something I thought a bit peculiar. With Chevrotain no token string is ever allowed to match against more than 1 token, even if the grammar makes the use cases unambiguous. Is there any particular reason for that?

That is because Chevrotain uses a separate and distinct lexing phase, so it cannot use grammar context to decide between lexing options.

It should be possible to do lexing on demand, particularly if/when a streaming lexer is implemented.

But that does not exist as a feature currently...
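To make the distinction concrete, here is a sketch of what on-demand, context-aware lexing could look like: the parser asks for the next token and passes only the token types the grammar allows at that point. This is a hypothetical illustration, not a Chevrotain API; `nextToken`, `In`, and `Name` are invented for the example.

```javascript
// Hypothetical on-demand lexer step: try only the token types the parser expects here.
function nextToken(text, offset, expected) {
  for (const { name, pattern } of expected) {
    const re = new RegExp(pattern.source, "y"); // sticky: match exactly at `offset`
    re.lastIndex = offset;
    const m = re.exec(text);
    if (m !== null && m[0].length > 0) {
      return { name, image: m[0], end: offset + m[0].length };
    }
  }
  return null; // nothing the grammar allows matches at this offset
}

const In = { name: "In", pattern: /in/ };
const Name = { name: "Name", pattern: /[a-z]+/ };

// The same text yields a different token depending on what the grammar expects.
const asKeyword = nextToken("in", 0, [In]);
const asName = nextToken("in", 0, [Name]);

console.log(asKeyword.name); // "In"
console.log(asName.name);    // "Name"
```

A separate lexing phase has no `expected` argument to consult, which is why overlapping token patterns have to be resolved by definition order instead.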

rdking commented 5 years ago

Ok. I just released v1.1.1. This includes the ability to specify names and ordering for tokens as well as the ability to generate source code. I've also updated the documentation. If you see any room for improvement, feel free to submit a patch.

bd82 commented 5 years ago

No extra feedback at this time.

I've linked your project from the custom API example docs: https://github.com/SAP/chevrotain/blob/master/docs/guide/custom_apis.md#runnable-examples