pegjs / pegjs

PEG.js: Parser generator for JavaScript
https://pegjs.org/
MIT License
4.83k stars 419 forks source link

Forum or chat? Passing values down / popping values up? #495

Closed mikeaustin closed 2 years ago

mikeaustin commented 7 years ago

pegjs is simple and powerful. I love it! I have a few questions that aren't covered in the docs:

I'm parsing this, which works, but the rule seems awkward:

f = x => x * x

If I delegate it to say Operator, there's not way to pass values down. If I create ValuesOperators and or = and =>, it's a bit cleaner, but at the top level I still need to check which one it was. In the JavaScript example, I see use of { type: "Literal", value: ... }. Is that the best way to pass the information back up?

Expression
  = values:Values body:(_ "=>" _ Expression)? assign:(_ "=" _ Expression)? {
      if (body && body.length > 0) {
        return new Function(values, body[3]);
      } else if (assign) {
        return new Assignment(values, assign[3]);
      } else {
        return values;
      }
    }

Thanks! Mike

langpavel commented 7 years ago

In grammars it is essential to understand rules. And you are providing only sub.. this is not clear to me, how it can be clear to machine? LALR grammars is very hard to write but easier than LR and easier then LL.. PEG grammar is easy to write bud most hard to write efficiently. Are you sure you are using right toy? If you have nice PEG grammar you should be able to create transpiler or interpret as easily as interpreting your rules.

Do you know boxing/unboxing in virtual machines?

How you can have values on left side of assignment? because you have this: body:(_ "=>" _ Expression)? ended with ?

what are values means in your grammar? Do you know what is L-value?

You must keep your grammar semantically close. It is hard with PEG in some cases.. { type: "Literal", value: ... } is exactly what you need in AST. Look at http://astexplorer.net/ what it can looks like

mikeaustin commented 7 years ago

I've just begun using it for experimenting, so yes, some of the naming is incorrect :) I've written a few simple LL(1) parsers, but it's been a while. I'm just curious what the best way is to handle decisions. I'll simplify for this syntax:

a = b or x => x * x

Is it more appropriate to do something like this?

AssignmentOrFunction = ident:Identifier operator:IdentOperator { if (operator.type === "Function") { ... }

IdentOperator = "=" !">" { return { type: "Assignment"; } } / "=>" { ... }

FYI, "values" was a term I used to mean a list of items -- they would be values if used as a tuple, but variables if used to define the function arguments. For example (2 + 3).

Thanks for your time!

langpavel commented 7 years ago

@mikeaustin It is on yours :-)

Writting grammars is all about reductions. Top most is statement. It is like sentense, you know, you have LL grammars :-)

It is your responsibility to define grammar in sense (sorry I have trouble with English :-D sometimes)

How you want process AssignmentOrFunction? This is basically same as statement in many languages, but can be R-value too. On the other side

IdentOperator
    = "=" !">" { return { type: "Assignment"; } }
    / "=>" { ... }

can be better written as

IdentOperator
    = "=>" { ... }
    / "=" { return { type: "Assignment"; } }

May be I'm wrong.. but this looks like you break your sentence in not so important fragment. Without context it is not really important what you evaluate this..

PEGs are really difficult to write because you can easily write unreachable rules or you can write rules which will theft your sentence and evaluate them incorrectly, so if you are beginner, you should have sentences on left and required ASTs on right all the time, ten times more automated tests then grammar rules. Trust me, you change one rule and your entire parser will break..

And of course you should play with, write something, again and again.. It is not trivial to write effective expression calculator with PEG, note several levels of precedence.. PEG is better for general parsing.

You don't give me more context in your question so it is very hard to guess what you really want to build but I can guess you should play more with grammars, there is AntLR, great tool for learning, GoldParser (LALR) (my first toy :-) ), some old school parsers that are well tested, most efficient but splitted to tokenizer/parser (older lex and yacc, newer flex and bison), many ports of them..

PEG looks like most simple. It really is. But it will always be most dangerous...

mikeaustin commented 7 years ago

Thank you much! Oh, I did find the Google group for pegjs, but didn't notice it because it was in the development section. If I have other questions, I'll definitely go there instead.

Yes, assignment "x = y" is a statement, but "x => x * x" can be part of an expression, as well as "x == y", etc. "=" is a special case, and so is "=>" in the sense that they are not regular operators. I will practice more, and look more closely at the examples. I do wish there were examples that were larger than the calculator, but smaller than the other examples. Maybe if I make progress it can be an example :)

For my project, I'd like to write a simple language to transpile to JS. Basics are external, lexically scoped methods (no monkey patching, like Dylan or CLOS but without multimethods), keyword arguments, ADTs, maybe even optional static typing. "Simple" just means everything works together, and there is just one way to do something. Making something simple can be hard.

Also, there are libraries like adt and multimethods that could benefit from native syntax. Those actually could be achieved with sweet, but sweet can only extend JS, not make it simpler. TypeScript has options like "--strictNullChecks" and supports inline union types, but again, it extends JS so it leaves all the bad things still there (implicit coercions, "+" for string concatination, scope hoisting, etc.).

Again, thanks for all the help! Mike

mikeaustin commented 7 years ago

I was under the assumption that no backtracking and being greedy meant you have to work with "wherever you currently are", but that's not the case. Yay!

Greedy just means it if it finds a match it will take it, correct? But if the entire rule doesn't match, it will start again from the beginning in the next match. This makes it much easier than how I was first trying to do it.

x, y = 2 + 3.foo + "foo", 5 * 2

This is fairly easy when assignment can be matched separately from a regular expression, etc.

Statement
  = Assignment
  / Expression

Assignment
  = vars:Variables _ "=" _ expr:Expression

Variables
  = head:Identifier tail:(_ "," _ Identifier)*

...
dmsnell commented 7 years ago

@mikeaustin PEGs have ordered choice meaning that the first pattern to match in an expression will automatically match. @langpavel's example is a good way of expressing a preference for => over =. You might want to think more about what an expression can be in terms of "one of these things" instead of "grab all this text which could be an expression and figure it out later"

Expression
  = FunctionExpression
  / Assignment
  / Value

To demonstrate I created a test grammar which you can play with. It's a bit noisy but I've tried to keep it pretty simple, ignoring some things like allowing extra whitespace and operators. Note the Expression block where the magic happens. Also, I've used things like SubExpression and ExpressionList as ways of handling the "many of these" situations; that pattern works well for me keeping the list separators and items separate.

In each case I'm trying to keep the grammar simple by having one item for each kind of match. There's some redundancy involved because I could rewrite it such that we have one big match sequence with all the possible combinations but that's hard to follow and reason about and leaves open ambiguity which PEGs can eliminate with ordered choice. Hope it helps.

mikeaustin commented 7 years ago

I'll take a look at the test grammar, thanks! It helps to see a minimal subset of a language since the JavaScript grammar is pretty big.

@dmsnell I learned a lot lately by removing rules and adding a few new ones to the JavaScript grammar, you can see the latest here: impulse.pegjs. I've removed elision syntax [,,], some operators, sequence expressions, etc. and added tuple syntax (1, 2) and range syntax 1..5. Floating point numbers require a decimal on both sides to make literal ranges easier to parse.

I've been working on a lot of the runtime lately, but want to get back to the parser to actually emit JavaScript. I'm between jobs so I have a little time on my hands :) Some ideas for a cleaned up dialect of JavaScript with extension methods, named parameters, traits and mixins, operator overloading and literal tuple and range syntax.

Thanks again! Mike