sirthias / parboiled2

A macro-based PEG parser generator for Scala 2.10+
Other
717 stars 86 forks source link

Could we get shorter versions of the hard-coded combinators? #111

Closed lihaoyi closed 9 years ago

lihaoyi commented 9 years ago

The amount of horizontal space wasted purely because of these long names is enormous. I can't even abstract over them with normal functions/extension-methods because they're all macro-magic. Even trying to rename them on import doesn't work

val x = this
  import this.x.{zeroOrMore => z}
Illegal rule call: this.x.zeroOrMore[shapeless.HNil, shapeless.HNil](ScalaSyntax.this.Basic.WhitespaceChar.|[shapeless.HNil, shapeless.HNil](ScalaSyntax.this.Literals.Comment))(support.this.Lifter.forRule0[scala.collection.immutable.Seq])
[error]   def WS = rule( z(Basic.WhitespaceChar | Literals.Comment) )
[error]                   ^

It is silly that I spend so much time trying to refactor parsers to keep them on one line, and two or three of these silly combinators are enough to run off the right side of my editor.

lihaoyi commented 9 years ago

In the interim I'm going to write a macro to do this for me; I bet that'll work

lihaoyi commented 9 years ago

Here we go. A ridiculous amount of type-macro-boilerplate, but this 77 lines of boilerplate shaved much more than 77 lines of code off my parser

package macros
import language.experimental.macros
import org.parboiled2.{Rule0, RuleDSLCombinators, Rule}
import org.parboiled2.support._
import shapeless.HList

import scala.collection.immutable
import scala.reflect.macros.whitebox

object Macros{
  def opt[I <: HList, O <: HList]
         (r: Rule[I, O])
         (implicit o: Lifter[Option, I, O])
         : Rule[o.In, o.Out] = macro optional[I, O]

  def optional[I <: HList: c.WeakTypeTag, O <: HList: c.WeakTypeTag]
              (c: whitebox.Context)
              (r: c.Expr[Rule[I, O]])
              (o: c.Expr[Lifter[Option, I, O]])
              : c.Expr[Rule[o.value.In, o.value.Out]] = {
    import c.universe._
    c.Expr(q"optional($r)")
  }
  def rep[I <: HList, O <: HList]
         (r: Rule[I, O])
         (implicit s: Lifter[immutable.Seq, I, O])
         : Rule[s.In, s.Out] = macro rep0[I, O]

  def rep0[I <: HList: c.WeakTypeTag, O <: HList: c.WeakTypeTag]
          (c: whitebox.Context)
          (r: c.Expr[Rule[I, O]])
          (s: c.Expr[Lifter[immutable.Seq, I, O]])
          : c.Expr[Rule[s.value.In, s.value.Out]] = {
    import c.universe._
    c.Expr(q"zeroOrMore($r)")
  }
  def repSep[I <: HList, O <: HList]
            (r: Rule[I, O], sep: Rule0)
            (implicit s: Lifter[immutable.Seq, I, O])
            : Rule[s.In, s.Out] = macro repSep0[I, O]

  def repSep0[I <: HList: c.WeakTypeTag, O <: HList: c.WeakTypeTag]
              (c: whitebox.Context)
              (r: c.Expr[Rule[I, O]], sep: c.Expr[Rule0])
              (s: c.Expr[Lifter[immutable.Seq, I, O]])
              : c.Expr[Rule[s.value.In, s.value.Out]] = {
    import c.universe._
    c.Expr(q"zeroOrMore($r).separatedBy($sep)")
  }
  def rep1[I <: HList, O <: HList]
          (r: Rule[I, O])
          (implicit s: Lifter[immutable.Seq, I, O])
          : Rule[s.In, s.Out] = macro rep10[I, O]

  def rep10[I <: HList: c.WeakTypeTag, O <: HList: c.WeakTypeTag]
           (c: whitebox.Context)
           (r: c.Expr[Rule[I, O]])
           (s: c.Expr[Lifter[immutable.Seq, I, O]])
           : c.Expr[Rule[s.value.In, s.value.Out]] = {
    import c.universe._
    c.Expr(q"oneOrMore($r)")
  }
  def rep1Sep[I <: HList, O <: HList]
            (r: Rule[I, O], sep: Rule0)
            (implicit s: Lifter[immutable.Seq, I, O])
            : Rule[s.In, s.Out] = macro rep1Sep0[I, O]

  def rep1Sep0[I <: HList: c.WeakTypeTag, O <: HList: c.WeakTypeTag]
              (c: whitebox.Context)
              (r: c.Expr[Rule[I, O]], sep: c.Expr[Rule0])
              (s: c.Expr[Lifter[immutable.Seq, I, O]])
              : c.Expr[Rule[s.value.In, s.value.Out]] = {
    import c.universe._
    c.Expr(q"oneOrMore($r).separatedBy($sep)")
  }
}
sirthias commented 9 years ago

Yes, simple macro forwarders are all that's required. I like your idea of adding shortcut aliases, although I'd be inclined to go the the .+, .* and .? variants rather than the combinator style names.

lihaoyi commented 9 years ago

I don't care either way. The main reason I went with this was because I could get the normal def-macros working and could not get the funky extension-method-macros working in a way that'd make parboiled happy. I prefer the post-fix operators too.

paulp commented 9 years ago

Yeah, this has been driving me bonkers as well. Glad I stopped short of writing that same macro.

paulp commented 9 years ago

It does leave me with a bad feeling about the compositional aspects of this approach, especially because one can write methods to manipulate macro methods which compile but just don't work. But in this case I think I have to eat that to dine on the upside.

paulp commented 9 years ago

It is silly that I spend so much time trying to refactor parsers to keep them on one line

@lihaoyi I'm glad you share this ambition!

sirthias commented 9 years ago

@paulp Yes, there sure are downsides to this macro-based approach. The main problem is that, because you write the parser in Scala, you might easily be fooled into thinking that you have the full power of the language available for the parser definition. However, one has to remember that pb2 is a parser-generator, which clearly separates rule construction and rule execution into two different phases. As such there are limits to what you can do. Walking that line isn't as easy as it should be yet (mainly because the line is still invisible (as underdocumented)).

paulp commented 9 years ago

There's plenty we can pin on fundamental aspects of what we do when, but when one can't rename a method and have it still work, we're way outside the fundamental, mired in the tar pit of scala implementation failings.

lihaoyi commented 9 years ago

FWIW, by-name meta-parsers doesn't work in the general case. I kept getting weird errors when trying to pass anything not-a-single-identifier (e.g. zeroOrMore(Parser ~ B)) into a by-name meta-parser.

It would be great if we could have it work. Allocating a few objects here and there is no big deal, and we can optimize the bits that matter after profiling. Hand-C&P-ing large sections of code in the name of "performance" isn't the right thing to do.

There's plenty we can pin on fundamental aspects of what we do when, but when one can't rename a method and have it still work

Breaking alpha-equivalence/referential-transparence and friends is what Term-macros are all about I'm afraid, whether we like it or not. It is also what they are useful for so I can't complain too much. I make use of breaking alpha-equivalence extensively for various things as a substitute for reflection in Scala.js where reflection does not exist.

It would be cool if we could instead predicate the macros to work on terms of a particular Type: tag the return-type of zeroOrMore and friends with a marker-Type, and have the macro-transforms look for that. Unlike Terms, Types can always be propagated without boilerplate

lihaoyi commented 9 years ago

@sirthias They don't work; I was afraid this would happen

[error] /Users/haoyi/Dropbox (Personal)/Workspace/scala-parser/src/main/scala/scalaParser/syntax/Identifiers.scala:18: type mismatch;
[error]  found   : String("_")
[error]  required: ?{def *: ?}
[error] Note that implicit conversions are not applicable because they are ambiguous:
[error]  both method augmentString in object Predef of type (x: String)scala.collection.immutable.StringOps
[error]  and method str in trait RuleDSLBasics of type (s: String)org.parboiled2.Rule0
[error]  are possible conversion functions from String("_") to ?{def *: ?}
[error]       ("_".* ~ (!SkipChar(dollar) ~ Letter | Digit).+).* ~ ("_".+ ~ OpChar.*).?
[error]        ^
[error] /Users/haoyi/Dropbox (Personal)/Workspace/scala-parser/src/main/scala/scalaParser/syntax/Identifiers.scala:18: missing arguments for method + in class String;
[error] follow this method with `_' if you want to treat it as a partially applied function
[error]       ("_".* ~ (!SkipChar(dollar) ~ Letter | Digit).+).* ~ ("_".+ ~ OpChar.*).?
[error]                                                                 ^
[error] /Users/haoyi/Dropbox (Personal)/Workspace/scala-parser/src/main/scala/scalaParser/syntax/Literals.scala:47: type mismatch;
[error]  found   : (x: Double)Double <and> (x: Float)Float <and> (x: Long)Long <and> (x: Int)Int <and> (x: Char)Int <and> (x: Short)Int <and> (x: Byte)Int <and> (x: String)String
[error]  required: org.parboiled2.Rule[?,?]
[error]       def TripleTail = rule( TQ ~ '"'.+ )
[error]                                       ^
[error] three errors found
[error] (compile:compile) Compilation failed
[error] Total time: 3 s, completed Nov 30, 2014 6:09:58 PM

What do you think? Falling back to prefix-notation is an option, but that doesn't work for + because of he way how it is parsed. I'm reluctant to go around adding (...: R0) around all of my terminals...

lihaoyi commented 9 years ago

I guess I can just fall back to the annoying super-long names for these callsites =/ Turns out there's only three of them in my rather-large parser, so it's not terrible

paulp commented 9 years ago

They work for me, but I always have something like this.

import Predef.{ augmentString => _, wrapString => _, ArrowAssoc => _, _ }
paulp commented 9 years ago

Breaking alpha-equivalence/referential-transparence and friends is what Term-macros are all about I'm afraid, whether we like it or not.

This is just defeatism. There's nothing inherent about any of this which means I should have to go to the authors of parboiled asking them to write me a macro with the right name.

paulp commented 9 years ago

Summary of latest changes: while ?, +, * are sometimes nice, they are too often too noisy because of all the bonus dots and parens necessary. Throw in all the single and double quotes and backticks and tildes which the rules already swim with and it's too much.

There are five methods I really need for this purpose, any others would be gravy. These names are fairly well established I think. If I had to choose, I'd much rather have these than the postfix operators, but best yet would be both.

def rep
def rep1
def repsep
def repsep1
def opt

As you can surely deduce, those are

p.*
p.+
p.* separatedBy q
p.+ separatedBy q
p.?
paulp commented 9 years ago

Oh look, that's right there in the boilerplate. Well, I guess that supports the theory that those are the right names, at least from a case insensitive view. (I see no call for inner capitalization, it's not like "rep" and "sep" are words in the first place, so "repsep" is just as much a word.)

paulp commented 9 years ago

I see in scala's it's rep1sep, and also there's repN which seems like a useful generalization.

lihaoyi commented 9 years ago

I can get behind anything, as long as it's not

z e r o O r M o r e (thing) . s e p a r a t e d B y (',')
lihaoyi commented 9 years ago

Here's scala-parser using postfix operators if you want to see what it looks like "in the weeds"

https://github.com/lihaoyi/scala-parser/blob/postfix/src/main/scala/scalaParser/Scala.scala

paulp commented 9 years ago

Similarly, here's my fork using (mostly) postfix operators.

https://github.com/paulp/scala-parser/blob/parboiled-snapshot/src/main/scala/scalaParser/ScalaSyntax.scala

sirthias commented 9 years ago

Yeah, I can see 'foo..sep("bar")not being too great visually. How about we reduce it to a simplefoo.("bar")or even betterfoo * "bar"? I know you guys likerepsep` and friends because you come from a parser combinator background. But to someone coming from ANTLR and other generators, these are kind of weird names. All a matter of taste.

sirthias commented 9 years ago

@lihaoyi Another fix for the ambuiguity is using the str implicit conversion explicitly:

str("foo").+
paulp commented 9 years ago

It may be all a matter of taste, but so is everything. Naming matters a lot in practice, in a context such as this more than almost anywhere. I promise there is nothing on this earth I'd less like to be doing than lobbying for a particular choice of method naming in a library. Already I can see that most likely I'll be forking this project too, just to bring the naming under my own control.

lihaoyi commented 9 years ago

@paulp the 77 lines of macros will provide you all the naming you want without needing to fork the several-kloc project, though if you really want to...

paulp commented 9 years ago

My experience trying to pretend that macros are modular has been nothing but horror show, and a fork creates other opportunities besides.

sirthias commented 9 years ago

Just published 2.1.0-SNAPSHOT, which, apart from the addition of #112, also gives you the shortcuts

foo.+(bar)
foo.*(bar)

for

oneOrMore(foo).separatedBy(bar)
zeroOrMore(foo).separatedBy(bar)

Note that while the infix notation foo * bar works but you need put the whole thing into parens when used in sequences because of operator precedence:

foo * bar ~ baz

equals

foo * (bar ~ baz)

rather than

(foo * bar) ~ baz
FranklinChen commented 9 years ago

Thanks for providing the operators! Saves a lot of typing.

xeno-by commented 9 years ago

@sirthias About import this.x.{zeroOrMore => z} not working. Could it be that you're checking something against a name zeroOrMore instead of a symbol corresponding to that method? I think if you implement a symbolic check, then the problems with aliases will magically disappear!

sirthias commented 9 years ago

@xeno-by Yes, it could well be that we are not matching properly at this point. This is the central match: https://github.com/sirthias/parboiled2/blob/master/parboiled-core/src/main/scala/org/parboiled2/support/OpTreeContext.scala#L61

How can we improve it?

xeno-by commented 9 years ago

@sirthias Instead of, say, case q"$a.this.zeroOrMore[$b, $c]($arg)($l)", do case q"$a.this.$_[$b, $c]($arg)($l)" if tree.symbol == <symbol of zeroOrMore>. Do you know how to obtain the symbol for zeroOrMore?

sirthias commented 9 years ago

Do you know how to obtain the symbol for zeroOrMore?

I'd think something like

typeOf[RuleDSLCombinators].member(newTermName("zeroOrMore"))

right?

However, it seems like the code is not really becoming more readable with moving to a symbol-based match, which is unfortunate but probably necessary.

Another question: Does the underscore in q"$a.this.$_[$b, $c]($arg)($l)" now work as a wildcard like that? It think it needed to be ${_} rather than $_ at some point in the past, which is why we don't use it yet.

xeno-by commented 9 years ago

Yes, you're right on both points.

Speaking of ${_}. This is necessary for 2.10, but in 2.11 you can say just $_.

sirthias commented 9 years ago

Speaking of ${}. This is necessary for 2.10, but in 2.11 you can say just $.

Ah, cool. Didn't know that. Thanks, Eugene!