In the interim I'm going to write a macro to do this for me; I bet that'll work
Here we go. A ridiculous amount of type-macro boilerplate, but these 77 lines of boilerplate shaved much more than 77 lines of code off my parser
package macros

import language.experimental.macros
import org.parboiled2.{Rule0, RuleDSLCombinators, Rule}
import org.parboiled2.support._
import shapeless.HList
import scala.collection.immutable
import scala.reflect.macros.whitebox

// Short-named macro forwarders for the parboiled2 combinators: each public method
// simply re-emits a call to the corresponding combinator, so the `rule` macro
// still sees the trees it knows how to handle.
object Macros{
  def opt[I <: HList, O <: HList]
         (r: Rule[I, O])
         (implicit o: Lifter[Option, I, O])
         : Rule[o.In, o.Out] = macro optional[I, O]

  def optional[I <: HList: c.WeakTypeTag, O <: HList: c.WeakTypeTag]
              (c: whitebox.Context)
              (r: c.Expr[Rule[I, O]])
              (o: c.Expr[Lifter[Option, I, O]])
              : c.Expr[Rule[o.value.In, o.value.Out]] = {
    import c.universe._
    c.Expr(q"optional($r)")
  }

  def rep[I <: HList, O <: HList]
         (r: Rule[I, O])
         (implicit s: Lifter[immutable.Seq, I, O])
         : Rule[s.In, s.Out] = macro rep0[I, O]

  def rep0[I <: HList: c.WeakTypeTag, O <: HList: c.WeakTypeTag]
          (c: whitebox.Context)
          (r: c.Expr[Rule[I, O]])
          (s: c.Expr[Lifter[immutable.Seq, I, O]])
          : c.Expr[Rule[s.value.In, s.value.Out]] = {
    import c.universe._
    c.Expr(q"zeroOrMore($r)")
  }

  def repSep[I <: HList, O <: HList]
            (r: Rule[I, O], sep: Rule0)
            (implicit s: Lifter[immutable.Seq, I, O])
            : Rule[s.In, s.Out] = macro repSep0[I, O]

  def repSep0[I <: HList: c.WeakTypeTag, O <: HList: c.WeakTypeTag]
             (c: whitebox.Context)
             (r: c.Expr[Rule[I, O]], sep: c.Expr[Rule0])
             (s: c.Expr[Lifter[immutable.Seq, I, O]])
             : c.Expr[Rule[s.value.In, s.value.Out]] = {
    import c.universe._
    c.Expr(q"zeroOrMore($r).separatedBy($sep)")
  }

  def rep1[I <: HList, O <: HList]
          (r: Rule[I, O])
          (implicit s: Lifter[immutable.Seq, I, O])
          : Rule[s.In, s.Out] = macro rep10[I, O]

  def rep10[I <: HList: c.WeakTypeTag, O <: HList: c.WeakTypeTag]
           (c: whitebox.Context)
           (r: c.Expr[Rule[I, O]])
           (s: c.Expr[Lifter[immutable.Seq, I, O]])
           : c.Expr[Rule[s.value.In, s.value.Out]] = {
    import c.universe._
    c.Expr(q"oneOrMore($r)")
  }

  def rep1Sep[I <: HList, O <: HList]
             (r: Rule[I, O], sep: Rule0)
             (implicit s: Lifter[immutable.Seq, I, O])
             : Rule[s.In, s.Out] = macro rep1Sep0[I, O]

  def rep1Sep0[I <: HList: c.WeakTypeTag, O <: HList: c.WeakTypeTag]
              (c: whitebox.Context)
              (r: c.Expr[Rule[I, O]], sep: c.Expr[Rule0])
              (s: c.Expr[Lifter[immutable.Seq, I, O]])
              : c.Expr[Rule[s.value.In, s.value.Out]] = {
    import c.universe._
    c.Expr(q"oneOrMore($r).separatedBy($sep)")
  }
}
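For illustration, a hedged usage sketch of these forwarders (ArgListParser, Num and ArgList are made-up names, not from my actual parser): because the forwarders just re-emit optional/zeroOrMore/oneOrMore calls, the rule macro accepts them exactly like the long-named originals.

import org.parboiled2._
import macros.Macros._

class ArgListParser(val input: ParserInput) extends Parser {
  def Num: Rule0     = rule { oneOrMore(CharPredicate.Digit) }
  // repSep(Num, str(",")) expands to zeroOrMore(Num).separatedBy(str(","))
  def ArgList: Rule0 = rule { '(' ~ repSep(Num, str(",")) ~ ')' }
}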
Yes, simple macro forwarders are all that's required.
I like your idea of adding shortcut aliases, although I'd be inclined to go with the .+, .* and .? variants rather than the combinator-style names.
I don't care either way. The main reason I went with this is that I could get normal def-macros working but couldn't get the funky extension-method macros working in a way that'd make parboiled happy. I prefer the postfix operators too.
Yeah, this has been driving me bonkers as well. Glad I stopped short of writing that same macro.
It does leave me with a bad feeling about the compositional aspects of this approach, especially because one can write methods to manipulate macro methods which compile but just don't work. But in this case I think I have to eat that to dine on the upside.
It is silly that I spend so much time trying to refactor parsers to keep them on one line
@lihaoyi I'm glad you share this ambition!
@paulp Yes, there sure are downsides to this macro-based approach. The main problem is that, because you write the parser in Scala, you might easily be fooled into thinking that you have the full power of the language available for the parser definition. However, one has to remember that pb2 is a parser generator, which clearly separates rule construction and rule execution into two different phases. As such there are limits to what you can do. Walking that line isn't as easy as it should be yet, mainly because the line is still largely invisible (i.e. underdocumented).
There's plenty we can pin on fundamental aspects of what we do when, but when one can't rename a method and have it still work, we're way outside the fundamental, mired in the tar pit of Scala implementation failings.
FWIW, by-name meta-parsers don't work in the general case. I kept getting weird errors when trying to pass anything that isn't a single identifier (e.g. zeroOrMore(Parser ~ B)) into a by-name meta-parser.
It would be great if we could have it work. Allocating a few objects here and there is no big deal, and we can optimize the bits that matter after profiling. Hand-C&P-ing large sections of code in the name of "performance" isn't the right thing to do.
There's plenty we can pin on fundamental aspects of what we do when, but when one can't rename a method and have it still work
Breaking alpha-equivalence/referential transparency and friends is what term macros are all about, I'm afraid, whether we like it or not. It is also what they are useful for, so I can't complain too much. I make extensive use of breaking alpha-equivalence for various things as a substitute for reflection in Scala.js, where reflection does not exist.
It would be cool if we could instead predicate the macros to work on terms of a particular type: tag the return type of zeroOrMore and friends with a marker type, and have the macro transforms look for that. Unlike terms, types can always be propagated without boilerplate.
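To make that concrete, a purely hypothetical sketch of the shape of the idea (ZeroOrMoreMarker and MarkerBasedDispatch don't exist anywhere; parboiled2 does nothing like this today):

import scala.reflect.macros.whitebox

// Hypothetical marker type that a combinator's return type could be tagged with.
trait ZeroOrMoreMarker

object MarkerBasedDispatch {
  // Sketch: a rule-transforming macro could key off the tree's type rather than
  // the method name, so renames and wrappers would propagate for free.
  def isZeroOrMore(c: whitebox.Context)(tree: c.Tree): Boolean = {
    import c.universe._
    tree.tpe != null && tree.tpe <:< typeOf[ZeroOrMoreMarker]
  }
}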
@sirthias They don't work; I was afraid this would happen
[error] /Users/haoyi/Dropbox (Personal)/Workspace/scala-parser/src/main/scala/scalaParser/syntax/Identifiers.scala:18: type mismatch;
[error] found : String("_")
[error] required: ?{def *: ?}
[error] Note that implicit conversions are not applicable because they are ambiguous:
[error] both method augmentString in object Predef of type (x: String)scala.collection.immutable.StringOps
[error] and method str in trait RuleDSLBasics of type (s: String)org.parboiled2.Rule0
[error] are possible conversion functions from String("_") to ?{def *: ?}
[error] ("_".* ~ (!SkipChar(dollar) ~ Letter | Digit).+).* ~ ("_".+ ~ OpChar.*).?
[error] ^
[error] /Users/haoyi/Dropbox (Personal)/Workspace/scala-parser/src/main/scala/scalaParser/syntax/Identifiers.scala:18: missing arguments for method + in class String;
[error] follow this method with `_' if you want to treat it as a partially applied function
[error] ("_".* ~ (!SkipChar(dollar) ~ Letter | Digit).+).* ~ ("_".+ ~ OpChar.*).?
[error] ^
[error] /Users/haoyi/Dropbox (Personal)/Workspace/scala-parser/src/main/scala/scalaParser/syntax/Literals.scala:47: type mismatch;
[error] found : (x: Double)Double <and> (x: Float)Float <and> (x: Long)Long <and> (x: Int)Int <and> (x: Char)Int <and> (x: Short)Int <and> (x: Byte)Int <and> (x: String)String
[error] required: org.parboiled2.Rule[?,?]
[error] def TripleTail = rule( TQ ~ '"'.+ )
[error] ^
[error] three errors found
[error] (compile:compile) Compilation failed
[error] Total time: 3 s, completed Nov 30, 2014 6:09:58 PM
What do you think? Falling back to prefix notation is an option, but that doesn't work for + because of the way it is parsed. I'm reluctant to go around adding (...: R0) around all of my terminals...
I guess I can just fall back to the annoying super-long names for these call sites =/ Turns out there are only three of them in my rather large parser, so it's not terrible.
They work for me, but I always have something like this.
import Predef.{ augmentString => _, wrapString => _, ArrowAssoc => _, _ }
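For context, a minimal sketch of how that masking import resolves the ambiguity reported above (IdentifierParser and Underscores are made-up names): with Predef's augmentString out of scope, "_" can only convert via parboiled2's str, so the postfix operators resolve against the rule type.

import Predef.{ augmentString => _, wrapString => _, ArrowAssoc => _, _ }
import org.parboiled2._

class IdentifierParser(val input: ParserInput) extends Parser {
  // "_" now has exactly one applicable conversion (str), so .* / .+ are unambiguous
  def Underscores: Rule0 = rule { "_".* }
}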
Breaking alpha-equivalence/referential transparency and friends is what term macros are all about, I'm afraid, whether we like it or not.
This is just defeatism. There's nothing inherent about any of this which means I should have to go to the authors of parboiled asking them to write me a macro with the right name.
Summary of latest changes: while ?, +, * are sometimes nice, they are too often too noisy because of all the bonus dots and parens necessary. Throw in all the single and double quotes and backticks and tildes which the rules already swim with and it's too much.
There are five methods I really need for this purpose, any others would be gravy. These names are fairly well established I think. If I had to choose, I'd much rather have these than the postfix operators, but best yet would be both.
def rep
def rep1
def repsep
def repsep1
def opt
As you can surely deduce, those are
p.*
p.+
p.* separatedBy q
p.+ separatedBy q
p.?
Oh look, that's right there in the boilerplate. Well, I guess that supports the theory that those are the right names, at least from a case insensitive view. (I see no call for inner capitalization, it's not like "rep" and "sep" are words in the first place, so "repsep" is just as much a word.)
I see that in Scala's parser combinators it's rep1sep, and there's also repN, which seems like a useful generalization.
I can get behind anything, as long as it's not
z e r o O r M o r e (thing) . s e p a r a t e d B y (',')
Here's scala-parser using postfix operators if you want to see what it looks like "in the weeds"
https://github.com/lihaoyi/scala-parser/blob/postfix/src/main/scala/scalaParser/Scala.scala
Similarly, here's my fork using (mostly) postfix operators.
Yeah, I can see foo.*.sep("bar") not being too great visually. How about we reduce it to a simple foo.*("bar"), or even better foo * "bar"? I know you guys like repsep and friends because you come from a parser-combinator background. But to someone coming from ANTLR and other generators, these are kind of weird names.
All a matter of taste.
@lihaoyi Another fix for the ambiguity is using the str implicit conversion explicitly:
str("foo").+
It may be all a matter of taste, but so is everything. Naming matters a lot in practice, in a context such as this more than almost anywhere. I promise there is nothing on this earth I'd less like to be doing than lobbying for a particular choice of method naming in a library. Already I can see that most likely I'll be forking this project too, just to bring the naming under my own control.
@paulp the 77 lines of macros will provide you all the naming you want without needing to fork the several-kloc project, though if you really want to...
My experience trying to pretend that macros are modular has been nothing but horror show, and a fork creates other opportunities besides.
Just published 2.1.0-SNAPSHOT, which, apart from the addition of #112, also gives you the shortcuts
foo.+(bar) for oneOrMore(foo).separatedBy(bar)
foo.*(bar) for zeroOrMore(foo).separatedBy(bar)
Note that the infix notation foo * bar works, but you need to put the whole thing into parens when used in sequences, because of operator precedence:
foo * bar ~ baz
equals
foo * (bar ~ baz)
rather than
(foo * bar) ~ baz
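For illustration, a hedged sketch of how the new shortcut might be used (CsvishParser, Field and Record are made-up names; the string separator relies on the implicit str conversion):

import org.parboiled2._

class CsvishParser(val input: ParserInput) extends Parser {
  def Field: Rule0  = rule { oneOrMore(CharPredicate.AlphaNum) }
  // parens are needed around (Field * ",") because ~ binds tighter than *
  def Record: Rule0 = rule { (Field * ",") ~ EOI }
}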
Thanks for providing the operators! Saves a lot of typing.
@sirthias About import this.x.{zeroOrMore => z} not working: could it be that you're checking something against the name zeroOrMore instead of a symbol corresponding to that method? I think if you implement a symbolic check, the problems with aliases will magically disappear!
@xeno-by Yes, it could well be that we are not matching properly at this point. This is the central match: https://github.com/sirthias/parboiled2/blob/master/parboiled-core/src/main/scala/org/parboiled2/support/OpTreeContext.scala#L61
How can we improve it?
@sirthias Instead of, say, case q"$a.this.zeroOrMore[$b, $c]($arg)($l)", do case q"$a.this.$_[$b, $c]($arg)($l)" if tree.symbol == <symbol of zeroOrMore>. Do you know how to obtain the symbol for zeroOrMore?
Do you know how to obtain the symbol for zeroOrMore?
I'd think something like typeOf[RuleDSLCombinators].member(newTermName("zeroOrMore")), right?
However, it seems like the code is not really becoming more readable with moving to a symbol-based match, which is unfortunate but probably necessary.
Another question: does the underscore in q"$a.this.$_[$b, $c]($arg)($l)" now work as a wildcard like that? I think it needed to be ${_} rather than $_ at some point in the past, which is why we don't use it yet.
Yes, you're right on both points.
Speaking of ${_}: this is necessary for 2.10, but in 2.11 you can say just $_.
Ah, cool. Didn't know that. Thanks, Eugene!
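For reference, putting the two suggestions together, a symbol-based match might look roughly like this (a sketch only, not the actual OpTreeContext code; it assumes a whitebox macro Context named c in scope, Scala 2.11 quasiquotes, and opTree is a made-up helper name):

import c.universe._

// resolved once; matching on the symbol rather than the literal method name
// means aliasing imports like {zeroOrMore => z} keep working
val zeroOrMoreSym = typeOf[RuleDSLCombinators].member(newTermName("zeroOrMore"))

def opTree(tree: Tree): Tree = tree match {
  case q"$a.this.$_[$b, $d]($arg)($l)" if tree.symbol == zeroOrMoreSym =>
    q"zeroOrMore($arg)"   // placeholder: the real code builds the zero-or-more op-tree here
  case _ =>
    c.abort(tree.pos, "unexpected rule construct")
}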
zeroOrMore(MyParser) => MyParser.*
oneOrMore(MyParser) => MyParser.+
optional(MyParser) => MyParser.?
MyParser.separatedBy(p) => MyParser.sep(p)
The amount of horizontal space wasted purely because of these long names is enormous. I can't even abstract over them with normal functions or extension methods because they're all macro-magic. Even trying to rename them on import doesn't work.
It is silly that I spend so much time trying to refactor parsers to keep them on one line, and two or three of these silly combinators are enough to run off the right side of my editor.