scala / slip

obsolete — archival use only
67 stars 15 forks source link

scala-parser-combinators vs fastparse #24

Closed SethTisue closed 7 years ago

SethTisue commented 8 years ago

the following exchange between @odersky and @lihaoyi occurred on scala-debate recently:

@odersky Parser combinators is the same thing. It would be great to have a separate module which improves on the current state.

@lihaoyi If you really are interested, such a thing exists http://lihaoyi.github.io/fastparse/ It's almost drop-in except a lot better to use. Not to mention 4-10x slowdown vs hand-rolled instead of the scala-parser-combinator's 400x

@odersky It would be great to see fastparse as a module in the stdlib! If you want to contribute it that would be much appreciated. The best way to get things rolling is via Slip: https://github.com/scala/slip

some further discussion on Gitter:

@soc I'm not sure I understood his plan. Is it replacing the actual implementation of scala.util.combinators? Is it getting rid of scala.util.combinators "preferred" status in terms of being an official module?

@SethTisue I don’t know either

@tpolecat you may register my vote for no preferred parser combinator library

@odersky I would like to encourage contributions, of whatever form. There's no plan yet. It depends where people want to take it. Anybody who takes initiative has my vote.

@SethTisue I’m definitely interested in discussing it. if Haoyi’s stuff is strictly superior to the old library in every important respect (Haoyi certainly seems to think it is! and if some interested folks could confirm that, what great news that would be), then scala-parser-combinators could be mothballed. although scala-parser-combinators is community-maintained these days, the Scala maintainers still have an interest in its status given that scala-parser-combinators is in the scala package and under the scala org on GitHub. and in the Artima book.

@odersky Who's maintaining scala-parser-combinators? It would be good to get their input as well.

@SethTisue Antoine Gourlay (@gourlaysama)

Ichoran commented 8 years ago

I would really like for fastparse to be part of the standard distribution (even if in a module and not the standard library). It's the kind of tool--often needed, hard to get right yourself, saves a ton of time if it's there--that is really a win for a standard library. I ended up creating my own parsing library focused on extreme performance (comparable to hand-rolled in many cases). Even though it's a bit fiddly to use, I use it all over the place. I'd use fastparse instead probably 80% of the time if there was no extra dependency to pull in and a more-certain maintenance future.

SethTisue commented 8 years ago

a significant past discussion on this (started by @soc in November 2014): https://groups.google.com/d/msg/scala-internals/4N-uK5YOtKI/teWdFXR290oJ

puffnfresh commented 8 years ago

The standard distribution should not include a parser combinator library.

dickwall commented 8 years ago

Could have both the old one and a new one in parallel for while.

Currently parser-combinators are already a separate module in the core library, also they are primarily community maintained.

lihaoyi commented 8 years ago

I've written up a section on Debugging Parsers in the FastParse documentation. If you actually care about parser combinators you should check it out, because this is one facet where FastParse is probably best-in-industry AFAICT.

soc commented 8 years ago

I'd prefer dropping scala-parser-combinators, and writing some migration document pointing to fastparse. (Pretty much similar to what happened with scala-actors vs. akka.)

puffnfresh commented 8 years ago

@soc :+1:

lihaoyi commented 8 years ago

I'd prefer dropping scala-parser-combinators, and writing some migration document pointing to fastparse. (Pretty much similar to what happened with scala-actors vs. akka.)

I'd happy to give someone from The Powers That Be™ collaborator access to FastParse, if there's someone I trust actually cares about things.

So far, I've seen zero interest: no questions, no issues raised, no discussions on gitter, no PRs sent. There are people using it, but as far as I can tell nobody involved in this discussion has even downloaded the thing at all or read any documentation (not that there's any shortage of it).

It's probably a bit premature to talk about how we can FastParse the Official™ parser-combinator library =D

Ichoran commented 8 years ago

Once I finish wading through writing least squares fitting routines I'll need to parse stuff, and I'll use fastparse. I will probably complain bitterly that ~! is nothing like ! and thus is poorly chosen notation. Besides, it's a cut. Don't you want scissors notation? :<?

lihaoyi commented 8 years ago

I will probably complain bitterly that ~! is nothing like ! and thus is poorly chosen notation.

I just copied that from scala-parser-combinators! I had no active decision in picking it, I also haven't really seen Cut used much on the internet (despite how important I think they are) and I can't find any other examples of it used except in scala-parser-combinators and MacroPEG (which is again my own). After I managed to convince @sirthias to put cuts into Parboiled2, he chose ~!~ which IMHO is not much different.

As far as the name blame game goes, it seems to go to 2008 @adriaanm

screenshot 2015-10-12 21 32 01

(github doesn't do recursive blame AFAIK, so sorry @adriaanm if you are being wrongly blamed)

I don't think scissors have enough cuttiness in them; perhaps a sword would be more appropriate

p1 =||====> p2 | p3 =||====> p4
Ichoran commented 8 years ago

I still think the notation could stand re-examination. I am not sure I want to sever my parsing so severely as the sword suggests. I guess ~!~ is okay. It's pretty easy to type, at least. I'd have just ~~'ed if it were me. Maybe not obvious enough? I'll play with it a bit once I start parsing things again.

lihaoyi commented 8 years ago

Here's my talk at SF Scala about FastParse

https://vimeo.com/142341803

Ichoran commented 8 years ago

I've been using FastParse for most of the day, which is the longest contiguous block of time and most intensively I've used it. I can confirm my initial hunch that it's way better than parser combinators.

I hate the ~! notation for cuts. It's too hard to glance at code and know what's a ! and what's a !p and what's a ~!. Worse yet, you can typo ~ !"foo" as ~! "foo" and get no error while exactly flipping what your parsing is supposed to be. Argh!

Other than that, everything's delightful. (Well. Pretty much. I'd make a dozen or so changes to suit my preferences, but none of those are a big deal.)

jducoeur commented 8 years ago

Chiming in: I've been using FastParse in production for a couple of months now (eg, for this MySQL parser) and would say that, while I suspect that we'll find ways to enhance it over time, it's already a good deal more pleasant to use than parser combinators -- generally easier to use, a bit easier to debug (although I'm pondering how we might improve that), and bat-out-of-hell fast.

That said, I don't actually feel any crying need for it to be in the standard library. This is one of those areas where letting the proverbial thousand flowers bloom is useful. Putting a pointer to it from somewhere in the core documentation, sure, so it's easy to find, but I don't think it requires a particularly privileged position.

In general, I like the way the Scala.js community has handled this sort of thing: useful libraries get linked from the homepage, but they aren't especially privileged. The community gets to figure out what works best, and the best rises to the top through GitHub stars, links from tutorials, and like that.

SethTisue commented 8 years ago

It might be enough to add a paragraph to the scala-parser-combinators README, and perhaps to relevant locations on the Scala website and/or Scala doc website, that basically says:

book authors (e.g. of the Odersky, Venners & Spoon book) could make their own decisions about what to say about this, and which library to present, when they publish their next editions, and/or in their online errata

SethTisue commented 8 years ago

additionally, we could consider unbundling scala-parser-combinators from 2.12 or 2.13 (presumably unbundling scala-swing as well). I didn't participate in the discussions at the time they were modularized, but I assume that the primary reason they are still bundled is that they are used in the Odersky, Venners & Spoon book.

unbundling wouldn't affect sbt users (who already must explicitly depend on Scala modules such as scala-xml and scala-parser-combinators), but would affect users using command-line scala and scalac

jducoeur commented 8 years ago

Yeah, unbundling parser-combinators seems appropriate at this point -- it doesn't seem to be getting much love, and I don't think it really needs the privileged position either. (Mind, I'm still using it on both the JVM and JS sides. But my new code is all FastParse, and I'm eventually going to rewrite the remaining parser-combinator parser that way.)

lihaoyi commented 8 years ago

@ichoran syntax is trivial; you could even change it yourself in your code without touching the library, though I won't claim the status quo is perfect. Semantic versioning says during 0.2.x I can replace the library with a turnip and still be compliant

@SethTisue I'd be happy with that outcome.

dickwall commented 8 years ago

Does anyone feel like taking point on this issue to keep it moving forward?

gourlaysama commented 8 years ago

@SethTisue I am completely in favor of unbundling parser-combinators as a mean to say "scala-parser-combinators is not the preferred parsing library in scala".

But there is another reason why it is bundled with scala: it is used by scaladoc. The json parser is used to generate the index of packages and classes for the scaladoc left pane. And since there are no standard json library (wink wink nudge nudge...), there isn't much of a choice here...

And the standard distribution doesn't differentiate between artifacts that are "just dependencies" of stuff and those that are "officially part of the beginner experience" (like akka-actors were): both end up on the REPL classpath (actually, everything ends on the classpath: the bash runner indiscriminately adds all jars it can find; I tried to tackled that once but gave up in the end...).

The second part of the modularization effort would solve the problem I think:

  1. scaladoc would be unbundled from the compiler jar, it would still have parser-combinators in its dependencies (or anything else, really)
  2. the REPL classpath would include the library, the compiler and its dependencies, and whatever library are considered "blessed", but it would have no reason to include scaladoc or its dependencies
  3. Profit...
  4. One day, or in parallel, rewrite scaladoc to use the new official json thingy...
Ichoran commented 8 years ago

@dickwall - I'll take point on this. At the same time, I'm planning to propose that we adopt jawn-parser for the standard JSON parser and use it to generate a standard parse into json4s-ast. That should cover @gourlaysama's point about scaladoc needing parsing.

SethTisue commented 8 years ago

@gourlaysama thanks for the reminder about Scaladoc's dependency on util.parsing. I had indeed forgotten.

@Ichoran but it appears to me that Scaladoc isn't parsing JSON, just generating it. see https://github.com/scala/scala/blob/2.12.x/src/scaladoc/scala/tools/nsc/doc/html/page/IndexScript.scala . so unless I'm missing something, cutting the dependency looks quite easy

Ichoran commented 8 years ago

@SethTisue - That makes it easier to put off doing anything with jawn-parser. (Not sure that is a good thing?)

SethTisue commented 8 years ago

@Ichoran well even if we had jawn-parser as a standard module, we would still want to avoid depending on it from scaladoc, since any Scala dependency that scaladoc has complicates bootstrapping

Ichoran commented 8 years ago

@SethTisue - What does Scaladoc have to do with bootstrapping? It doesn't generate anything required during execution of the compiler, does it? Seems like you just do whatever you do, rebuilding whatever else you need to rebuild, and then run Scaladoc at the end. If you mean, "but the standard module breaking would break the build", well, yeah? Isn't that what we'd want anyway if the standard module broke? Fix that first. Then Scaladoc is up again.

adriaanm commented 8 years ago

Scaladoc is not directly involved during bootstrap, as defined by and strap.done (all you need is lib/reflect/comp), but its dependency list should shrink -- not grow. It doesn't look like it needs to parse JSON -- it's just generating it (as well as html, which it uses scala-xml for). Pragmatically, I think we should considering using string interpolation to cut the cord in both instances.

In a larger sense, scaladoc is of course part of the bootstrap (i.e., release) process.

SethTisue commented 8 years ago

thanks @adriaanm for clearing up my poor wording.

SethTisue commented 8 years ago

(this stuff about the dependency of scalac/scaladoc on scala-xml and scala-parser-combinators is now https://issues.scala-lang.org/browse/SI-9560, let's have any further discussion there since it's somewhat tangential to the main purpose of this SLIP)

lihaoyi commented 8 years ago

I think someone else would have to take the lead on getting FastParse "in", whatever that means. I'm happy for @Ichoran to do it, or if anyone else wants to pick up this cup. It feels to me like if FastParse isn't good enough that someone non-me is excited about it, it shouldn't go in the standard anyway.

So I'll just make FastParse the best I can and you guys can decide whether it's worth your time to do something about it. I'll certainly co-operate if something is asked of me, but I'm pretty happy to not be the one driving things =)

soc commented 8 years ago

The scaladoc -> parser-combinators dependency is on its way out: https://github.com/scala/scala/pull/4851

odersky commented 8 years ago

On Wed, Nov 18, 2015 at 8:27 AM, Li Haoyi notifications@github.com wrote:

I think someone else would have to take the lead on getting FastParse "in", whatever that means. I'm happy for @Ichoran https://github.com/Ichoran to do it, or if anyone else wants to pick up this cup. It feels to me like if FastParse isn't good enough that someone non-me is excited about it, it shouldn't go in the standard anyway.

So I'll just make FastParse the best I can and you guys can decide whether it's worth your time to do something about it. I'll certainly co-operate if something is asked of me, but I'm pretty happy to not be the one driving things =)

I completely understand, and don't want to convince you to change your mind. But you see, that's the general problem we are facing. We do not have people in the wings to pick up a great prototype and stabilize it and maintain it. If something should improve it has to come from the community. So, if someone wants to pick this up and move it forward it that would be great! If not, things will simply stay as they are.

Reply to this email directly or view it on GitHub https://github.com/scala/slip/issues/24#issuecomment-157631042.

Martin Odersky EPFL and Typesafe

Ichoran commented 8 years ago

I will pick it up (while wearing my "member of the community" hat).

SethTisue commented 7 years ago

This discussion could be revived under the new Scala Platform Process (http://www.scala-lang.org/blog/2016/11/28/spp.html).