kach / nearley

📜🔜🌲 Simple, fast, powerful parser toolkit for JavaScript.
https://nearley.js.org
MIT License
3.6k stars 232 forks source link

Use SPPF-style structure #163

Open tjvr opened 7 years ago

tjvr commented 7 years ago

Switch to using an SSPF-style internal structure, similar to @joshuagrams's pint-sized Earley Parser.

This lets us:

This would also:

tjvr commented 7 years ago

It seems that postprocessors are going to be a blocker for this. :(

tjvr commented 7 years ago

We've made lots of performance gains particularly in #171. I'm no longer convinced SPPF is a massive win here. We'd have to drop support for predicates, and the only real advantage is you get a full parse forest (rather than a list of possible parses, which is the exponential in the number of ambiguities).

What do you think @hardmath123? :-)

kach commented 7 years ago

Hmm, yeah, I suppose so — it might not be worth the effort. But I'd leave the issue open anyway, regardless of what you decide, because someone-in-the-future might want to try it out to see what would happen. :-)

heatherleaf commented 7 years ago

Just crashing the party... I wrote some parsing papers a while ago, perhaps one or two of them could be interesting?

Ljunglöf & Wirén (2010) Syntactic Parsing. Handbook of Natural Language Processing Ljunglöf (2008) Transition-based parsing. Swedish Language Technology Conference Ljunglöf (2003) An Abstract View of Generalized LR Parsing. Nordic Workshop on Programming Theory Ljunglöf (2002) Pure Functional Parsing. Licenciate thesis

You can download all of them from here: http://www.cse.chalmers.se/~peb/bibliography.html

(Sorry for the spamming, but perhaps there's something interesting somewhere there)

kach commented 7 years ago

Hey, thanks for the links! It's not spam at all — it's actually really nice to have clean, academic references for what we're doing.

By the way, Section 4.9 in Ljunglöf & Wirén (2010) Syntactic Parsing. Handbook of Natural Language Processing was especially fascinating. It reminds me of a post by Jeffrey Kegler, over here (x/posted in brief over here).

heatherleaf commented 7 years ago

Thanks for the timeline -- I haven't seen that before.

erezsh commented 7 years ago

Just to chime in with my own 10 cents: I re-implemented Earley in Lark with a parse-forest. It's actually relatively simple to add. The idea is to build a tree structure normally, at the same points that you currently use for callbacks. But when a rule is completed more than once, you need to insert an "ambiguity branch". My implementation is here: https://github.com/erezsh/lark/blob/master/lark/parsers/earley.py I'll be happy to walk you through my solution if it helps.