patrickhuber / Pliant

MIT License

Awesome stuff! #59

Closed — bilsaboob closed this 7 years ago

bilsaboob commented 8 years ago

Hey Patrick! - not sure if that's your name :)

I just wanted to drop a note that I think you've done some awesome work on this library so far! I've been coding some on an Earley parser, same as you - but you've come way ahead of me, so I've decided to drop my efforts altogether in favor of your amazing stuff here :)

Recently I started an implementation similar to what I believe you began with your latest commit - the "Deep" precomputed "Earley sets" (if I'm not mistaken) - and that put the nail in the coffin for me ;) - I was struggling quite a bit with it :P. I'm really looking forward to the progress on this! :) Finally I can jump ahead and actually use it for some cool stuff instead :) I'll probably do some work on tooling around it, for quick prototyping of grammars (similar to ANTLR) :P - if I make something worthwhile, I'll drop you a note of course!

Yeah, so, I just wanted to jump in, say thanks, and give you some credit for your work :) Awesome!

bilsaboob commented 8 years ago

Btw, sorry for polluting your issue tracker - feel free to delete the issue/post once you've read it! :)

patrickhuber commented 8 years ago

Hi @bilsaboob,

I appreciate you reaching out and I'm really glad you like the library. Honestly I wasn't sure if anyone had noticed it as it was mostly just a labor of love :)

I can't take all the credit though. Jeffrey Kegler @jeffreykegler and the Marpa community have spent years working on this stuff; I just read the papers he points to and try my best to implement the algorithms. They have an IRC log that I was digging through and a Google group that I lurk on.

The deep stuff is still in the works. Don't worry, I beat my head on it too; the paper is really hard to read. From my reading it should increase performance by about 40% (though Jeffrey points out it was far less than expected in the Marpa implementation, so I'm not crossing my fingers). I named it Deep after the Aycock and Horspool paper Directly Executable Earley Parsing, but it's more in line now with their Practical Earley Parsing, so that name will probably change. The first paper helped me the most with the algorithm, and the second with the nullable symbols. The key was understanding that predictions are just null transitions in an NFA and scans/completions are symbol transitions in an NFA.
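As a rough illustration of that idea - the types below are made up for the sketch, not Pliant's actual classes - predicting a nonterminal is just following "null" (epsilon) edges between dotted rules, and those closures can be precomputed per grammar so a prediction at parse time becomes a single lookup:

```csharp
// Minimal sketch: dotted rules as NFA states, predictions as null (epsilon) edges.
// All types here are hypothetical illustrations, not Pliant's API.
using System.Collections.Generic;

public sealed class DottedRule
{
    public string LeftHandSide;        // e.g. "ClassDecl"
    public string[] RightHandSide;     // symbols on the right-hand side
    public int DotPosition;            // position of the dot within the right-hand side
    public string PostDotSymbol =>
        DotPosition < RightHandSide.Length ? RightHandSide[DotPosition] : null;
}

public static class PredictionClosure
{
    // Whenever the symbol after the dot is a nonterminal, add that nonterminal's
    // start rules (dot at position 0). Repeating until a fixed point yields the
    // precomputed prediction set for the starting dotted rule.
    public static ISet<DottedRule> Compute(
        DottedRule start,
        IDictionary<string, List<DottedRule>> startRulesByNonTerminal)
    {
        var closure = new HashSet<DottedRule> { start };
        var frontier = new Stack<DottedRule>();
        frontier.Push(start);

        while (frontier.Count > 0)
        {
            var current = frontier.Pop();
            var postDot = current.PostDotSymbol;
            if (postDot == null || !startRulesByNonTerminal.ContainsKey(postDot))
                continue; // end of rule, or a terminal: that's a scan edge, not a null edge

            foreach (var predicted in startRulesByNonTerminal[postDot])
                if (closure.Add(predicted))
                    frontier.Push(predicted);
        }
        return closure;
    }
}
```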

Precomputing the grammar takes a few milliseconds, so I was thinking about making two different parse engines, or unifying two implementations behind a facade and exposing the optimization through a setting. That way the user can decide if they want to incur the overhead. I still haven't figured out Leo items and parse trees for the Deep Earley sets though. I've spent most of my time over the past two years trying to get the regular Earley algorithm working with those things.

Really my goal now is to get the performance improved. Parsing a 10,000 line JSON file with the JSON grammar takes 500 ms on my i7, and I feel like that is too slow. Newtonsoft can do it in 50 ms :( I also keep wondering if I should move away from objects and instead favor bit vectors and bit matrices like Marpa.

I've been kind of hung up on performance lately, to the detriment of new features like error recovery and incremental reparsing.

If you want to contribute or have any suggestions, let me know. I figured grammars could be stored as projects in the main solution or as breakout repos - I'm interested in your opinion. Tooling would be awesome. I originally envisioned this as a DSL equivalent to Roslyn; not sure if that is in line with what you were thinking.

Issues are fine for writing this stuff out. I honestly don't know if there is a better forum for comments, so I welcome the medium.

bilsaboob commented 8 years ago

Hey,

Well, I figured it would be nice if someone gave you some attention for this :) - I've been reading through most of the blog posts/publications by Jeffrey Kegler @jeffreykegler, which are all great stuff, though your implementation is the first whose "optimizations" I've been able to follow and understand.

I know what you mean - call it early optimization or not :) - I actually noticed your project quite some time ago, but just recently got back to it myself - and figured I would give it a try.

I was following the approach of the Directly Executable Earley Parsing publication, building DFAs for each of the NonTerminals (keeping nonterminal transitions as part of the transition tables).

After an initial DFA generation pass I followed up with another pass, replacing the transitions corresponding to the Id of a NonTerminal with scan edges linking to that NonTerminal's DFA (the so-called "call edges") - and decorating these transitions as "Predictions", in order to know how to handle ParseTree generation when following such a transition.

This seems to work fine as long as I don't end up with "recursion" between the NonTerminals :), at which point I'm no longer sure how to handle the continuation in case some of the paths end up as a bust - this basically yields a solution similar to the original Earley algorithm, but with fewer open states per set.

I ended up trying an approach where I started a new "Parse Path" (something similar to the State objects in your implementation, but with a DFA added to them).

So I was basically pushing the "DFA" of the NonTerminal that caused the "recursion" as a "parallel" "Parse Path" - which ends up as a breadth-first solution - this should potentially speed things up due to the characteristics of a DFA - however, since I haven't gotten it working for recursive rules yet... here I am :)
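Roughly, the data shape being described - all names below are illustrative only, not actual Pliant or workbench types - is a DFA per nonterminal whose transition table mixes ordinary scan edges for terminals with "call" edges that link into another nonterminal's DFA and are marked as predictions:

```csharp
// Rough sketch of a per-NonTerminal DFA with scan and call edges; names are hypothetical.
using System.Collections.Generic;

public enum EdgeKind { Scan, Call }   // Call edges replace the original nonterminal transitions

public sealed class Edge
{
    public EdgeKind Kind;
    public int SymbolId;              // terminal id for Scan, nonterminal id for Call
    public int TargetStateId;         // state to move to within the current DFA
    public NonTerminalDfa CalledDfa;  // for Call edges: the DFA to "predict" into
}

public sealed class NonTerminalDfa
{
    public int NonTerminalId;
    public int StartStateId;
    public HashSet<int> AcceptStateIds = new HashSet<int>();
    // transition table: state id -> outgoing edges
    public Dictionary<int, List<Edge>> Transitions = new Dictionary<int, List<Edge>>();
}
```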

I don't really have any suggestions on the performance parts - though I think the biggest performance gain could come from moving the algorithm from its "dynamic" nature to precomputed states, as you've started with the Deep implementation - if not in algorithmic complexity, then for sure in the "constant" part of the algorithm.

One drawback I was thinking of is that it may affect the complexity of adding good "error handling" / "error recovery" later on (which could be an issue if you're writing a parser to support an IDE, but isn't really an issue for "one off" parsing situations).

I never really ended up thinking much about Leo items for the Deep approach, though my feeling was that they may not be needed? - not sure.

Anyway, I shouldn't add any more "unqualified" opinions on performance/algorithm - you are way ahead of me on that topic anyway :)

As a short background - the reason I started out with this to begin with is to use it for a new "language integration" with Visual Studio. I've always wanted to have a mix of Scala / C# syntax, so what I want to achieve is IDE tooling for transpiling this syntax to C# syntax.

As Pliant is for you, it has been an old on and off project of mine :)

Previously I've been using ANTLR3 for the parser, but I've found that making changes to the syntax has become more and more complex recently (most likely due to my lack of experience using ANTLR - and the fact that it requires building "clean/non-ambiguous" grammars in order to behave as expected...)

ANTLR4 is out, which could be a possible alternative - though I figured I would give another go at the old Earley code I had - and then I ended up here again :)

I think keeping the Pliant library as a separate repository would be nice at some point - though I guess it would just be a hassle for you at the moment - keeping the grammars in the main solution as you have it now is a convenience, I would assume - I guess the main benefit would be opening the project up for contributions?

One way would perhaps be to begin by creating separate solutions - one containing the Pliant lib related projects and one for grammars (in order to avoid multiple repositories for the moment).

I just created a separate solution when forking Pliant, adding a new project for the "grammar" that I'm going to do some testing on - and whatever helper projects I would need - resulting in the following structure:

Forking the current Pliant repository means I also get some of the "Grammars" which I'm not interested in... which isn't really a big deal :)

As for the tooling, I was thinking of creating some sort of "Grammar Workbench" for designing a grammar, with syntax similar to the "Antlr grammar syntax" per NonTerminal - among other features:

Writing some unit tests for my grammar would probably have avoided some of my issues :P - you've done diligent work on that throughout Pliant :) - though I think this approach better matches how I've been evolving the grammar - I found verifying the parse output with unit tests not exactly satisfactory from a workload perspective :)

As for the grammar definitions, since I'll be working via the "Grammar Workbench tool", I was thinking more along the lines of generating grammar code from the tool in combination with partial classes - in order to more easily support configuration of "error recovery points" and "AST" building. The generated code would be something along the lines of the following pseudo code:

GeneratedGrammarDefinitions.txt

(Sorry, but I wasn't able to paste the "pseudocode" directly ... the styling was getting messed up)
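The attached file's contents aren't reproduced here, but as a purely illustrative guess at the shape being described - one generated class per rule plus a hand-written partial for customization, with every class and member name below being hypothetical - it might look like:

```csharp
// Hypothetical illustration of generated per-rule partial classes; not the attached file's contents.
using System.Collections.Generic;

// ---- generated by the workbench tool (regenerated at will) ----
public partial class ClassDeclRule
{
    // generated from: ClassDecl: 'public'? 'static'? 'class' Identifier '{' ClassBody '}'
    public static readonly string Name = "ClassDecl";
}

// ---- hand-written partial: customization survives regeneration ----
public partial class ClassDeclRule
{
    // error recovery point: if parsing fails after 'class' has been seen,
    // skip ahead to the next '}' and emit an error node instead of failing the rule
    public string RecoverySynchronizationToken => "}";

    // AST building hook for a completed ClassDecl
    public object BuildAst(IReadOnlyList<object> children) =>
        new { Kind = "ClassDecl", Children = children };
}
```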

As to contributing to the project, I think for now I would be most suited to the area of tooling - and if you want opinions on things, I'm happy to give you some input :)

Though I don't want to commit to anything, as I'm often On and Off on "coding sprees", and don't want to disappoint in any commitment.

If you want, perhaps you could give me some of the thoughts you've had on tooling and the "Roslyn DSL", or any other areas/things you had in mind - I'll keep those in mind when working on things - and if I end up developing something you would find useful, I'm more than happy to share it :)

Regards, /Peter

patrickhuber commented 8 years ago

Oh, so did you start on creating a parse forest from the precomputed items? I'm not sure what issues I'll run into yet with recursions. In general I tend to avoid them by using Dijkstra's algorithm (or at least I think it's Dijkstra's).

From what I've read of Jeffrey's work, the Leo items are still needed even if you are doing precomputation. Once I get a test recognizer working I'll throw some grammars with right recursions at it to see how it handles them. My first step may be to just replace the core of the IState queuing with precomputed state transitions that spit out State objects into the existing Earley sets.

I'm curious to see how Antlr4 works for you. I haven't used it for much more than trivial grammars. If I remember correctly they did some rewriting between 3 and 4 to handle ambiguity and error reporting, but it's been a while since I saw the video where Terence Parr explains the differences.

In your folder structure, have you looked at the NuGet package? I assume you are doing some exploring and debugging of the Pliant source code. With tooling in mind, it may be beneficial for users to use the same dll version in the tool as they are consuming in their application. Looking at the antlr repo for grammars, they have a single repo where they put the textual representation of the grammar. That seems like a better approach than what I was doing.

The grammar workbench sounds like a great idea. Will grammar definition be driven by a grammar domain-specific language or by pointing to a precompiled artifact? I created a couple of classes for working with EBNF directly in the Ebnf folder of Pliant if that helps. I'm sure there are still bugs in it, because I ran into issues trying to parse the ProtocolBuffersV3.pliant file in the ProtocolBuffers project and then got lost in performance improvement. The goal was to get an AST for the grammar so it could be turned into a Grammar instance.

I think codegen is an area that could use some work. Your example looks good. I never thought to create classes for each grammar rule. I like seeing ideas for error recovery, disambiguation and AST generation. I was going to create an Error token type and a default error behavior: something like put the expected token back into the parser and append input to the error token until the input matches an expected lexer rule. Not sure if that will work for the general case (or even work at all), so I wanted to provide a user override like you have. Perhaps we could collaborate more on the open issue #42.
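A pseudocode-level sketch of that default behavior - the helpers below are stand-ins, not Pliant's API - would be: swallow input into an Error token until one of the lexer rules the parser currently expects matches, then resume as if the expected token were present.

```csharp
// Sketch only; the delegate stands in for the real "expected lexer rules" hook, which is hypothetical here.
using System;

public sealed class ErrorToken
{
    public int Position;
    public string Text = "";
}

public static class ErrorRecoverySketch
{
    // expectedRuleMatchesAt: does any lexer rule the parser currently expects
    // match the input starting at the given position?
    public static ErrorToken Recover(
        string input,
        int position,
        Func<string, int, bool> expectedRuleMatchesAt)
    {
        var error = new ErrorToken { Position = position };
        var end = position;

        // append characters to the error token until an expected lexer rule
        // matches the upcoming input (or the input runs out)
        while (end < input.Length && !expectedRuleMatchesAt(input, end))
            end++;

        error.Text = input.Substring(position, end - position);
        return error; // the parser would then continue as if the expected token were present
    }
}
```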

I like the idea of modeling a sample of the language. The Irony project was a big inspiration when doing the GrammarExpression work and I think they had something where you could visualize the parse trees for your grammar. I've seen demos of some of the Antlr tooling and browsed their docs page but not much more than that.

For the DSL project I thought it would be cool to parse a policy language like ALFA for another project I occasionally work on. I'd also like tooling for the EBNF grammar: syntax highlighting and IntelliSense support in Visual Studio.

bilsaboob commented 8 years ago

Well, I ended up with an algorithm similar to the one you have, except that instead of "following" and "expanding" IState objects on predictions, I was using an object similar to the "IState" but containing a DFA behind the scenes which handles the transitions (the same "IState" objects are still used... until the special case regarding "recursion" I was talking about, at which point I needed to open a new "IState" path). Let's see if I can clean the code up a bit during the weekend and post the general outline of what I was doing - perhaps it may be of some use... for inspiration if nothing else :P

As to creating a ParseTree - I wasn't really able to test it, since I didn't finish the work on the algorithm... my general thought was to keep an Array for each "Set index" - basically one array for each Position I read a symbol at.

Given the text: "public static class Testing"

Given that there is an ambiguity allowing for 2 distinct sets of lexed symbols

Given the following lexed symbols in set 1: "public", "static", "class", "Testing"

Given the following lexed symbols in set 2: "pub", "lic", "stat", "ic", "class", "Testing"

There will be an array with 6 internal arrays (one for each "position" we can possibly read a symbol at): [[], [], [], [], [], []]

This array is then filled with: (pushing the "Lexed" symbols at the current parse position...) [["public", "pub"], ["static", "lic"], ["class", "stat"], ["Testing", "ic"], ["class"], ["Testing"]]

(I actually made this a multidimensional array, [num_parsed_symbols][total_symbols_in_grammar] - :P not very good from a memory perspective, I guess, but at this point I didn't really care much about that :P - I was kind of going under the assumption of about max 10 MB for this array, which is of course quite a lot if it's kept per "file" being parsed... and not reused at some point...)
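Concretely, the [position][symbolId] layout described above for the "public static class Testing" example might be set up like this (the symbol ids below are made up purely for illustration):

```csharp
// Sketch of the [position][symbolId] layout; ids are invented for the example.
var totalSymbolsInGrammar = 8;
var positions = 6; // the longer of the two ambiguous lexings has 6 symbols

// parsedSymbols[position][symbolId] holds the parse node recognized for that
// symbol at that position, or null if it was not seen there.
var parsedSymbols = new object[positions][];
for (var i = 0; i < positions; i++)
    parsedSymbols[i] = new object[totalSymbolsInGrammar];

// fill in the two ambiguous lexings side by side, e.g.:
// position 0: "public" (id 0) and "pub" (id 4)
parsedSymbols[0][0] = "public";
parsedSymbols[0][4] = "pub";
// position 1: "static" (id 1) and "lic" (id 5), and so on...
parsedSymbols[1][1] = "static";
parsedSymbols[1][5] = "lic";
```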

For a scan I pushed a "new symbol" into one of those arrays, as a ParseNode. The more interesting scenario is a NonTerminal completion, which goes something like this:

```csharp
// Scenario: a NonTerminal completion - building the ParseNode for the completed rule
var currentLocation = X;        // the current parse position
var parsedSymbolsArray = ...;   // [position][symbolId] - the symbols parsed so far, up to the current location
var completedNonTerminal = ...; // the non terminal we want to "build"

// the expected symbols for the non terminal - extracted from the information on the "transition edge"
var expectedSymbols = completedNonTerminal.Rhs;
// the number of symbols expected to be "consumed" as part of this completion
var expectedSymbolsCount = completedNonTerminal.Rhs.Length;

// now iterate the expected symbols and build the new "ParseNode"
var completedParseNode = new ParseNode(completedNonTerminal);
for (var i = 0; i < expectedSymbolsCount; ++i)
{
    var expectedSymbol = expectedSymbols[i];
    var symbolsAtExpectedLocation = parsedSymbolsArray[currentLocation - expectedSymbolsCount + i];
    var expectedParseNode = symbolsAtExpectedLocation[expectedSymbol.Id];
    // set the expected parse node at its position in the completed node
    completedParseNode.SetRhsParseNodeAtIndex(expectedParseNode, i);
}
```

As I said, this may be a completely stupid idea which I haven't been able to test yet - but since you asked... Perhaps I'll try to clean up my code over the weekend, "finish" some basic sample to test it out, and have some more concrete results to share - I'll try to add some comments and post it here if you would like to have a look :P

Anyway, I'll be back and continue with a follow up post in a couple of min ... got to get something to eat :)

bilsaboob commented 8 years ago

Hey again,

Yep, as you say, I've just forked your repository to get better browsing of the source code and for debugging purposes - so I didn't really use the NuGet package and the version of the library distributed there.

I think doing the grammar distribution along the lines of Antlr would be a good idea at some point, as you say - having one separate repo for the different grammars, or even one per grammar with an additional NuGet package - though I usually find it a bit of a hassle managing repos here and there :P myself - and as for other developers wanting to use Pliant, I don't think anyone will mind having to download the whole repo including "everything", as long as there is a "core solution" and a separate one for "grammar samples".

As for the tooling, you are absolutely right - using the same version as in the NuGet package would be a good start :) - though I was thinking more along the lines that whenever I'm finished with anything useful there, I'll just give you a ping and you can pull whichever parts you would like back into your repo and distribute them any way you want, of course.

Otherwise, if you're really thinking about making things "easy to consume" and implementation agnostic for users - perhaps one could extract most of the "common classes" into interfaces and put them into a "Pliant.Lib" project, and separate the implementation into a "Pliant.Runtime" project?

Pliant.Lib would be the one users reference, and they could then pick whichever "Pliant.Runtime" distribution they want - as long as the Pliant.Lib infrastructure classes/interfaces don't change, it should be easy for others to consume - and in the case of "tooling projects" and other community contributions, these would preferably work against Pliant.Lib - giving you free rein on optimization changes and other "implementation refactorings" without breaking anything (as long as you don't change Pliant.Lib).

This solution would probably need some sort of "DI", registering the "interfaces" (IParser and any other "services" exposed) - which would then be bound by the bootstrapping of whichever Pliant.Runtime library is chosen - this will kind of start "blowing up" into a more "enterprise architecture"... I think one of the nice things about your code so far is that it's really easy to follow - and "down to business": it does the parsing and does it well :)
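A tiny sketch of the split being proposed - the assembly layout and member names below are illustrative assumptions, not Pliant's current structure - contracts in Pliant.Lib, an implementation in Pliant.Runtime, and a resolver binding them at startup:

```csharp
// Hypothetical sketch of a Pliant.Lib / Pliant.Runtime split; names are illustrative only.
using System;
using System.Collections.Generic;

// --- Pliant.Lib: the stable contracts users and tooling compile against ---
public interface IParserService
{
    bool Parse(string input);
}

// --- Pliant.Runtime: one concrete implementation, swappable without breaking consumers ---
public sealed class EarleyParserService : IParserService
{
    public bool Parse(string input) => /* run the Earley engine here */ input != null;
}

// --- a minimal resolver standing in for whatever DI container ends up being used ---
public sealed class DependencyResolver
{
    private readonly Dictionary<Type, Func<object>> _factories = new Dictionary<Type, Func<object>>();

    public void Register<TService>(Func<TService> factory) where TService : class
        => _factories[typeof(TService)] = factory;

    public TService Resolve<TService>() where TService : class
        => (TService)_factories[typeof(TService)]();
}

// bootstrapping: the chosen runtime binds the contracts at startup
// var resolver = new DependencyResolver();
// resolver.Register<IParserService>(() => new EarleyParserService());
// var parser = resolver.Resolve<IParserService>();
```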

bilsaboob commented 8 years ago

And hey yet again :)

As for the grammar workbench, I was thinking of having a "custom BNF-like" syntax - basically I like the syntax of Antlr, it's really easy and intuitive to write, except that I would like to be able to write it a bit more "modularly" and combine it all together.

I've actually already written a small handrolled parser for the syntax, which is more or less as follows:

```
/* This is a multi line comment for ClassDecl */
ClassDecl: //This is a single line comment
    'public'? 'static'? 'class' identifier '{' ClassBody '}'

ClassBody: (MethodDecl | TypeDecl | PropertyDecl)*

Identifier: [a-zA-Z_-0-9]*
```

I was thinking of something like the image below - this is what I started on, based on my own version of "Rules/Productions/Expressions etc..." - I would probably just start from scratch with the Pliant library classes and such... in order to avoid having to do any "ugly glue code" :P

(screenshot of the grammar workbench prototype UI)

Working on this, I would probably also replace the handrolled parser with a grammar built on the Pliant parser :) - in order to get a better feel for it, and perhaps I can give you feedback on which areas I find could be refactored to better support "IDE tooling" features :)

I think, as you're saying, the error recovery feature is relatively difficult to handle in a "generic way" while simultaneously yielding satisfactory results in every situation - I guess this is very dependent on the grammar itself in many cases? - for one grammar a "skip ahead until the next X NonTerminal" would be satisfactory, while for another grammar something else would be more suitable - however, the strategy you mentioned is probably good enough as a generic strategy, given that there is an option for users to "override" it on a per-rule basis?

I never got to the error recovery part myself - however I was thinking of implementing a "skip ahead strategy", probably something similar to what you describe - I don't see why it shouldn't work the way you say:

For the tooling I would probably just want to "mark everything in the Rule" as an error and skip ahead to the next "Rule" NonTerminal - this would require an option to override the strategy per NonTerminal in the grammar.

In Antlr there is a way to specify a "break point" in the rule definition - if parsing has passed "this point" and an error occurs, skip ahead to the next X token and treat the rule as "OK"; if parsing breaks before this point, then treat the whole rule as a "bust". I was thinking of doing something similar for the grammar workbench, but separating this into another window... in order to keep the "grammar expression" clean and readable. (This would mean the user has to click on a specific NonTerminal in the design window... then a sidebar would show options regarding error recovery and other configurations such as "naming" for the AST building phase.)

The reason I was thinking of generating separate classes for the rules is for the configuration options, but also to "bind" AST classes on a "per rule class" basis - basically achieving something similar to how Antlr builds the AST with the visitor.

I haven't really thought much about integrating with Visual Studio - since I was thinking of having the workflow centered around the grammar workbench :P - though customization and more "advanced" configuration on a per-rule basis would require "overrides" in code.

For this I was thinking of generating classes as "Partials", allowing custom overrides in the partial definitions - and letting the tool "generate new code at will".

And in order to support reloading the "customized grammar" for the "tests": listen for "file changes" on a configured path - and upon any changes in Visual Studio, basically reload the "libraries" (the grammar lib being worked on - of course this requires the user to make a new "build" from within Visual Studio).

I've looked into the Irony project some time ago as well :) - but due to my lack of "language design" skills, I never played nice with the limited "lookahead" of the framework :P - that's what I find nice about the Earley algorithm - it's so much easier to define a grammar without having to focus so much attention on refactoring it into a "non-intuitive structure" in order to support the tooling.

Let's see how my weekend plans look - I'm feeling more and more like just barricading myself indoors with a can of coffee :) - I'll keep you posted on what I start on - most likely I'll just dump whatever I've been implementing and start from scratch with Pliant as the base library to build around :)

patrickhuber commented 8 years ago

A little tangent: I noticed you had multiple symbols to add for lexing ambiguity.

Given the following lexed symbols in set 1: "public", "static", "class", "Testing"

Given the following lexed symbols in set 2: "pub", "lic", "stat", "ic", "class", "Testing"

Is that something that you would like as a feature? As far as I'm aware, the current implementation only accepts a single token before advancing, and the ParseRunner will try to match only the longest accepted. The core algorithm supports multiple tokens being accepted for a single parse state. I could add a method to the parser to push tokens into the engine and make a separate "advance" step that parses with the given tokens.

Marpa does something similar. I could also modify the ParseRunner to push tokens into the parser as they are recognized instead of letting the longest acceptable match win. I may have to capture which Earley set the token was first found in and re-run the parser on just the new states.
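Purely to illustrate the shape of such an API - the interface and method names below are hypothetical, not Pliant's current surface - the push/advance split might look like:

```csharp
// Hypothetical API shape only; Pliant's actual IParseEngine does not necessarily look like this.
public sealed class Token
{
    public int SymbolId;
    public string Capture;
    public int Origin;   // the Earley set index where the token starts
}

public interface IAmbiguityAwareParseEngine
{
    // push any number of alternative tokens recognized at the current position
    void Push(Token token);

    // advance the parse with everything pushed since the last Advance();
    // returns false if none of the pushed tokens were accepted
    bool Advance();
}

// usage sketch: the runner recognizes both "public" and "pub" at the same position
// engine.Push(new Token { SymbolId = publicId, Capture = "public", Origin = 0 });
// engine.Push(new Token { SymbolId = identifierId, Capture = "pub", Origin = 0 });
// engine.Advance();
```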

I need to look over the code samples you provided a bit more and try them out. It takes me a while to reason out pieces of the algorithm, and I usually need to draw examples on a scratch pad. There are no stupid ideas, just testable hypotheses :).

I've thought about layering the architecture to avoid coupling. I tried my best to keep circular dependencies out of the folder structures but failed with the Forest <-> Charts namespaces. I have an issue open to go back and refactor that. Assuming I break these out into different projects, I could use something like ILMerge to recombine them or just leave them as different packages for NuGet to resolve.

I do a fair amount of enterprise design, so I know what you mean about DI. The current state is very organic, though I tried to avoid statics and use constructors that accept their dependencies whenever possible. I think that will make it very easy to create a DI layer. There are obvious places where I've injected a concrete implementation into a constructor, and I would remove those to make a call out to a DI framework's abstract IDependencyResolver.Resolve() type method. I've added an issue to track that.

With the breakout of assemblies, I would have to refactor significantly to inject a new parsing algorithm, so I can add that to the backlog. I've often thought of writing a GLL or GLR parser to see if I can handle large file cases. Earley is great and fast, but the chart holds a lot of state. The sparsely packed parse forest is the same in GLR, but I'd have to remove the hooks to the charts there and in the IParseEngine. Using a parse forest for a very large file doesn't make much sense anyway because of the memory consumption, so I'd probably do something like the Begin/End states of XmlReader.

Your UI looks good. For the error, I assume you are doing a semantic analysis pass on the parse tree or forest to know that an Identifier isn't defined? As you come up with error cases, we can track them to make sure we have enough data for a generic case (if it's possible). I definitely agree on the overriding or custom behavior part. I'll check out the syntax of Antlr. My goal was to keep it as close to EBNF or BNF as possible, but both Marpa and Antlr have additional overrides for things like error handling and event execution. Maybe a parser for Antlr grammars would work too?

For the reloading, are you just reparsing the file? I found an article on incrementally reparsing with Earley. I haven't had a chance to grok it yet, but from what I read I think you get the location of the edit, jump to the corresponding Earley set, and continue the parse from there. It could be either a clearing of the parse sets or some kind of fork and merge. A fork and merge would be more performant if you were, say, changing a variable name or something that parses but isn't semantically correct.
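A very rough sketch of the "jump to the edit and continue" idea - the types below are hypothetical and this is not the cited article's algorithm verbatim - would be to keep the chart of Earley sets, find the set at or before the edit location, drop everything after it, and resume scanning from there:

```csharp
// Very rough sketch of incremental reparsing; all types here are hypothetical.
using System.Collections.Generic;

public sealed class EarleySet
{
    public int Location;                          // input position this set corresponds to
    public List<object> States = new List<object>();
}

public static class IncrementalReparseSketch
{
    // Truncate the chart back to the last set at or before the edit and return the
    // position from which scanning must resume. Assumes a non-empty chart. A
    // fork-and-merge variant would instead keep the old tail around and splice it
    // back in if the edited region parses to the same symbols.
    public static int PrepareForEdit(List<EarleySet> chart, int editLocation)
    {
        if (chart.Count == 0)
            return 0;

        var resumeIndex = 0;
        for (var i = 0; i < chart.Count; i++)
            if (chart[i].Location <= editLocation)
                resumeIndex = i;

        chart.RemoveRange(resumeIndex + 1, chart.Count - resumeIndex - 1);
        return chart[resumeIndex].Location;
    }
}
```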

Don't take too much of your weekend away :) I say that but I know I'll be thinking about this the whole time :)

bilsaboob commented 8 years ago

Hey,

Well, let's do it like this - I'll start doing some coding over the weekend, and hopefully get a better feel for what potential changes would be useful from a tooling perspective.

While doing my implementation I didn't always have a clear perspective on what I needed or not - more often than not I've developed things just because they were more fun/interesting or just boosted my "ego" doing it a certain way :P - so perhaps let's get back to this once I've come along a bit with the workbench.

Ok, well, if you don't mind restructuring and adding a bit more of a "heavy framework oriented" feel to the library (including the DI and such) - then perhaps I can pitch in on this area while implementing the workbench? - I've updated your other issue with some suggestions on project structure and such :)

For the sample I showed, I'm just reparsing the whole grammar input text on every change (a bit of a naive implementation :P) - as you say, I've also been thinking about the incremental parsing - it would for sure be an awesome feature from an IDE perspective.

I've had some thoughts on linking the parsing to the AST and then to the "language context evaluation" as well - similarly to how we parse a language, make some similar "grammar input" for specifying the "language rules" - but instead of parsing "lexed symbols", one would "evaluate" the AST according to a "language context grammar" - and having incremental reparsing would probably allow this mechanism to hook into reanalyzing only the "replaced parts" of the "ParseForest", and as such indirectly the generated AST.

Bah, what else is more important than coding on the weekend ;) - anyway, I'll wait for some input on how you would like me to contribute while working on the grammar tool... and if you have any preferences on how I should structure the repo / the projects within... just give me a shout.

jeffreykegler commented 7 years ago

I've followed this, but when my comments are things I think I'll want to refer back to, or that others might care about, I try to make them in places where they are less likely to get lost. Inspired by this exchange, I'm drafting a new Marpa FAQ: http://fpaste.scsys.co.uk/532983


patrickhuber commented 7 years ago

@jeffreykegler Good read, this pretty much follows the progression I went through. I think my order was (with parse forest implementation)

From what I've read in your IRC log, Marpa still does a significant amount of grammar precomputation. Some of that precomputation is to find nullable symbols and some is to precompute predictions. It may be worthwhile to mention those, the order they are done in, and their benefits to complexity as well, if they are still part of the algorithm.

I'm typing this on my phone, so I have to keep it brief. I'll jump on IRC and ask some questions when I get to my workstation.