picoe / Eto.Parse

Recursive descent LL(k) parser for .NET with Fluent API, BNF, EBNF and Gold Grammars
MIT License
148 stars 30 forks source link

Support run time start rule. #7

Closed thargy closed 10 years ago

thargy commented 10 years ago

What's the chance of allowing the start rule to be selected at run time? For example when using an XML grammar, you may want to start at 'document' or 'node', or in a language you might want to pick 'type' or 'expression', etc. Recompiling the grammar with different start rules seems highly inefficient?

This would also solve a problem with auto-code generation, at the moment BNF and EBNF do not specify a start rule as part of the syntax, being able to pick the start rule after the fact would allow these grammars to be converted to fluent code with the start rule being specified at runtime.

cwensley commented 10 years ago

The start rule can be set at run time by the Grammar.Inner property, though we don't keep a list of rules parsed for BNF/EBNF. This can possibly be accomplished by creating a new Grammar subclass that keeps all rules in a dictionary (instead of keeping track in the BNF/EBNF grammar parsers).

Ideally though, the BNF/EBNF parsers should be able to set these through the grammar itself, as you suggested in issue #6.

I do not think being able to change the start rule (after already selecting one initially) is a good idea though, mainly because it will have to detect left recursion and perform other startup optimizations each time it is changed (which are sometimes destructive to the original structure).

thargy commented 10 years ago

I suspected this was the problem. We've completed quite a bit of testing and have decided to use BSNGrammar, I made a start on the build project and you can find the work done so far in my fork. Unfortunately, we're up against some tight deadlines and now that we've decided to revert to BSNGrammar I don't have time to finish the work for you. If I get time in future I'll return to the task, otherwise you're welcome to use what I've done as a starting point if you're interested in the approach.

Creating the install.ps1/uninstall.ps1 combo is relatively easy. I believe the PostSharp nuget has a good example, though there's a lot of examples that support adding a target (which I've mostly done for you).

Again, sorry that I have to move onto something else for now.

In case you're wondering, the main reason is that your parser starts to get slower in comparison as filesize gets bigger (we parse 10MB+ files, with 500k lines). Also, when things go wrong the error handling and debug experience is less refined than we currently get with the Gold build engine. Also, converting LALR -> LL(*) gives poor performant grammars, BSNGrammar runs the same XML grammar approx. 160x quicker. Obviously, we could create an XML grammar for Eto.Parse but again, comes down to ease of use of the toolsets.

Finally, the performance differences you quote aren't entirely comparable when it comes to building complex structures for evaluation from the resultant outputs of engines. We find that BSN allows us to get there quicker.

All said, really good project and very interesting, I'll be keeping an eye on it as it develops. Thanks for your time.

cwensley commented 10 years ago

Thanks for the work on the build project, I'll take a look.

I am just about finished an xml parser sample, which is about 6.3x faster than the bsn gold parser with the xml grammar from here, though it is still 4.7x slower than using System.Xml. If it's performance you're after for xml, System.Xml is probably your best choice.

Using BNF/EBNF/Gold grammars will never garner the best performance, as they are general purpose in nature. The only way to get better performance is to write code and use specific purpose parsers, which is what eto.parse is (mainly) about.