patrickhuber / Pliant

MIT License
26 stars 4 forks source link

Pliant

Implementation of a modified Earley parser in C# inspired by the Marpa Parser project.

Build Status

build

Gitter Chat

Gitter

Description

Pliant is a table driven parser that implements the Earley algorithm. Two optimizations are added to handle issues with the original Earley implementation:

  1. Optimization from Joop Leo to efficiently handle right recursions.
  2. Bug fix from Aycock and Horspool to handle nullable predictions

Using the Code

Getting Started

Add a reference to the Pliant libarary using NuGet

PM> Install-Package Pliant

Creating Grammars

Using the Grammar Expression classes

using Pliant.Builders;
using Pliant.Builders.Expressions;
using Pliant.Grammars;
using Pliant.Automata;

public static int Main(string[] args)
{
    var digits = CreateDigitLexerRule();
    var whitespace = CreateWhitespaceLexerRule();

    ProductionExpression
        Calculator = "Calculator",
        Factor = "Factor",
        Term = "Term",
        Expression = "Expression",
        Number = "Number";

    Calculator.Rule 
        = Expression ;

    Expression.Rule
        = Expression + '+' + Term 
        | Term ;

    Term.Rule 
        = Term + '*' + Factor
        | Factor ;

    Factor.Rule 
        = Number ;

    Number.Rule
        = digits;

    var grammar = new GrammarExpression(
        Calculator, 
        new []{ Calculator, Factor, Term, Expression, Number }, 
        new []{ new LexerRuleModel(whitespace) })
    .ToGrammar();   

    // TODO: Use the grammar in a parse.
}

private static BaseLexerRule CreateDigitLexerRule()
{
    var startState = new DfaState();
    var finalState = new DfaState(true);
    var digitTransition = new DfaTransition(new DigitTerminal(), finalState);
    startState.AddTransition(digitTransition);
    finalState.AddTransition(digitTransition);
    return new DfaLexerRule(startState, "[\\d]+");
}

private static ILexerRule CreateWhitespaceLexerRule()
{
    var whitespaceTerminal = new WhitespaceTerminal();
    var startState = new DfaState();
    var finalState = new DfaState(true);
    var whitespaceTransition = new DfaTransition(whitespaceTerminal, finalState);
    startState.AddTransition(whitespaceTransition);
    finalState.AddTransition(whitespaceTransition);
    return new DfaLexerRule(startState, new TokenType("[\\s]+"));   
}       

Using Pliant Definition Language (PDL)

calculator.pdl

Calculator 
    = Expression;

Expression 
    = Expression '+' Term
    | Term;

Term 
    = Term '*' Factor
    | Factor;

Factor 
    = Number ;

Number 
    = Digits;

Digits ~ /[0-9]+/ ;
Whitespace ~ /[\s]+/ ;

:start = Calculator;
:ignore = Whitespace;

Program.cs


using Pliant.Languages.Pdl;
using Pliant.Runtime;
using System;
using System.IO;

namespace PdlExample
{
    public static int Main (string[] args)
    {
        // load the grammar definition file (make sure to copy it to output directory if debugging)
        var path = Path.Combine(Directory.GetCurrentDirectory(), "calculator.pdl");
        var content = File.ReadAllText(path);

        // parse the grammar definition file
        var pdlParser = new PdlParser();
        var definition = pdlParser.Parse(content);

        // create the grammar, parser and scanner for our calculator language
        var grammar = new PdlGrammarGenerator().Generate(definition);
        var parser = new ParseEngine(grammar);

        var calculatorInput = "5+30 * 2 + 1";
        var scanner = new ParseRunner(parser, calculatorInput);

        // run the scanner
        if(!scanner.RunToEnd())
        {
            Console.WriteLine($"error parsing at line {scanner.Line + 1} column {scanner.Column}");
        }

        // parse the calculator input
        var rootNode = parser.GetParseForestRootNode();

        // TODO: use the parse node in the calculator interpreter
    }
}

Recognizing and Parse Trees

Using ParseEngine, ParseRunner and Grammars to Recognize Input

Using the calculator grammar from above, we can parse input by constructing a parse engine and parse runner instance.

var input = "1 + 1 * 3 + 2";

// use the calculator grammar from above
var parseEngine = new ParseEngine(grammar);

// use the parse runner to query the parse engine for state
// and use that state to select lexer rules.
var parseRunner = new ParseRunner(parseEngine, input);

// when a parse is recognized, the parse engine is allowed to move
// forward and continues to accept symbols. 
var recognized = false;
var errorPosition = 0;
while(!parseRunner.EndOfStream())
{
    recognized = parseRunner.Read();
    if(!recognized)
    {   
        errorPosition = parseRunner.Position;
        break;
    }
}

// For a parse to be accepted, all parse rules are completed and a trace
// has been made back to the parse root.
// A parse must be recognized in order for acceptance to have meaning.
var accepted = false;
if(recognized)
{
    accepted = parseRunner.ParseEngine.IsAccepted();
    if(!accepted)
        errorPosition = parseRunner.Position;
}
Console.WriteLine($"Recognized: {recognized}, Accepted: {accepted}");
if(!recognized || !accepted)
    Console.Error.WriteLine($"Error at position {errorPosition}");

Using ParseEngine, ParseRunner, Forest API and Grammars to build a parse tree.

The process for creating a parse tree is the same as recognizing input. In fact, when running the ParseEngine, a Sparsley Packed Parse Forest (SPPF) is created in the background. The parse forest is presented in a specialized format to promote density and allow for computational complexity similar to that of running the recognizer alone.

The easiest way to use the parse forest is use a internal node tree visitor on the parse forest root with a SelectFirstChildDisambiguationAlgorithm instance controling disambiguation of thet forest.

If the parse is ambiguous, you may want to supply a custom IForestDisambiguationAlgorithm.

// get the parse forest root from the parse engine
var parseForestRoot = parseEngine.GetParseForestRootNode();

// create a internal tree node and supply the disambiguation algorithm for tree traversal.
var parseTree = new InternalTreeNode(
    parseForestRoot,
    new SelectFirstChildDisambiguationAlgorithm());

References