microsoft / tolerant-php-parser

An early-stage PHP parser designed for IDE usage scenarios.
MIT License

Tolerant PHP Parser

This is an early-stage PHP parser designed, from the beginning, for IDE usage scenarios (see Design Goals for more details). There is still a ton of work to be done, so at this point, this repo mostly serves as an experiment and the start of a conversation.

This is the v0.1 branch, which changes data structures to support syntax added after the initial 0.0.x release line.

Get Started

After you've configured your machine, you can use the parser to generate and work with the Abstract Syntax Tree (AST) via a friendly API.

<?php
// Autoload required classes
require __DIR__ . "/vendor/autoload.php";

use Microsoft\PhpParser\{DiagnosticsProvider, Node, Parser, PositionUtilities};

// Instantiate new parser instance
$parser = new Parser();

// Return and print an AST from string contents
$astNode = $parser->parseSourceFile('<?php /* comment */ echo "hi!"');
var_dump($astNode);

// Gets and prints errors from AST Node. The parser handles errors gracefully,
// so it can be used in IDE usage scenarios (where code is often incomplete).
$errors = DiagnosticsProvider::getDiagnostics($astNode);
var_dump($errors);

// Traverse all Node descendants of $astNode
foreach ($astNode->getDescendantNodes() as $descendant) {
    if ($descendant instanceof Node\StringLiteral) {
        // Print the Node text (without whitespace or comments)
        var_dump($descendant->getText());

        // All Nodes link back to their parents, so it's easy to navigate the tree.
        $grandParent = $descendant->getParent()->getParent();
        var_dump($grandParent->getNodeKindName());

        // The AST is fully-representative, and round-trippable to the original source.
        // This enables consumers to build reliable formatting and refactoring tools.
        var_dump($grandParent->getLeadingCommentAndWhitespaceText());
    }

    // In addition to retrieving all children or descendants of a Node,
    // Nodes expose properties specific to the Node type.
    if ($descendant instanceof Node\Expression\EchoExpression) {
        $echoKeywordStartPosition = $descendant->echoKeyword->getStartPosition();
        // To cut down on memory consumption, positions are represented as a single integer
        // index into the document, but their line and character positions are easily retrieved.
        $lineCharacterPosition = PositionUtilities::getLineCharacterPositionFromPosition(
            $echoKeywordStartPosition,
            $descendant->getFileContents()
        );
        echo "line: $lineCharacterPosition->line, character: $lineCharacterPosition->character";
    }
}
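
The idea of storing a position as a single integer and deriving the line and character on demand can be illustrated without the library. The following is a minimal sketch, assuming byte offsets and 0-based lines and characters; `offsetToLineCharacter` is a hypothetical helper written for this illustration, not the implementation behind `PositionUtilities::getLineCharacterPositionFromPosition`.

```php
<?php
// Hypothetical helper (not part of the library): converts a 0-based byte
// offset into a 0-based line and character pair.
function offsetToLineCharacter(int $offset, string $text): array
{
    // Clamp the offset so out-of-range values still map to a valid position.
    $offset = max(0, min($offset, strlen($text)));
    $before = (string) substr($text, 0, $offset);

    // The line number is the count of newlines that precede the offset.
    $line = substr_count($before, "\n");

    // The character is the distance from the last newline, or from the
    // start of the text when the offset is on the first line.
    $lastNewline = strrpos($before, "\n");
    $character = $lastNewline === false ? $offset : $offset - $lastNewline - 1;

    return ['line' => $line, 'character' => $character];
}

$source = "<?php\n/* comment */ echo \"hi!\";\n";
$echoOffset = strpos($source, 'echo');
$position = offsetToLineCharacter($echoOffset, $source);

// Prints "line: 1, character: 14"
echo "line: {$position['line']}, character: {$position['character']}\n";
```

Keeping only the integer on each token and computing the pair when needed trades a little lookup time for lower memory use per token, which is the tradeoff the comment in the example above describes.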

Note: the API is not yet finalized, so please file issues to let us know what functionality you want exposed, and we'll see what we can do! Please also file bugs for any unexpected behavior in the parse tree. We're still in our early stages, and any feedback you have is much appreciated :smiley:.

Design Goals

Current Status and Approach

To ensure a sufficient level of correctness at every step of the way, the parser is being developed using the following incremental approach:

Additional notes

A few PHP grammatical constructs (namely yield-expression and template strings) are not yet supported, and there are other miscellaneous bugs. However, because the parser is error-tolerant, these errors are handled gracefully, and the resulting tree is otherwise complete. To get a more holistic sense of where we are, you can run the "validation" test suite (see Contributing Guidelines for more info on running tests), or simply take a look at the current validation test results.
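
To make "handled gracefully, and the resulting tree is otherwise complete" concrete, here is a toy sketch of error-tolerant parsing for a made-up mini-language of `echo "string";` statements. Everything in it (the function name, the array shapes, the diagnostic messages) is hypothetical and chosen for illustration; it does not reflect how this parser is implemented. With the real library, diagnostics come from `DiagnosticsProvider::getDiagnostics`, as shown in Get Started.

```php
<?php
// Toy illustration of error tolerance. All names here are hypothetical.
function parseEchoStatements(string $code): array
{
    // Tokens: the echo keyword, double-quoted strings, semicolons, and any
    // other run of characters that contains no whitespace, quote, or semicolon.
    preg_match_all('/echo\b|"[^"]*"|;|[^\s";]+/', $code, $matches);
    $tokens = $matches[0];
    $statements = [];
    $diagnostics = [];
    $i = 0;

    while ($i < count($tokens)) {
        if ($tokens[$i] !== 'echo') {
            // Unexpected token: record it and keep going instead of aborting.
            $diagnostics[] = "Unexpected '{$tokens[$i]}'";
            $i++;
            continue;
        }
        $i++; // consume 'echo'

        $statement = ['kind' => 'EchoStatement', 'value' => null, 'semicolon' => false];

        // Expect a string literal; leave a null placeholder if it is absent.
        if ($i < count($tokens) && $tokens[$i][0] === '"') {
            $statement['value'] = $tokens[$i++];
        } else {
            $diagnostics[] = "Expected string literal after 'echo'";
        }

        // Expect a semicolon; note its absence but still keep the statement.
        if ($i < count($tokens) && $tokens[$i] === ';') {
            $statement['semicolon'] = true;
            $i++;
        } else {
            $diagnostics[] = "Expected ';'";
        }

        $statements[] = $statement;
    }

    return ['statements' => $statements, 'diagnostics' => $diagnostics];
}

// The first statement lacks its semicolon, yet both statements are returned.
$result = parseEchoStatements('echo "a" echo "b";');
echo count($result['statements']), " statements, ",
    count($result['diagnostics']), " diagnostic(s)\n";
```

The point of the sketch is that a missing token becomes a recorded diagnostic plus a placeholder in the tree, so later statements still parse and tooling can keep working on incomplete code.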

Even though we haven't yet begun the performance optimization stage, we have seen promising results so far, and have plenty more room for improvement. See How It Works for details on our current approach, and run the Performance Tests on your own machine to see for yourself.

Learn more

:dart: Design Goals - learn about the design goals of the project (features, performance metrics, and more).

:book: Documentation - learn how to reference the parser from your project, and how to perform operations on the AST to answer questions about your code.

:eyes: Syntax Visualizer Tool - get a more tangible feel for the AST. Get creative - see if you can break it!

:chart_with_upwards_trend: Current Status and Approach - how much of the grammar is supported? Performance? Memory? API stability?

:wrench: How it works - learn about the architecture, design decisions, and tradeoffs.

:sparkling_heart: Contribute! - learn how to get involved, check out some pointers to educational commits that'll help you ramp up on the codebase (even if you've never worked on a parser before), and recommended workflows that make it easier to iterate.


This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.