nikic / PHP-Parser

A PHP parser written in PHP
BSD 3-Clause "New" or "Revised" License

Parse any project with some file. #340

Closed Rivendall closed 7 years ago

Rivendall commented 7 years ago

Hello. I think this is a good and useful library. I have a problem. When I parse a single file, the output is correct. But when I parse a project made of several files that use commands such as require and include, PHP-Parser creates an AST for each file separately, and the CFG is then also built for each file separately. Can PHP-Parser parse a whole project and recognize include or require?

Thank you.

nikic commented 7 years ago

PHP-Parser provides the AST per file -- if you want to resolve includes, you should do so manually, for example by creating a visitor that replaces any include node by the AST of that file. Of course, includes are not usually performed on simple strings, so you will have to resolve at least basic constant expressions based on __DIR__ and will not be able to resolve some dynamic includes at all. You also need to be careful about directly or indirectly recursive includes.
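The warning about directly or indirectly recursive includes can be illustrated without any parser at all. The following is a minimal sketch, assuming the include targets of each file have already been extracted (for example by a visitor) into a plain array; the function name `expandIncludes` and the `$includesOf` map are hypothetical and not part of PHP-Parser.

```php
<?php
/**
 * Walk includes depth-first and refuse to re-enter a file that is
 * already being expanded (direct or indirect recursion).
 *
 * $includesOf maps a file to the include targets found in it; in a real
 * tool this would come from walking that file's AST.
 *
 * @param array<string, string[]> $includesOf
 * @param string[] $stack files currently being expanded
 * @return string[] files in the order they are visited
 */
function expandIncludes(string $file, array $includesOf, array $stack = []): array
{
    if (in_array($file, $stack, true)) {
        $cycle = implode(' -> ', array_merge($stack, [$file]));
        throw new RuntimeException("Recursive include: $cycle");
    }
    $stack[] = $file;

    $visited = [$file];
    foreach ($includesOf[$file] ?? [] as $target) {
        $visited = array_merge($visited, expandIncludes($target, $includesOf, $stack));
    }
    return $visited;
}
```

Only the files on the current expansion path are tracked, so a file that is included twice from different places is still expanded twice; only a true cycle raises the exception.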

Rivendall commented 7 years ago

Thank you for your useful answer. I have an idea for solving this problem, but I think it only works on small projects, and I have only tested it on a small project. My idea is to add a pre-parse step that processes the code before it is sent to PHP-Parser. In this step, we manually replace every include in the code, and then send the resulting files to PHP-Parser to create the AST. This is essentially your idea, except that you replace inside the AST and I replace before the code reaches the parser. However, there are problems with recursive includes. What do you think about that? Is it possible?

Thank you.

muglug commented 7 years ago

@Rivendall include statements can use any expression that PHP can evaluate for the file, so your solution would need (for all but the simplest case) to have an understanding of magic constants like __FILE__, commonly-used functions like dirname etc. This is outside the scope of PhpParser, but a number of static analysis tools are able to infer those paths with a high hit rate.
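To make "the simplest case" concrete, here is a minimal sketch of a path resolver. It deliberately uses PHP's built-in tokenizer (`token_get_all`) rather than PHP-Parser so that it runs without dependencies, and it only understands string literals without escapes, `__DIR__`, `__FILE__`, `dirname(...)` and `.` concatenation. The function names `resolveIncludePath`, `parseConcat` and `parseOperand` are hypothetical; everything outside that small grammar yields `null`.

```php
<?php
// Hypothetical helper: resolve a restricted include-path expression,
// evaluated as if it appeared in $currentFile. Returns null when the
// expression uses anything this sketch does not understand.
function resolveIncludePath(string $exprCode, string $currentFile): ?string
{
    $tokens = [];
    foreach (token_get_all('<?php ' . $exprCode . ';') as $token) {
        $isSkippable = is_array($token)
            && in_array($token[0], [T_OPEN_TAG, T_WHITESPACE], true);
        if (!$isSkippable) {
            $tokens[] = $token;
        }
    }
    $pos = 0;
    $value = parseConcat($tokens, $pos, $currentFile);
    // The whole expression must have been consumed, up to the final ";".
    $atEnd = ($tokens[$pos] ?? null) === ';' && $pos === count($tokens) - 1;
    return $atEnd ? $value : null;
}

// operand ( "." operand )*
function parseConcat(array $tokens, int &$pos, string $file): ?string
{
    $value = parseOperand($tokens, $pos, $file);
    while ($value !== null && ($tokens[$pos] ?? null) === '.') {
        $pos++;
        $right = parseOperand($tokens, $pos, $file);
        $value = $right === null ? null : $value . $right;
    }
    return $value;
}

// string literal | __DIR__ | __FILE__ | dirname( concat )
function parseOperand(array $tokens, int &$pos, string $file): ?string
{
    $token = $tokens[$pos] ?? null;
    if (!is_array($token)) {
        return null;
    }
    $pos++;
    switch ($token[0]) {
        case T_CONSTANT_ENCAPSED_STRING:
            // Literals with escape sequences are rejected to keep this simple.
            return strpos($token[1], '\\') === false ? substr($token[1], 1, -1) : null;
        case T_DIR:
            return dirname($file);
        case T_FILE:
            return $file;
        case T_STRING:
            if (strtolower($token[1]) !== 'dirname' || ($tokens[$pos] ?? null) !== '(') {
                return null;
            }
            $pos++;
            $inner = parseConcat($tokens, $pos, $file);
            if ($inner === null || ($tokens[$pos] ?? null) !== ')') {
                return null;
            }
            $pos++;
            return dirname($inner);
    }
    return null;
}
```

Called as `resolveIncludePath("__DIR__ . '/inc/a.php'", '/app/src/index.php')`, the sketch treats the expression as if it were written in `/app/src/index.php`. A dynamic include such as `$dir . "/x.php"` returns `null`, which is the case a real analyzer would have to report as unresolved.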

Rivendall commented 7 years ago

@muglug, thank you for the answer; you are right. I want to use PHP-Parser in a simple static code analyzer for PHP. After a lot of research, I chose PHP-Parser to create the AST and parse the code for analysis. But I don't know whether there is any library or tool that recognizes include, require, etc. in code. Is it possible? Is it even needed for a static analyzer? If you have any help or ideas, please share them with me. I wrote some algorithms for replacing include statements in code, but ran into problems while developing them. I want to be sure that there isn't already a way to do this.

ghost commented 7 years ago

@nikic Which node (node class/node type) can be used as a replacement for the require? If the NodeVisitor returns an array, the NodeTraverser shows the error

'leaveNode() may only return an array if the parent structure is an array'

I tried to find a valid type and replaced the require expression with an InlineHTML node, but that does not handle all cases. The included files can contain PHP code, HTML code, a combination of PHP and HTML, or be empty. In other words, I am asking how to replace a node with an array of nodes.

nikic commented 7 years ago

@markbook2 Just to make sure, are you using PHP-Parser 3.0 or the master branch?

If you're on 3.0 it should be possible to replace a plain require "foo.php"; simply by returning an array. However, you may run into an issue for code like $foo = require "foo.php"; or if (require "foo.php"). There is no universal way you can replace this. After all, you can't simply dump the code of the file into the assignment or if condition. You would have to perform a more sophisticated merge.
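The difference between the statement form and the expression form can be seen in plain PHP. This is a small illustrative sketch; the temporary file and its contents are made up for the example.

```php
<?php
// Write a throwaway file that returns a value, as config files often do.
$path = tempnam(sys_get_temp_dir(), 'inc');
file_put_contents($path, "<?php return ['debug' => true];");

// Statement form: the result is discarded, so only the file's side
// effects matter.
require $path;

// Expression form: the value of the `return` inside the file is used here.
// Pasting the file's statements into the assignment would not be valid PHP.
$config = require $path;

unlink($path);
```

In the second form the included file behaves more like a function call than like pasted text, which is why a statement-level splice has no direct equivalent there.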

Can you provide some more context on what you're trying to do? Usually explicitly merging the AST is not necessary, and in case you want to handle not only includes but also function-calls non-opaquely it is not really feasible. There are usually better ways to make analysis passes inter-procedural and context-sensitive, but that depends on what specifically you want to do.

ghost commented 7 years ago

@nikic

Just to make sure, are you using PHP-Parser 3.0 or the master branch?

We use HEAD of the master branch.

Can you provide some more context on what you're trying to do?

We are working on a template engine that is a transpiler (source-to-source compiler): it takes a template language (let's call it L1) and translates it into PHP. L1 has the same syntax as PHP, but some language constructs have different semantics (e.g. require, see below).

In L1, the require language construct can't be used as an expression and works as follows:

1) Evaluate the expression e in require e using PHP rules and get a string with the file path as the result.

2) Read the file at the path from 1) using file_get_contents() and process all require directives in it recursively.

We try to achieve this through modifying the AST with the following code:

// Assumed imports (not shown in the original post); the aliases match
// the short class names used in the code below.
use PhpParser\Lexer;
use PhpParser\Node;
use PhpParser\Node\Expr;
use PhpParser\Node\Expr\Include_ as IncludeExpr;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\Parser\Php7 as Parser;
use PhpParser\PrettyPrinter\Standard as PrettyPrinter;

// Compiler
class Compiler {
    public function compile($phpTemplateCode, bool $print = true) {
        $parser = new Parser(new Lexer());
        $nodes = $parser->parse($phpTemplateCode);
        $traverser = new NodeTraverser();
        $traverser->addVisitor(
            new Processor($this)
        );
        $nodes = $traverser->traverse($nodes);
        if (!$print) {
            return $nodes;
        }
        $prettyPrinter = new PrettyPrinter();
        return $prettyPrinter->prettyPrintFile($nodes);
    }
}

// Processor used in Compiler:
class Processor extends NodeVisitorAbstract {
    private $compiler;

    public function __construct($compiler) {
        $this->compiler = $compiler;
    }

    public function leaveNode(Node $node) {
        if ($node instanceof IncludeExpr) {
            // For simplicity assume that the $node->type == TYPE_REQUIRE.

            $nodes = $this->evalRequire($node->expr);

            // What to do with nodes?
        }
    }

    protected function evalRequire(Expr $expr) {
        $filePath = $this->evalExpr($expr);
        $code = file_get_contents($filePath);
        return $this->compiler->compile($code, false);
    }

    protected function evalExpr(Expr $expr) {
        $printer = new PrettyPrinter();
        return eval('return ' . $printer->prettyPrintExpr($expr) . ';');
    }
}

Here is an example of L1:

// File require-test.phtml
<h1><?php require __DIR__ . '/included-1.phtml'; ?>!</h1>

// File included-1.phtml
Hello <?php require __DIR__ . '/sub-dir/included-2.phtml'; ?> works

// File sub-dir/included-2.phtml
<?php echo 'World' ?>

and an example of the usage:

echo (new Compiler())->compile(file_get_contents("require-test.phtml"));

Expected result is:

<h1>Hello World works!</h1>

Since $nodes in

$nodes = $this->evalRequire($node->expr);

is an array, we can't return it as the result, because the error mentioned above

leaveNode() may only return an array if the parent structure is an array

will be displayed.

nikic commented 7 years ago

If you're on the master branch, the problem you're likely running into is that Expr\Include_ is wrapped in a Stmt\Expression. You need to replace not only the include expression itself, but the entire statement. The code would be something like this:

    public function leaveNode(Node $node) {
        if ($node instanceof Stmt\Expression) {
            $expr = $node->expr;
            if ($expr instanceof Expr\Include_) {
                // For simplicity assume that the $expr->type == TYPE_REQUIRE.
                $nodes = $this->evalRequire($expr->expr);
                return $nodes;
            }
        }
    }

ghost commented 7 years ago

@nikic it works, thanks!

nikic commented 7 years ago

Okay, in that case I'm closing this issue.