phpro / grumphp

A PHP code-quality tool
MIT License
4.14k stars 430 forks source link

Huge delay running task with nikic/PHP-Parser instance in parallel #815

Closed InvisibleSmiley closed 1 year ago

InvisibleSmiley commented 4 years ago
Q A
Version 0.22.0
Bug? yes
OS Windows 10

Note: I'm not sure whether this is a bug with nikic/PHP-Parser or GrumPHP but it's hard to tell for me since I don't know how to debug parallel mode.

A task that keeps an instance of nikic/PHP-Parser (without actively using it!) takes much longer to complete in parallel mode. Git commit

Steps to reproduce:

# create/fill grumphp.yml
# create/fill composer.json
mkdir src
# create/fill src/ParserStub.php
composer install
git init
git commit

grumphp.yml:

grumphp:
    tasks:
      parserstub:
    parallel:
      enabled: true

services:
    task.ParserStub:
        class: Foo\GrumPHP\Task\ParserStub
        arguments:
            - '@process_builder'
            - '@formatter.raw_process'
        tags:
            - {name: grumphp.task, task: parserstub}

composer.json:

{
    "name": "foo/parserstub",
    "require-dev": {
        "nikic/php-parser": "^4.9",
        "phpro/grumphp": "^0.22.0"
    },
    "autoload-dev": {
        "psr-4": {
            "Foo\\GrumPHP\\Task\\": "src/"
        }
    }
}

src/ParserStub.php:

<?php
declare(strict_types=1);

namespace Foo\GrumPHP\Task;

use GrumPHP\Formatter\ProcessFormatterInterface;
use GrumPHP\Process\ProcessBuilder;
use GrumPHP\Runner\TaskResult;
use GrumPHP\Runner\TaskResultInterface;
use GrumPHP\Task\AbstractExternalTask;
use GrumPHP\Task\Context\ContextInterface;
use GrumPHP\Task\Context\GitPreCommitContext;
use PhpParser\Lexer;
use PhpParser\Parser\Php7;
use Symfony\Component\OptionsResolver\OptionsResolver;

final class ParserStub extends AbstractExternalTask
{
    private $parser;

    public function __construct(
        ProcessBuilder $processBuilder,
        ProcessFormatterInterface $formatter
    ) {
        parent::__construct($processBuilder, $formatter);
        $this->parser = new Php7(new Lexer());
    }

    public static function getConfigurableOptions(): OptionsResolver
    {
        return new OptionsResolver();
    }

    public function canRunInContext(ContextInterface $context): bool
    {
        return $context instanceof GitPreCommitContext;
    }

    public function run(ContextInterface $context): TaskResultInterface
    {
        return TaskResult::createPassed($this, $context);
    }
}
veewee commented 4 years ago

Not sure what to do with this issue @InvisibleSmiley. What happens if you remove the parser initialization? Does it run as fast as expected? If that's the case, it is probably related to nikic/php-parser.

Do know that the parallel implementation does not mean it is faster on 1 task. The parallel system triggers new php processes that run code inside GrumPHP. Booting these processes and initializing the code takes some time too. However, if you have e.g. 10 tasks, the parallel system will finish earlier than running all tools in serial mode because it does not need to wait on the previous tool.

InvisibleSmiley commented 4 years ago

What happens if you remove the parser initialization? Does it run as fast as expected?

Slower than non-parallel mode (~2sec compared to ~1sec) but I guess that's OK.

If that's the case, it is probably related to nikic/php-parser.

As I said, it's hard for me to tell since I cannot debug the task in parallel mode, but since changing GrumPHP settings makes a difference I wanted you guys to check first before I open an issue with nikic/php-parser so the guys there don't just direct me here. I'd really appreciate it if you could somehow definitely rule out a GrumPHP issue here.

InvisibleSmiley commented 4 years ago

Also note that if you replace $this->parser by $parser, i.e. only assign to a local variable rather than an instance variable, the delay does not occur. So the problem seems to occur somewhere after the creation of the object, which in my mind makes it more likely to be a problem with either GrumPHP itself or amphp/parallel.

veewee commented 4 years ago

Ah yes, that might be the explanation! The task configuration is being built in the main process. Since you add the parser initialization to the constructor of the task, amphp serializes the full task (including the initialized phpparser) and passes it to the child process. The child process needs to deserialize it again.

I'dd suggest moving the line from the constructor into a local variable inside the run method. That way, it doesnt get serialized in either the main or child process.

InvisibleSmiley commented 4 years ago

Well if serialize() is indeed used then it must fail internally since the parser object contains a list of Closures which cannot be serialized. There is no error output though. I'd expect the call to fail and output an error message, ideally pointing in the right direction. Maybe it currently tries over and over again until it gives up?

I guess the situation should also be documented somehow so other people know to keep the potential performance impact in mind.

Note: The example I provided is reduced to show the issue. The real task contains another object as instance variable, and that object is created using a DI factory which creates and adds the parser object. So it's not at all obvious that the issue is the task serialisation.

veewee commented 4 years ago

FYI: closures are serialized in amphp through https://github.com/opis/closure We could indeed add something about it in our documentation, but it is mostly documented on the amphp pages : https://amphp.org/parallel-functions/

Having a DI factory inside your class isn't bad as long as you make sure that it can lazily load the parser inside the run method. Also the dependencies of the factory will be serialized, so be carefull what you put in there :)

InvisibleSmiley commented 4 years ago

OK so I'll have to adapt our task implementation, understood.

I consider Amp an implementation detail of GrumPHP, though. From a GrumPHP user's POV there's just tasks and the enabled-by-default option to let GrumPHP run tasks in parallel. So it really should be documented that GrumPHP, when in parallel mode, serializes tasks including all instance variables, recursively, which can lead to performance issues with large properties. A nikic/php-parser object instance may serve as an example. Agreed? 😃

veewee commented 4 years ago

Sure thing! We'll need to add it somewhere.