rectorphp / rector

Instant Upgrades and Automated Refactoring of any PHP 5.3+ code
https://getrector.com
MIT License

Rector caching and large code bases #7806

Closed keulinho closed 1 year ago

keulinho commented 1 year ago

Question

We have a codebase with >7.000 files and want to run rector continuously in the CI to enforce a common code style, catch errors etc.

What should an optimal setup for rector look like? We are currently running into a lot of issues, mostly regarding performance, memory usage and caching.

For example it is not possible to run rector on the whole code base at once on a local machine with 32GB RAM, as it will run out of memory during the run.

Our first idea was to run rector on separate subfolders, processing only 1.000-1.500 classes at a time; that way the commands at least succeed without running out of memory. But with that approach the order of the runs seems to change the results. For example, if I run rector first on the src/Framework folder and then on the tests folder, it does not find any issues, but if I clear the cache and run it on the tests folder first, it suddenly finds 2 issues. So caching does not seem to work reliably, yet disabling caching completely is not a good solution either, as a full run on >7.000 files takes a long time on a local machine.

Has anyone experienced something similar? Is there anything that we are missing? How do you set up rector in bigger projects?

Our current rule config looks like this:

    $rectorConfig->sets([
        LevelSetList::UP_TO_PHP_81,
        SymfonyLevelSetList::UP_TO_SYMFONY_62,
        TwigLevelSetList::UP_TO_TWIG_240,
        PHPUnitLevelSetList::UP_TO_PHPUNIT_90,
    ]);

Thanks :blue_heart:

Thanks for creating rector, it has already helped us a ton in upgrading to new library/PHP versions :v:

samsonasik commented 1 year ago

That may be a bug somewhere. You can try a per-set-list approach, e.g.:

    $rectorConfig->sets([
        \Rector\Set\ValueObject\SetList::PHP_52,
    ]);

Then repeat with the next set until you hit the error or the slow part, so we can find the source of the problem and fix it.

You can read how to narrow down rules at https://tomasvotruba.com/blog/2021/02/01/effective-debug-tricks-narrow-scoping/

staabm commented 1 year ago

In PHPStan we suggest reducing the maximum number of parallel processes and/or enabling bleeding edge. This saves a lot of memory and/or runtime.

Since rector is using phpstan under the hood, I guess this could also be worth a try.
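
As a rough illustration of the first suggestion, limiting parallelism in `rector.php` could look like the sketch below. It assumes a Rector release whose `RectorConfig::parallel()` accepts timeout, process-count and job-size arguments (older releases take no arguments); the values 4 and 10 are arbitrary starting points, not recommendations from this thread.

```php
<?php

declare(strict_types=1);

use Rector\Config\RectorConfig;

return static function (RectorConfig $rectorConfig): void {
    // Assumed signature: parallel(int $seconds, int $maxNumberOfProcess, int $jobSize).
    // Fewer workers and smaller jobs trade runtime for a lower peak memory footprint.
    $rectorConfig->parallel(120, 4, 10);

    // Alternative: run everything in a single process.
    // $rectorConfig->disableParallel();
};
```

Whether this helps depends on where the memory goes; if a single process already grows without bound, fewer workers only slow the growth.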

keulinho commented 1 year ago

Thanks for the ideas and the feedback. @staabm, how can I enable bleeding edge for rector? :thinking:

staabm commented 1 year ago

I think you can define the phpstan config which rector uses with

https://github.com/rectorphp/rector-src/blob/6091bdd821a72aa9df52363d944b003ec2fbaf1d/packages/Config/RectorConfig.php#L95

In that file you can enable bleedingEdge the way you are used to in PHPStan.
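
A minimal sketch of that wiring, assuming the linked line is the `phpstanConfig()` method of `RectorConfig`. The file name `phpstan-for-rector.neon` is hypothetical; the file itself would hold ordinary PHPStan configuration, including the bleeding edge include.

```php
<?php

declare(strict_types=1);

use Rector\Config\RectorConfig;

return static function (RectorConfig $rectorConfig): void {
    // Hypothetical file name. Rector passes this file on to the PHPStan
    // instance it uses internally for type analysis.
    $rectorConfig->phpstanConfig(__DIR__ . '/phpstan-for-rector.neon');
};
```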

keulinho commented 1 year ago

I dug a little deeper and sadly the PHPStan tricks did not help here, and limiting the active rule set had no real effect either.

With just one rule active I also ran into the memory issue (I tested multiple different rules to make sure that the issue does not come from the rule itself).

So to me it seems that the memory usage is proportional to the number of analyzed files; the number of active rules or the number of parallel processes does not seem to make a big difference.

This is the minimal rector config, with a project with ~7.500 files to be analyzed, this leads to >20GB memory being used:

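    use Rector\Config\RectorConfig;
    use Rector\Php81\Rector\Array_\FirstClassCallableRector;

    return static function (RectorConfig $rectorConfig): void {
        // Directories to analyze
        $rectorConfig->paths([
            __DIR__ . '/src',
            __DIR__ . '/tests',
        ]);

        // Directories to exclude from the run
        $rectorConfig->skip([
            __DIR__ . '/*/vendor/*',
            __DIR__ . '/*/node_modules/*',
            __DIR__ . '/*/Resources/*',
        ]);

        // Single active rule
        $rectorConfig->rule(FirstClassCallableRector::class);
    };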

I tried to play with the parallel configuration, but even disabling parallel processing did not really improve the situation; the memory just filled up more slowly.

I also tried using InMemoryCache and FilesystemCache, but again that does not make a difference.
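
For reference, switching the cache storage in `rector.php` might look like the sketch below. It assumes the two caches mentioned above correspond to the `MemoryCacheStorage` and `FileCacheStorage` classes in the `Rector\Caching\ValueObject\Storage` namespace; the cache directory is a hypothetical path.

```php
<?php

declare(strict_types=1);

use Rector\Caching\ValueObject\Storage\FileCacheStorage;
use Rector\Caching\ValueObject\Storage\MemoryCacheStorage;
use Rector\Config\RectorConfig;

return static function (RectorConfig $rectorConfig): void {
    // File-based cache: survives between runs, useful for repeated local runs.
    $rectorConfig->cacheClass(FileCacheStorage::class);
    $rectorConfig->cacheDirectory(__DIR__ . '/var/cache/rector'); // hypothetical path

    // In-memory cache: discarded when the process ends.
    // $rectorConfig->cacheClass(MemoryCacheStorage::class);
};
```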

Also, the skip part is important for us: without it rector already needs a lot of memory just to figure out which files to actually run on (probably based on file extension, and the node_modules folders contain a lot of .js files).

With this it seems to me that rector currently is not really usable for bigger projects. At least i can't get it to work in a manner that works out of the box on every local dev machine :thinking:

Happy for real life insights or some more tricks that we can try.

samsonasik commented 1 year ago

Skip currently does not support an absolute path combined with a pattern. If you have a pattern to skip, you can do:

'**/app/*'

Ref https://github.com/rectorphp/rector/issues/7780
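
Applied to the config posted earlier, that suggestion might look like the sketch below. The three patterns are adapted from the `'**/app/*'` form above and are an assumption; how they are matched can differ between Rector versions.

```php
<?php

declare(strict_types=1);

use Rector\Config\RectorConfig;

return static function (RectorConfig $rectorConfig): void {
    // Pattern-only entries, without the __DIR__ prefix.
    $rectorConfig->skip([
        '**/vendor/*',
        '**/node_modules/*',
        '**/Resources/*',
    ]);
};
```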

keulinho commented 1 year ago

Thanks for the info, i also tried that but the change had no effect in terms of performance or memory consumption

keulinho commented 1 year ago

I created a blackfire trace for a run on a subset (it ran on 248 files) without parallel processing (as the parallel processes can't be profiled by rector)

Even there, the high memory consumption, and a memory leak that grows the longer rector runs, seem obvious. At the end of the run rector had consumed 2.6GB of memory for analysing 248 files.

See the trace here: https://blackfire.io/profiles/c7c3b1b3-66bc-4a75-821d-c5176c2f2536/graph

staabm commented 1 year ago

Maybe you could check whether there are certain files which take considerably more time than others?

In phpstan we use something like https://gist.github.com/ruudk/41897eb59ff497b271fc9fa3c7d5fb27 to detect worst case files.

These could be used to create smaller repros and therefore ease developing speed improvements
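
A crude way to look for such files with Rector itself is sketched below. This is not the linked gist: it simply runs one Rector process per PHP file in a directory and lists the ten slowest. The script name is hypothetical, and every process pays the full bootstrap cost, so only the relative differences are meaningful; it is best run on a subfolder.

```php
<?php

declare(strict_types=1);

// Usage (hypothetical script name): php slowest-files.php src/Some/Subfolder

$directory = $argv[1] ?? 'src';
$timings = [];

$iterator = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($directory, FilesystemIterator::SKIP_DOTS)
);

foreach ($iterator as $file) {
    if ($file->getExtension() !== 'php') {
        continue;
    }

    $path = $file->getPathname();
    $start = microtime(true);

    // One Rector process per file. --clear-cache forces real work on every run;
    // the command output is discarded because only the duration matters here.
    exec(
        'vendor/bin/rector process ' . escapeshellarg($path)
        . ' --dry-run --clear-cache --no-progress-bar 2>&1'
    );

    $timings[$path] = microtime(true) - $start;
}

// Slowest files first, keeping the path as array key.
arsort($timings);

foreach (array_slice($timings, 0, 10, true) as $path => $seconds) {
    printf("%6.2fs  %s\n", $seconds, $path);
}
```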


Is the codebase in question open source and if so, could you provide repro steps?


Could you post a new Blackfire profile with a recent release?

TomasVotruba commented 1 year ago

32 GB for 7000 files seems too much. There is probably a glitch with a file + rule combo, possibly an infinite loop. A reproducible use case would help us find it and solve it :)

On the other hand, we've just released Rector 0.16 with many performance improvements by @staabm :+1: https://github.com/rectorphp/rector-src/releases/tag/0.16.0

Give it a try and let us know :wink: