patrickkusebauch opened this issue 1 year ago
Slack thread link for reference: https://symfony-devs.slack.com/archives/C036EU1GZS9/p1681985816572369
I had been working with NikicPhpParser on a project similar to deptrac before I discovered deptrac :) I solved a similar problem this way: get the number of available CPUs via `nproc`, then `array_chunk` the found files into that estimated number of chunks and parse each chunk in a forked child process. The results were stunning. But in deptrac you have this problem only for the first run; after that, `.deptrac.cache` is used.
It looks like you could do something similar for `\Qossmic\Deptrac\Core\Ast\AstLoader::createAstMap`. And here, in `\Qossmic\Deptrac\Core\Ast\Parser\NikicPhpParser\NikicPhpParser::parseFile` and `\Qossmic\Deptrac\Core\Ast\Parser\Cache\AstFileReferenceFileCache::set`, the data should be written into a socket or directly into `.deptrac.cache` and then reloaded.

If there are CPUs available, the first run would be faster on Linux systems. If a child process fails, the dependency tree would be broken :( (this never happened to me). It might also introduce unexpected behavior for custom event observers that collect data into a singleton object for later use: the singleton will be empty in the parent process, so users must be aware that they need IPC for such cases.
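To make the idea concrete, here is a minimal sketch of the chunk-and-fork approach, assuming Linux with the pcntl extension. `parseFile()` is a hypothetical stand-in for the real per-file work (in deptrac's case, `NikicPhpParser::parseFile`), and the children hand their results back through temporary files, which is the simplest stand-in for the socket/cache handoff described above:

```php
<?php
// Sketch only: fan out file parsing across CPUs with pcntl_fork (Linux).
// parseFile() is a hypothetical stand-in for the real per-file parser.
function parseFile(string $file): array
{
    return ['size' => filesize($file)]; // placeholder for real AST work
}

$files  = glob('src/*.php') ?: [];
$cpus   = max(1, (int) trim((string) shell_exec('nproc')));
$chunks = array_chunk($files, max(1, (int) ceil(count($files) / $cpus)));

$pids = [];
$tmp  = [];
foreach ($chunks as $i => $chunk) {
    $tmp[$i] = sys_get_temp_dir() . "/ast-chunk-$i.cache";
    $pid = pcntl_fork();
    if ($pid === -1) {
        exit("fork failed\n");
    }
    if ($pid === 0) {
        // Child: parse only this chunk, persist the result, and exit.
        $result = [];
        foreach ($chunk as $file) {
            $result[$file] = parseFile($file);
        }
        file_put_contents($tmp[$i], serialize($result));
        exit(0);
    }
    $pids[] = $pid; // Parent: remember the child and keep forking.
}

// Parent: wait for all children, then reload and merge their results.
foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);
}
$astMap = [];
foreach ($tmp as $file) {
    $astMap += unserialize((string) file_get_contents($file));
    unlink($file);
}
printf("parsed %d files across %d processes\n", count($astMap), count($chunks));
```

The parent only merges already-serialized partial results, which is roughly what reloading per-chunk data from `.deptrac.cache` would look like as well.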
To me, deptrac works just fine. A pipeline of a few minutes for static tests is acceptable. But maybe someone else would like it to work faster, especially for git hook pre-push/pre-commit integrations, where time is valuable.
In CI you customarily do not have the cache file and do the whole analysis. Hence the times in the screenshot. I hope to pick the brains of developers of other SA tools like PHPStan and Rector, as they are both Czech (same as me) and easily reachable in some of the Czech PHP Slack communities.
Regardless, thanks for the tips; they might come in handy for anyone looking to implement this.
Also, if you have your project public somewhere, I would love to take a look. There might be some useful gems there to integrate. :slightly_smiling_face:
I also dug a little bit into Deptrac analysis. As @ilnytskyi said, the only thing I found to improve the performance was to parallelize the analysis. I did not implement anything, but the only solution I found was to use `pcntl_fork`.

I used this strategy 10 years ago in some projects and the performance gain was huge: from running out of resources after hours of execution to finishing in just a few minutes. In my case, I used shared memory with the `shmop_*` functions, something like this: https://www.php.net/manual/en/function.pcntl-fork.php#115855

The thing is that `pcntl_fork` is only available on Linux, not on Mac or Windows. And I think that most Docker images do not include the extension unless it is explicitly installed. Even so, I guess it is worth trying.
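For reference, a minimal sketch of that shared-memory variant, loosely following the linked php.net example; the payload and segment size are made up for illustration, and both the pcntl and shmop extensions are assumed:

```php
<?php
// Sketch only: a child process publishes its result to the parent
// through a System V shared-memory segment (pcntl + shmop required).

$key  = ftok(__FILE__, 'a');               // derive an IPC key from this file
$size = 1024;                              // arbitrary demo segment size
$shm  = shmop_open($key, 'c', 0644, $size);

$pid = pcntl_fork();
if ($pid === -1) {
    exit("fork failed\n");
}
if ($pid === 0) {
    // Child: do some work, serialize the result into shared memory, exit.
    $payload = serialize(['files_parsed' => 42]); // made-up payload
    shmop_write($shm, str_pad($payload, $size, "\0"), 0);
    exit(0);
}

// Parent: wait for the child, read the result back, release the segment.
pcntl_waitpid($pid, $status);
$raw = rtrim(shmop_read($shm, 0, $size), "\0");
var_dump(unserialize($raw));
shmop_delete($shm);
```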
On the other hand, I solved my performance problem by not including the `vendor` folder, like they explain here: https://github.com/qossmic/deptrac/issues/506#issuecomment-790677433 I think it is a common use case, so maybe it would be useful to add a section in the documentation explaining how to correctly include `vendor` dependencies.
Just one minor improvement could be removing the `is_file` check here, at least for the parser. As far as I remember, we pass the parser only "valid" files, and the `false` from `file_get_contents` would be enough to detect when something is wrong. This would save some I/O.
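The idea, as I understand it: skip the extra stat call that `is_file` costs before every read and rely on the read itself failing. A small sketch (the exception handling is hypothetical, just for illustration):

```php
<?php
// Sketch: drop the is_file() pre-check and rely on file_get_contents()
// returning false on failure. The exception is hypothetical handling.

function readSource(string $path): string
{
    $contents = @file_get_contents($path); // no prior is_file() stat call
    if ($contents === false) {
        throw new \RuntimeException("Could not read file: $path");
    }
    return $contents;
}

var_dump(strlen(readSource(__FILE__)));
```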
Another one, for the cache, would be switching the hash algo. I came across this PR from Rector: they switched to the `xxh128` algo (available since PHP 8.1) and linked the following benchmark. This means that if we switch `sha1_file` to `hash_file('xxh128', ...)` in this file, we would gain some performance.
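The swap itself is a one-liner; a throwaway micro-benchmark sketch (numbers will vary by machine and file size, PHP >= 8.1 assumed) to sanity-check the claim:

```php
<?php
// Sketch: compare sha1_file() against hash_file('xxh128', ...).
$path = __FILE__; // any file on disk will do

$t = hrtime(true);
for ($i = 0; $i < 10_000; $i++) {
    sha1_file($path);
}
printf("sha1_file: %6.1f ms\n", (hrtime(true) - $t) / 1e6);

$t = hrtime(true);
for ($i = 0; $i < 10_000; $i++) {
    hash_file('xxh128', $path);
}
printf("xxh128:    %6.1f ms\n", (hrtime(true) - $t) / 1e6);
```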
If you have a workload somewhere which reproduces the slowness, I would be happy to help and profile it.
oh my.. if I were interested in performance, I would not run my static analysis tooling through Docker ;-).

Running it locally on my MacBook Pro M1 takes 0.169 secs (without Docker):
```
blog-src git:(main) ✗ time vendor/bin/deptrac analyse --config-file=hexagonal-layers.depfile.yaml --report-uncovered --fail-on-uncovered --no-cache
 51/51 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓] 100%

 -------------------- -----
  Report
 -------------------- -----
  Violations           0
  Skipped violations   0
  Uncovered            0
  Allowed              222
  Warnings             0
  Errors               0
 -------------------- -----

vendor/bin/deptrac analyse --config-file=hexagonal-layers.depfile.yaml  0.13s user 0.04s system 97% cpu 0.169 total
```
I wonder why the CI pipeline scans so many files but locally only 51 are scanned? I guess my command is wrong?
It is not so much that people care about performance; rather, it does not compute to me why we take longer than PHPStan and Psalm combined. It was me who made it an issue. deptrac should not be the slowest link in the CI pipeline; that is just not a good user experience.
Yeah, I was joking. If I can reproduce it locally - without Docker - I can look into it.
@staabm it's also me; with my setup it also only parses 51 files (in Docker) but more than 6000 through the GitHub Action. Maybe it's not a deptrac bottleneck but a GitHub Action config problem?
A small improvement for the pipeline would be to use the `--no-cache` flag; this way there will be no writes made by deptrac.
Sorry, I did not see the comments with the analysis. I improved Deptrac's configuration to only scan `src` files. You can see the performance degradation mentioned in the first comment if you include `vendor` in the paths to analyze by Deptrac.
I did some profiling myself, but I'm not that experienced in it. What I found out is that one big performance penalty, besides the AST parsing, is building the Symfony service container. I'm curious to see your results, @staabm.
> Sorry, I did not see the comments with the analysis. I improved Deptrac's configuration to only scan `src` files. You can see the performance degradation mentioned in the first comment if you include `vendor` in the paths to analyze by Deptrac.

Okay, I see. I think we should work on the performance of deptrac in the future. If I got it right, the screenshot @patrickkusebauch shared was based on the deptrac config which parsed the whole vendor dir, right? But the other tools just parsed the src?
Exactly, the first screenshot was taken when Deptrac scanned the `vendor` folder. And yes, the other tools only scan the `src` folder. It is as I said in a previous comment:

> On the other hand, I solved my performance problem by not including the `vendor` folder, like they explain here: #506 (comment) I think it is a common use case, so maybe it would be useful to add a section in the documentation explaining how to correctly include `vendor` dependencies.
During a Slack conversation with Rubén Rubio, we found out that Deptrac was the slowest quality gate in his CI pipeline, even slower than running both PHPStan and Psalm together. This should not be the case: we should be faster than either of those tools, because we do significantly less work. All three of us parse the AST (which should take about the same time), but then we only resolve dependencies, while Psalm and PHPStan do a lot more work resolving types and applying rules on top of that.
Some numbers that came from a debug build of deptrac with perf counters:
Obviously, the most expensive part is the AST parsing. We should look at how it is written, whether it can be parallelized, and whether we can take some of NikicPhpParser's performance suggestions into consideration.