sirbrillig / phpcs-changed

🐘 Run phpcs on files and only report new warnings/errors compared to the previous version.
MIT License
31 stars 11 forks source link

Add caching of phpcs results #44

Closed sirbrillig closed 3 years ago

sirbrillig commented 3 years ago

Running phpcs is probably the largest overhead of using phpcs-changed, since it needs to be run twice for each file. In addition, just getting the contents of some of those files can be expensive, notably old versions of svn files. In this PR we cache each phpcs result for later use. Each cache entry is removed if the svn revision changes (for svn mode), or if the file hash of any file changes. We keep caches for different phpcs standards so that phpcs-changed can be run on the same file for different standards multiple times and still benefit from the cache.

This PR adds caching to svn mode and git mode. To activate the cache, you must use the --cache CLI option, and it can be explicitly disabled using --no-cache. The cache can be completely cleared by using the --clear-cache option.

sirbrillig commented 3 years ago

This seems to be working well. Here's an example of running the arcanist linter which is running phpcs-changed on 33 files with svn mode four times each. It brings the run time down from 8 minutes to 1 minute on the second run.

sirbrillig:~/public_html$ time arc lint
 OKAY  No lint messages.

real    8m18.864s
user    10m15.388s
sys     4m28.348s
sirbrillig:~/public_html$ time arc lint
 OKAY  No lint messages.

real    1m3.540s
user    0m13.936s
sys     0m53.044s

I think it might be wise to make this behavior opt-in to begin with because the potential for caching problems is a pretty large risk.

sirbrillig commented 3 years ago
sirbrillig commented 3 years ago

I think we want to delete all new file cache entries if the file hash does not match. This is not really easy right now since we are joining the standard and the hash to form the cache key. 🤔

sirbrillig commented 3 years ago

There's some complexity WRT git mode and caching that I need to address since the two versions of the file might exist at two relatively arbitrary nodes. I've put some TODOs in the code where this needs to be examined.

sirbrillig commented 3 years ago

It would probably be a good idea to add tests to make sure the cache is cleared in git mode if the file hash changes.