pylint-dev / pylint

It's not just a linter that annoys you!
https://pylint.readthedocs.io/en/latest/
GNU General Public License v2.0

Performance Benchmarks / Integration tests #1954

Open brycepg opened 6 years ago

brycepg commented 6 years ago

Pylint needs a collection of performance integration tests (see PyCQA/astroid#519) to help further optimization efforts.

They don't need to be run as part of the standard CI or with a plain pytest invocation, since they will be slow, but we should have them to make sure optimization efforts aren't over-optimizing for specific projects, and so we know which pull requests will slow down astroid/pylint.

I'm thinking that we should pick a specific tag of a specific project, then set up pytest-benchmark so that its setup step downloads/extracts the corresponding zip, runs pylint against those files on a developer's machine, times how long pylint takes to execute, and checks whether pylint raises any internal errors.

These tests additionally might be used to look for consistency in output and changes in pylint output messages.
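
A rough sketch of that idea using pytest-benchmark might look like the following; the project, tag URL, and test names are placeholders rather than a final design:

```python
# Sketch of a corpus benchmark test: download a pinned release of a project,
# run pylint over it, and time the run with pytest-benchmark.
import io
import subprocess
import sys
import urllib.request
import zipfile
from pathlib import Path

import pytest

# Hypothetical pinned tag; any project/tag chosen for the corpus would work.
CORPUS_URL = "https://github.com/dbcli/pgcli/archive/refs/tags/v1.9.0.zip"


@pytest.fixture(scope="session")
def corpus_dir(tmp_path_factory):
    """Download and extract the pinned source tree once per test session."""
    target = tmp_path_factory.mktemp("corpus")
    with urllib.request.urlopen(CORPUS_URL) as response:
        zipfile.ZipFile(io.BytesIO(response.read())).extractall(target)
    return next(target.iterdir())  # the single top-level directory in the zip


def run_pylint(source_root: Path) -> int:
    """Lint every .py file in the corpus and return pylint's exit status."""
    files = [str(path) for path in source_root.rglob("*.py")]
    return subprocess.run(
        [sys.executable, "-m", "pylint", *files], capture_output=True
    ).returncode


def test_lint_corpus(benchmark, corpus_dir):
    exit_status = benchmark(run_pylint, corpus_dir)
    # pylint sets bit 1 of its exit status when a fatal/internal error occurred.
    assert not exit_status & 1
```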

brycepg commented 6 years ago

I did a Google BigQuery search for GitHub repos that have a pylintrc file. I'll probably use these as a starting point for integration tests. Here's the top 25:

stars  repo_name
11301 balloob/home-assistant
7059 google/yapf
6694 dbcli/pgcli
6380 wifiphisher/wifiphisher
3635 NervanaSystems/neon
3499 google/seq2seq
3225 numenta/nupic
2796 lektor/lektor
2549 SirVer/ultisnips
2179 edx/edx-platform
2073 JonnyWong16/plexpy
1982 UnivaqTelegramBot/UnivaqInformaticaBot
1982 giacomocerquone/UnivaqBot
1942 ray-project/ray
1737 boramalper/magnetico
1561 rembo10/headphones
1490 lorien/grab
1452 marcgibbons/django-rest-swagger
1356 tdryer/hangups
1267 EntilZha/PyFunctional
1267 EntilZha/ScalaFunctional
1106 PyCQA/pylint

PCManticore commented 6 years ago

Thanks for tackling this, Bryce! Would it be possible to have a script that reproduces the benchmark dataset? That way we and contributors would have a de facto way of retrieving the dataset, running the integration tests, and getting the performance results out of it.

brycepg commented 6 years ago

Yeah, I'll be adding some docs to https://github.com/brycepg/pylint-corpus

I've decided to just download the source of each Python repository into the corpus repository, so I don't have to worry about performance changing across new versions of the same repository.

brycepg commented 6 years ago

Some other notes: I tried to make pylint work on some really big repositories like pandas and numpy, but pylint never completed, even overnight.

PCManticore commented 6 years ago

Wow that is crazy! I wonder what graphs we could extract from pandas and numpy, maybe there are other pathological cases that we might need to look into.

doublethefish commented 4 years ago

The following is edited content taken from PR #3473.

There are three key parts to achieving this properly:

  1. Generate benchmark data that can be used for comparisons against benchmark data generated under the same conditions for a different commit/branch
  2. Upload the data from (1) somewhere useful
    • This requires some extra config+work in PyCQA's systems.
    • For fastest implementation, we can use an Amazon S3 bucket.
  3. The final, and possibly most important, part is to establish a baseline of basic performance.
    • This requires some thought for pylint. The rest of this comment makes some suggestions.

Comparable Benchmarks

In PR #3473 we use pytest-benchmark to benchmark the code and suggest one way of using it.

This creates .json files that can be compared to .json files created from other runs, which is very useful. However, for those of us familiar with performance regression testing, without ensuring that the two runs being compared were run under highly similar conditions we are very likely to get false-positive/negative results. So, some thought needs to be given to how to make use of the .json files generated by pytest-benchmark. Ideas on a postcard.
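
As one illustration of consuming those files, here is a minimal sketch that compares two saved runs by mean time; it assumes pytest-benchmark's saved layout (benchmarks[*].stats.mean) and uses an arbitrary 10% threshold:

```python
# Compare two pytest-benchmark .json files and report likely regressions.
import json
import sys


def load_means(path):
    """Map benchmark name -> mean runtime (seconds) from a saved .json file."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    return {bench["name"]: bench["stats"]["mean"] for bench in data["benchmarks"]}


def compare(baseline_path, candidate_path, tolerance=0.10):
    """Return (name, baseline_mean, candidate_mean) for runs slower than tolerance."""
    baseline = load_means(baseline_path)
    candidate = load_means(candidate_path)
    regressions = []
    for name, base_mean in baseline.items():
        new_mean = candidate.get(name)
        if new_mean is not None and new_mean > base_mean * (1 + tolerance):
            regressions.append((name, base_mean, new_mean))
    return regressions


if __name__ == "__main__":
    for name, old, new in compare(sys.argv[1], sys.argv[2]):
        print(f"possible regression in {name}: {old:.3f}s -> {new:.3f}s")
```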

Side-by-side comparisons

For example, to ensure a fair comparison, we might want to benchmark the target merge branch and the WIP branch within the same tox run, so we know we get comparable profiling information from two runs on the exact same machine/instance/load. There may be a better way to do this.
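
One possible shape for that idea is sketched below: time pylint from the two branches back to back in the same process, so both measurements see the same machine load. The virtualenv paths and corpus checkout are hypothetical.

```python
# Back-to-back timing of pylint installed from two different branches.
import subprocess
import time

# Assumed layout: one virtualenv per branch, each with its own pylint installed.
INTERPRETERS = {
    "target-branch": ".tox/target/bin/python",
    "wip-branch": ".tox/wip/bin/python",
}
CORPUS = "pylint_corpus/pgcli"  # hypothetical checkout from the corpus repo


def time_run(python):
    """Run pylint once under the given interpreter and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run([python, "-m", "pylint", CORPUS], capture_output=True)
    return time.perf_counter() - start


for label, python in INTERPRETERS.items():
    print(f"{label}: {time_run(python):.2f}s")
```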

Prev runs via Cloud storage

Whilst side-by-side comparisons would likely be the most reliable solution, it would also be useful to record a history of performance so we can spot trends, share performance heatmaps, logs, and so on.

One very quick solution here is to use S3, because Travis supports AWS S3 storage out of the box. I already have code to upload benchmark data and heatmaps to an S3 bucket.

This means that PyCQA would have to do the same to support this PR. But there would be considerations around costs, monitoring and so on.
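
For reference, the upload itself can be as small as a single boto3 call, as sketched below; the bucket name, key scheme, and saved-benchmark path are assumptions, and credentials would come from the CI environment:

```python
# Push a saved pytest-benchmark .json file to S3 so CI runs accumulate a history.
import boto3

s3 = boto3.client("s3")  # credentials supplied by the CI environment
s3.upload_file(
    ".benchmarks/0001_benchmark.json",  # a file saved by --benchmark-save
    "pylint-benchmarks",                # hypothetical bucket
    "history/abc123/benchmark.json",    # hypothetical per-commit key scheme
)
```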

Baselines (validation of thinking needed)

To make this useful for any performance work we need to establish robust benchmark baselines. There's one attached to #3473 but we will need more for each of the expensive parts of the system.

To help explain why we need to think about this carefully: whilst creating PR #3458 (which really enables full support of multiprocessing) and its dependent work, discovering some hidden problems along the way, and reading the various related issues around it (e.g. #1954), it became clear that improving pylint's performance is not obvious to someone who is not intimately familiar with pylint, its checkers, its configuration, and its dependencies (astroid).

brandon-leapyear commented 2 years ago

:sparkles: This is an old work account. Please reference @brandonchinn178 for all future communication :sparkles:


> Some other notes: I tried to make pylint work on some really big repositories like pandas and numpy, but pylint never completed, even overnight.

This also affects repos importing pandas + numpy. #5835 is one minimal example of this, but more generally, running pylint --disable=all on all the files in our repo takes 10s, while it takes 3s after commenting out all pandas + numpy + tqdm imports.

With prospector, it's currently ~40s; after commenting out all pandas + numpy + tqdm imports it's down to 30s, and additionally commenting out scipy + matplotlib + seaborn brings it down to 15s.