pylint-dev / pylint

It's not just a linter that annoys you!
https://pylint.readthedocs.io/en/latest/
GNU General Public License v2.0

Slow with all checks disabled using pandas + dataclass #5835

Open brandon-leapyear opened 2 years ago

brandon-leapyear commented 2 years ago

Bug description

Repro:

  1. mkdir test && cd test
  2. python3 -m venv venv
  3. venv/bin/pip install pylint pandas
  4. Write foo.py:

    from dataclasses import dataclass
    from pandas import DataFrame
    @dataclass
    class Foo:
        a: DataFrame
  5. time pylint --disable=all foo.py

And this consistently takes 8s to run on my machine. Doing any of the following brings the runtime down to <2s:

Configuration

No response

Command used

time pylint --disable=all foo.py

Pylint output

--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

 --disable=all foo.py  7.93s user 0.28s system 111% cpu 7.387 total

Expected behavior

Disabling all checks should not take 8 seconds for this small file.

Pylint version

pylint 2.12.2
astroid 2.9.3
Python 3.9.7 (default, Sep  3 2021, 12:45:31) 
[Clang 12.0.0 (clang-1200.0.32.29)]

OS / Environment

OSX 10.15 (Catalina)

Additional dependencies

astroid==2.9.3 isort==5.10.1 lazy-object-proxy==1.7.1 mccabe==0.6.1 numpy==1.22.2 pandas==1.4.1 platformdirs==2.5.1 pylint==2.12.2 python-dateutil==2.8.2 pytz==2021.3 six==1.16.0 toml==0.10.2 typing_extensions==4.1.1 wrapt==1.13.3

brandon-leapyear commented 2 years ago

:sparkles: This is an old work account. Please reference @brandonchinn178 for all future communication :sparkles:


~Update: the minimal repro might be fast with the version of pylint on master, so this might not be an issue anymore.~ Never mind, forgot to install pandas. When pandas is installed, the minimal repro is still slow using the version of pylint on main.

Related, will there be a pylint release anytime soon?

Pierre-Sassoulas commented 2 years ago

Hi @brandon-leapyear, thank you for opening the issue. The next milestone for pylint is https://github.com/PyCQA/pylint/milestone/49, which is 89% done right now. We need a release of astroid in order to close it; that milestone is https://github.com/PyCQA/astroid/milestone/25, which is 70% done right now.

clavedeluna commented 1 year ago

I'm able to reproduce this, though not at 8s:

time pylint --disable=all foo.py

real    0m5.106s
user    0m5.353s
sys     0m0.341s

5s is still quite shocking so I agree this issue is worth having. Though I wonder how to even tackle an issue like this?

Out of curiosity, I ran time pylint test.py (so not disabling anything, just running whatever checks are enabled via config) and I'm getting something pretty similar:

time pylint test.py

real    0m5.377s
user    0m5.536s
sys     0m0.388s

Then I removed the pandas usage from the file, and the time came down significantly, both with checks enabled and with all checks disabled:

real    0m0.886s
user    0m0.653s
sys     0m0.113s

So to me this is an issue with how pylint handles pandas, not with disabling checks.
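The with-pandas and without-pandas comparison can be scripted so the timings are repeatable. This is only a minimal sketch, assuming pylint is installed for the interpreter that runs the script; the helper names `time_command` and `pylint_command` are hypothetical and not part of pylint.

```python
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Return the best wall-clock time (seconds) over `repeats` runs of cmd."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,  # pylint exits non-zero when it reports messages
        )
        timings.append(time.perf_counter() - start)
    return min(timings)


def pylint_command(path, disable_all=True):
    """Build the pylint invocation from the issue (hypothetical helper)."""
    cmd = [sys.executable, "-m", "pylint"]
    if disable_all:
        cmd.append("--disable=all")
    cmd.append(path)
    return cmd
```

For example, `time_command(pylint_command("foo.py"))` can be compared against the same call on a copy of the file with the pandas import and annotation removed. Taking the minimum of several runs reduces noise from other processes.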

Pierre-Sassoulas commented 1 year ago

> Though I wonder how to even tackle an issue like this?

There's documentation about profiling for contributors here: https://pylint.pycqa.org/en/latest/development_guide/contributor_guide/profiling.html

clavedeluna commented 1 year ago

Cool. An investigation with a profiler is definitely needed here to get some data on what's going on!
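As a rough starting point for such an investigation, the standard library's cProfile can wrap an in-process pylint run. This is a sketch rather than the procedure from the linked guide: `profile_call` and `run_pylint_on_repro` are hypothetical helper names, and the `exit=False` argument to `pylint.lint.Run` is assumed to be available in the installed pylint version.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args, **kwargs)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


def run_pylint_on_repro():
    """Lint the repro file in-process (assumes pylint is installed)."""
    # Imported lazily so profile_call stays usable without pylint.
    from pylint.lint import Run

    # exit=False is assumed to stop Run from calling sys.exit() when done.
    Run(["--disable=all", "foo.py"], exit=False)
```

Running `print(profile_call(run_pylint_on_repro))` from the directory containing foo.py should list the functions with the highest cumulative time, which may show whether the time goes into inferring the pandas import.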

nickdrozd commented 1 year ago

@clavedeluna There's another profiler called Yappi that I've used to great effect in Pylint profiling. I wrote up some instructions for it here: https://nickdrozd.github.io/2022/04/12/performance-hot-spots.html

Pierre-Sassoulas commented 1 year ago

I also came across https://github.com/bloomberg/pytest-memray recently and wanted to check what it can do for pylint. I have not tried anything yet.