pylint-dev / pylint

It's not just a linter that annoys you!
https://pylint.readthedocs.io/en/latest/
GNU General Public License v2.0

E0401 (import-error) checks perform repeated file reads #9603

Open correctmost opened 6 months ago

correctmost commented 6 months ago

Bug description

This is a follow-up to #9310, where I reported slowness with import-error checks due to repetitive I/O over SSHFS.

While profiling the new code, I noticed that the _is_setuptools_namespace checks in astroid cause the same files to be read over and over.

My public example repo shows the repeated reads listed under Analysis below.

I'm hoping that the repeated reads can be prevented to speed up pylint. (My private repo has ~2,200 files and shows >20,000 repeated reads.)

Configuration

[MAIN]
jobs=1

[MESSAGES CONTROL]
disable=all
enable=E0401

[REPORTS]
reports=no
score=no

Command used

Steps to reproduce

git clone --branch import-error-stats https://github.com/correctmost/pylint-corpus.git
cd pylint-corpus

python ./profile_pylint.py

Analysis

strace shows the same files being opened repeatedly:

$ strace -e trace=openat python ./profile_pylint.py 2>&1 | sort | uniq -c | sort -nr

109 openat(AT_FDCWD, "pylint-corpus/src/__init__.py", O_RDONLY|O_CLOEXEC) = 3
 50 openat(AT_FDCWD, "pylint-corpus/src/resources/sites/pages/page.py/__init__.py", O_RDONLY|O_CLOEXEC) = -1 ENOTDIR (Not a directory)
 50 openat(AT_FDCWD, "pylint-corpus/src/resources/results/result.py/__init__.py", O_RDONLY|O_CLOEXEC) = -1 ENOTDIR (Not a directory)

It seems possible to avoid most of these reads with caching around _is_setuptools_namespace, but I wonder if _is_setuptools_namespace should even be called with a non-directory path (notice the ENOTDIR errors)?
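To make the caching idea concrete, here is a minimal sketch using functools.lru_cache. The function name looks_like_setuptools_namespace and its body are hypothetical stand-ins written for this example, not copied from astroid; the only assumption is that the real check takes a path and reads the __init__.py beneath it.

```python
import functools
import pathlib
import tempfile


@functools.lru_cache(maxsize=None)
def looks_like_setuptools_namespace(location: pathlib.Path) -> bool:
    """Hypothetical stand-in for the astroid check, memoized per path."""
    try:
        with open(location / "__init__.py", "rb") as stream:
            data = stream.read(4096)
    except OSError:
        # Covers a missing file as well as ENOTDIR, where `location`
        # is a regular file rather than a directory.
        return False
    extend_path = b"pkgutil" in data and b"extend_path" in data
    declare_namespace = (
        b"pkg_resources" in data and b"declare_namespace(__name__)" in data
    )
    return extend_path or declare_namespace


def demo() -> tuple[bool, bool, int, int]:
    """Query a package directory and a plain module 50 times each."""
    with tempfile.TemporaryDirectory() as tmp:
        package = pathlib.Path(tmp) / "src"
        package.mkdir()
        (package / "__init__.py").write_text(
            "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"
        )
        module = package / "page.py"
        module.write_text("")

        looks_like_setuptools_namespace.cache_clear()
        for _ in range(50):
            is_namespace = looks_like_setuptools_namespace(package)
            is_file_namespace = looks_like_setuptools_namespace(module)

        info = looks_like_setuptools_namespace.cache_info()
        return is_namespace, is_file_namespace, info.hits, info.misses


if __name__ == "__main__":
    print(demo())
```

With this shape, each distinct path triggers one open attempt and every later call is answered from the cache, including the False result for the non-directory path. The trade-off is that a cached answer goes stale if a file changes during the run, so a real fix would need to decide when the cache is cleared.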


Python profiling:

import pstats

stats = pstats.Stats('stats')
stats.print_callers('_io.open')

ncalls  tottime  cumtime
   206    0.017    0.023  astroid/interpreter/_import/spec.py:329(_is_setuptools_namespace)
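For anyone who wants to try the same pstats query without the corpus, the following standalone sketch produces a stats file with cProfile and counts the recorded open calls. It is an assumption about one way such a file can be generated, not the contents of profile_pylint.py; read_repeatedly and profile_reads are hypothetical helpers.

```python
import cProfile
import pathlib
import pstats
import tempfile


def read_repeatedly(path: pathlib.Path, times: int) -> None:
    """Open and read the same file several times, like the repeated checks."""
    for _ in range(times):
        with open(path, "rb") as stream:
            stream.read(16)


def profile_reads(times: int = 5) -> int:
    """Profile read_repeatedly and return how many open() calls were recorded."""
    with tempfile.TemporaryDirectory() as tmp:
        target = pathlib.Path(tmp) / "__init__.py"
        target.write_text("")
        stats_file = str(pathlib.Path(tmp) / "stats")

        profiler = cProfile.Profile()
        profiler.enable()
        read_repeatedly(target, times)
        profiler.disable()
        profiler.dump_stats(stats_file)

        stats = pstats.Stats(stats_file)
        # Same query as in the report: which functions called open()?
        stats.print_callers("_io.open")

        # stats.stats maps (file, line, name) to (cc, ncalls, tt, ct, callers).
        open_calls = 0
        for (_file, _line, name), (_cc, ncalls, _tt, _ct, _callers) in stats.stats.items():
            if name.endswith("io.open>"):
                open_calls += ncalls
        return open_calls


if __name__ == "__main__":
    print(profile_reads())
```

The number returned by profile_reads corresponds to the ncalls column in the print_callers output.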

Pylint output

There is no output, just reduced performance

Expected behavior

Improved performance via caching or reduced filesystem accesses

Pylint version

astroid @ git+https://github.com/pylint-dev/astroid.git@a4a9fcc44ae0d71773dc3bff6baa78fc571ecb7d
pylint @ git+https://github.com/pylint-dev/pylint.git@500774ae5a4e49e2aa0c8d3f2b64613e21aa676e
Python 3.12.3

OS / Environment

Arch Linux

Additional dependencies

No response

Pierre-Sassoulas commented 6 months ago

Love those issues, keep them coming :heart: !