Open correctmost opened 6 months ago
Bug description
This is a follow-up to #9310, where I reported slowness with import-error checks due to repetitive I/O over SSHFS.
While profiling the new code, I noticed that the _is_setuptools_namespace checks in astroid cause the same files to be read over and over.
My public example repo shows the following reads:
pylint-corpus/src/__init__.py
pylint-corpus/src/resources/sites/pages/page.py/__init__.py
pylint-corpus/src/resources/results/result.py/__init__.py
I'm hoping that the repeated reads can be prevented to speed up pylint. (My private repo has ~2,200 files and shows >20,000 repeated reads.)
Configuration
[MAIN]
jobs=1

[MESSAGES CONTROL]
disable=all
enable=E0401

[REPORTS]
reports=no
score=no
Steps to reproduce
git clone --branch import-error-stats https://github.com/correctmost/pylint-corpus.git
cd pylint-corpus
python ./profile_pylint.py
Analysis
strace shows the same files being opened repeatedly:
$ strace -e trace=openat python ./profile_pylint.py 2>&1 | sort | uniq -c | sort -nr
109 openat(AT_FDCWD, "pylint-corpus/src/__init__.py", O_RDONLY|O_CLOEXEC) = 3
 50 openat(AT_FDCWD, "pylint-corpus/src/resources/sites/pages/page.py/__init__.py", O_RDONLY|O_CLOEXEC) = -1 ENOTDIR (Not a directory)
 50 openat(AT_FDCWD, "pylint-corpus/src/resources/results/result.py/__init__.py", O_RDONLY|O_CLOEXEC) = -1 ENOTDIR (Not a directory)
It seems possible to avoid most of these reads with caching around _is_setuptools_namespace, but I wonder if _is_setuptools_namespace should even be called with a non-directory path (notice the ENOTDIR errors)?
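To illustrate the caching idea, here is a minimal sketch. The function `looks_like_setuptools_namespace` is a hypothetical stand-in written for this example, not astroid's actual code, and the byte markers it searches for are assumptions about what such a check might look for. It memoizes the result per path with `functools.lru_cache`, so each `__init__.py` is opened at most once, and it treats any `OSError` (including `ENOTDIR`) as "not a namespace package".

```python
import functools
import pathlib
import tempfile


# Hypothetical stand-in for the check discussed above; not astroid's code.
@functools.lru_cache(maxsize=1024)
def looks_like_setuptools_namespace(location: pathlib.Path) -> bool:
    """Return True if location/__init__.py appears to declare a namespace."""
    try:
        with open(location / "__init__.py", "rb") as stream:
            data = stream.read(4096)
    except OSError:
        # Covers missing files and ENOTDIR (location is a regular file).
        return False
    extend_path = b"pkgutil" in data and b"extend_path" in data
    declare_namespace = (
        b"pkg_resources" in data and b"declare_namespace(__name__)" in data
    )
    return extend_path or declare_namespace


with tempfile.TemporaryDirectory() as tmp:
    pkg = pathlib.Path(tmp) / "src"
    pkg.mkdir()
    (pkg / "__init__.py").write_text(
        "__import__('pkg_resources').declare_namespace(__name__)\n"
    )
    # 109 lookups of the same directory, as in the strace count above.
    results = [looks_like_setuptools_namespace(pkg) for _ in range(109)]
    # A non-directory path fails to open and is reported as False.
    file_result = looks_like_setuptools_namespace(pkg / "__init__.py")
    info = looks_like_setuptools_namespace.cache_info()
```

One trade-off of this approach is that cached answers can go stale if files change during a run, so a real fix would need to decide when the cache is cleared.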
Python profiling:
import pstats
stats = pstats.Stats('stats')
stats.print_callers('_io.open')

ncalls  tottime  cumtime
   206    0.017    0.023  astroid/interpreter/_import/spec.py:329(_is_setuptools_namespace)
Pylint output
There is no output, just reduced performance.
Expected behavior
Improved performance via caching or reduced filesystem accesses
Pylint version
astroid @ git+https://github.com/pylint-dev/astroid.git@a4a9fcc44ae0d71773dc3bff6baa78fc571ecb7d
pylint @ git+https://github.com/pylint-dev/pylint.git@500774ae5a4e49e2aa0c8d3f2b64613e21aa676e
Python 3.12.3
OS / Environment
Arch Linux
Additional dependencies
No response
Love those issues, keep them coming :heart: !