python / cpython

The Python programming language
https://www.python.org
Other
63.16k stars 30.24k forks source link

Inspect module is prohibitively expensive to import #117865

Open jaraco opened 6 months ago

jaraco commented 6 months ago

From post by @AlexWaygood in https://github.com/python/cpython/issues/117372#issuecomment-2047024740_:

I'm a little dismayed that dataclasses adds so much overhead. Does that imply that dataclasses shouldn't be used in the stdlib?

dataclasses is an unfortunately heavy import, yes. Although it imports relatively few modules itself, one of those modules is inspect, which has a huge import time (inspect imports half the standard library). In the past, I've looked into removing the dataclasses dependency on inspect, but couldn't see a way to do so without making the code significantly more ugly; the same goes when it comes to improving the import time for inspect itself. (In hindsight, I'd argue inspect should probably have been a package rather than a huge single-file Python module... but hindsight is 20/20.)

Perhaps inspect could be refactored to limit the import-time cost.

Currently on my m3 mac pro, on the second attempt, the import times are 1.2ms and 7.9ms (cumulative):

 cpython main @ ./python.exe -X importtime -c 'import inspect' 2> /dev/null ; ./python.exe -X importtime -c 'import inspect' 2>1 | grep -E '(cumul|inspect)'
import time: self [us] | cumulative | imported package
import time:      1184 |       7892 | inspect

I've searched the issue tracker and didn't see anything tracking this concern, so right now it's a naïve feature request.

Related: https://github.com/python/cpython/issues/108901

Linked PRs

jaraco commented 6 months ago

It looks like the bulk of the import time is spent in re, ast, and dis:

import time: self [us] | cumulative | imported package
import time:       223 |        223 |         types
import time:       991 |       1214 |       enum
import time:        47 |         47 |         _sre
import time:       187 |        187 |           re._constants
import time:       270 |        456 |         re._parser
import time:        81 |         81 |         re._casefix
import time:       257 |        839 |       re._compiler
import time:        81 |         81 |           itertools
import time:       100 |        100 |           keyword
import time:        40 |         40 |             _operator
import time:       202 |        241 |           operator
import time:       131 |        131 |           reprlib
import time:        41 |         41 |           _collections
import time:       554 |       1146 |         collections
import time:        32 |         32 |         _functools
import time:       590 |       1766 |       functools
import time:       125 |        125 |       copyreg
import time:       384 |       4326 |     re
import time:      1325 |       1325 |     _ast
import time:       458 |        458 |     contextlib
import time:       861 |       6968 |   ast
import time:       281 |        281 |       _opcode
import time:       160 |        160 |       _opcode_metadata
import time:       220 |        660 |     opcode
import time:       594 |       1253 |   dis
import time:       129 |        129 |   collections.abc
import time:       116 |        116 |     importlib
import time:        43 |        158 |   importlib.machinery
import time:       141 |        141 |     token
import time:        20 |         20 |     _tokenize
import time:       569 |        729 |   tokenize
import time:      1548 |      10783 | inspect

If those imports could be deferred, that would likely address the concern.

hugovk commented 6 months ago

Visualised with https://github.com/nschloe/tuna:

./python.exe -X importtime -c 'import inspect' 2> /dev/null ; ./python.exe -X importtime -c 'import inspect' 2> import.log && tuna import.log
image
JelleZijlstra commented 6 months ago

Some notes:

AlexWaygood commented 6 months ago

yeah, I looked at this in the past and decided that it wasn't realistically possible to improve this situation without either heavily refactoring inspect.py or making the code quite a lot uglier in order to defer imports. But I'd be very happy to be proved wrong if somebody can see a good way of making progress here :-)

JelleZijlstra commented 6 months ago

I think we can get rid of the re import, which would shave off about 1 ms.

hugovk commented 6 months ago
  • I haven't checked if there are other paths by which inspect imports re.

re is also used in tokenize:

AlexWaygood commented 6 months ago

A bunch of the re-module usage in tokenize is backwards-compatibility stuff that isn't actually used by tokenize's core functionality anymore. We could easily deprecate that now, I think. (Previously discussed in https://github.com/python/cpython/issues/104719. Looks like I promised to open a followup issue there about deprecating the obsolete constants but never did so...)

serhiy-storchaka commented 6 months ago

The half of the re import cost is in importing enum and collections. It is difficult to find a program that does not import them. Look not only at the cumulative time, but also at the pure time. The slowest are modules that create a large number of classes, especially with metaclasses.

DavidCEllis commented 6 months ago

Using a tool I have for lazy imports/exports I did some quick testing on deferring the imports for ast, dis, re and tokenize in inspect to see what impact that has on both inspect and dataclasses.

This involved making inspect a package and splitting out the definitions of _ClassFinder and the compiler flags into separate files to be lazily imported if needed. It didn't break ./python -m test -v test_inspect at least (after making __main__.py).


My machine isn't stable enough for really consistent testing but this gives a rough idea of the result.

Benchmarks 4 and 5 include actually creating a dataclass. The dataclass_no_docstring difference is because without a docstring dataclasses calls inspect.signature to generate a basic docstring[^1] and this pulls in the code that uses dis[^2].

Results from today's main ``` Benchmark 1: ./python -c "" Time (mean ± σ): 12.3 ms ± 0.6 ms [User: 10.0 ms, System: 2.5 ms] Range (min … max): 11.6 ms … 14.0 ms 25 runs Benchmark 2: ./python -c "import inspect" Time (mean ± σ): 32.1 ms ± 0.7 ms [User: 29.0 ms, System: 3.3 ms] Range (min … max): 30.6 ms … 33.6 ms 25 runs Benchmark 3: ./python -c "import dataclasses" Time (mean ± σ): 34.4 ms ± 1.5 ms [User: 30.2 ms, System: 4.3 ms] Range (min … max): 32.1 ms … 37.4 ms 25 runs Benchmark 4: ./python scratch/dataclass_with_docstring.py Time (mean ± σ): 34.3 ms ± 1.0 ms [User: 30.6 ms, System: 3.7 ms] Range (min … max): 32.7 ms … 36.5 ms 25 runs Benchmark 5: ./python scratch/dataclass_no_docstring.py Time (mean ± σ): 34.3 ms ± 1.0 ms [User: 30.5 ms, System: 4.0 ms] Range (min … max): 32.8 ms … 37.2 ms 25 runs ```
Results from lazy_inspect ``` Benchmark 1: ./python -c "" Time (mean ± σ): 12.3 ms ± 0.8 ms [User: 10.5 ms, System: 2.2 ms] Range (min … max): 11.7 ms … 15.3 ms 25 runs Benchmark 2: ./python -c "import inspect" Time (mean ± σ): 21.6 ms ± 1.1 ms [User: 20.2 ms, System: 1.7 ms] Range (min … max): 20.6 ms … 24.1 ms 25 runs Benchmark 3: ./python -c "import dataclasses" Time (mean ± σ): 26.0 ms ± 1.1 ms [User: 22.8 ms, System: 3.5 ms] Range (min … max): 24.7 ms … 27.9 ms 25 runs Benchmark 4: ./python scratch/dataclass_with_docstring.py Time (mean ± σ): 26.7 ms ± 1.2 ms [User: 23.2 ms, System: 3.7 ms] Range (min … max): 25.2 ms … 29.3 ms 25 runs Benchmark 5: ./python scratch/dataclass_no_docstring.py Time (mean ± σ): 29.7 ms ± 2.2 ms [User: 26.8 ms, System: 3.1 ms] Range (min … max): 27.4 ms … 36.0 ms 25 runs ```

[^1]: Is this documented anywhere? [^2]: This also means -OO will always pull in dis.