Open neonene opened 4 weeks ago
I cannot confirm this issue on 3.10, so I tried profiling a function:
`METH_METHOD|METH_FASTCALL` (as-is) entry

| Function Name | count | Python |
| --- | --- | --- |
| `_sre_SRE_Pattern_sub` | 1000000 | |
| `cfunction_vectorcall_FASTCALL_KEYWORDS_METHOD` | 1000000 | 3.10 |
| `cfunction_vectorcall_FASTCALL_KEYWORDS_METHOD` | 1000003 | 3.14 |
`METH_FASTCALL` (no `defining_class`) entry

| Function Name | count | Python |
| --- | --- | --- |
| `_sre_SRE_Pattern_sub_patched` | 1000000 | |
| `cfunction_vectorcall_FASTCALL_KEYWORDS` | 1001231 | 3.10 |
| `cfunction_vectorcall_FASTCALL_KEYWORDS` | 918 | 3.14 |
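
The counts above come from a native profiler, and the driver script is not shown. As a rough Python-level cross-check of how often the method itself is entered, a minimal sketch using `sys.setprofile` could look like the following. The pattern, the input string, and the helper names `count_c_calls` and `workload` are hypothetical; this only sees `c_call` events for the bound method and cannot distinguish which vectorcall wrapper was used.

```python
import re
import sys


def count_c_calls(func, target_self, target_name):
    """Run func() and count 'c_call' profile events for one bound C method."""
    hits = 0

    def profiler(frame, event, arg):
        nonlocal hits
        # For 'c_call' events, arg is the C function object being called.
        if (
            event == "c_call"
            and getattr(arg, "__name__", None) == target_name
            and getattr(arg, "__self__", None) is target_self
        ):
            hits += 1

    sys.setprofile(profiler)
    try:
        func()
    finally:
        sys.setprofile(None)  # always remove the hook again
    return hits


# Hypothetical workload: the pattern and input are placeholders.
pattern = re.compile(r"\d+")


def workload(n=10_000):
    for _ in range(n):
        pattern.sub("#", "abc 123")


if __name__ == "__main__":
    print("Pattern.sub calls seen:", count_c_calls(workload, pattern, "sub"))
```

The profile hook slows the loop considerably, so this is only suitable for counting, not for timing.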
Bug report
Bug description:
There are callables implemented with the `METH_METHOD|METH_FASTCALL` signature in C. They can be 5%-15% less efficient than using only `METH_FASTCALL` (or `METH_O`) with a `PyType_GetModuleByDef` function call.

For example, I measured the difference on Windows PGO builds by duplicating functions:
- `CDataType_from_buffer_copy()` in `_ctypes.c`, which is not called when profiling:
- `dec_mpd_qquantize()` in `_decimal.c`, profiled with 6800 calls (unfair?):

Script:
```py
from timeit import timeit

setup = """if 1:
    from _decimal import Decimal
    d1,d2 = Decimal(1.414), Decimal('0.01')
"""
for _ in range(2):
    r0 = timeit(s0 := f'd1.quantize (d2)', setup)
    r1 = timeit(s1 := f'd1.quantize1(d2)', setup)
    r2 = timeit(s2 := f'd1.quantize2(d2)', setup)
    print(s0, r0, 1 + (1 - r0 / r0))
    print(s1, r1, 1 + (1 - r1 / r0))
    print(s2, r2, 1 + (1 - r2 / r0))
```

Observations:

`_sre`), where the impacts may be less significant.

CPython versions tested on:
CPython main branch
Operating systems tested on:
Windows
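
For readers reproducing the measurement: the third column printed by the script is `1 + (1 - r / r0)`, so a value above 1 means the variant took less time than the baseline and a value below 1 means it took more. The sketch below isolates that score and, since `quantize1` and `quantize2` exist only in the patched build, times the stock `quantize` twice to estimate run-to-run noise. The helper names `relative_speed` and `best_time`, the use of `timeit.repeat` with a minimum, and the import from `decimal` rather than `_decimal` are assumptions of this sketch, not part of the original script.

```python
from timeit import repeat

# Same operands as the script above, imported via the public module (assumption).
SETUP = "from decimal import Decimal; d1, d2 = Decimal(1.414), Decimal('0.01')"


def relative_speed(candidate_seconds, baseline_seconds):
    """Score printed by the script: 1 + (1 - candidate / baseline)."""
    return 1 + (1 - candidate_seconds / baseline_seconds)


def best_time(stmt, setup=SETUP, number=100_000, rounds=5):
    """Best of several timing rounds, to reduce scheduling noise."""
    return min(repeat(stmt, setup, number=number, repeat=rounds))


if __name__ == "__main__":
    # Timing the same statement twice gives a feel for the noise floor;
    # differences smaller than this are hard to attribute to the call flags.
    first = best_time("d1.quantize(d2)")
    second = best_time("d1.quantize(d2)")
    print("baseline vs. itself:", relative_speed(second, first))
```

On a patched build, the same `best_time` call could be pointed at the duplicated methods to compare them against this noise estimate.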