Closed 2 years ago
I have another fix.
> I have another fix.
If you have another fix, please create a PR ASAP and get it reviewed and merged by a core dev in the next 24 hours; otherwise it will need to wait until 3.10.1.
Sadly, I can't reproduce the speedups OP reported from disabling test_patma.TestTracing. It's not any faster than what we have with PR28475. (See attached pyperformance).
I'm looking forward to their other fix :). Even if it comes in 3.10.1 that's still a huge win. I don't think everyone immediately upgrades when a new Python version arrives.
IMO, we should note in What's New that 3.10.0 has a slight slowdown on Windows only. Some use cases are slower (by >10%!), while some people won't feel a thing. (Then again, maybe this is offset by the LOAD_ATTR opcache in 3.10 and we get a net-zero effect?) I'll submit a PR soon if the full fix misses 3.10.0.
> IMO, we should note in What's New that 3.10.0 has a slight slowdown on Windows only.
I disagree. This is a regression/bug and we don't advertise "known bugs" in the what's new, the same for any other bugfix that has been delayed until 3.10.1
> Some use cases are slower (by >10%!)
Can you still reproduce this with PR 28475?
I submitted 2 drafts in a hurry. Sorry for short explanations. I'll add more reports.
@pablogsal I'm OK with more effective fixes in 3.10.1 and later.
Thanks all, thanks kj and malin for many help.
I think this is a bug in MSVC 2019, not really a regression in CPython. Changing CPython's code is just a workaround; maybe the right direction is to push MSVC to fix the bug, otherwise there will be more trouble when 3.11 is released a year from now.
Seeing MSVC's reply, it seems they didn't recognize it as a bug, and instead suggested adjusting the training samples and using __forceinline. They don't know that __forceinline has hung the build process since 28d28e0.
_PyEval_EvalFrameDefault() may also need to be divided.
@Pablo
> I disagree. This is a regression/bug and we don't advertise "known bugs" in the what's new, the same for any other bugfix that has been delayed until 3.10.1
Alright, in hindsight 3.10 What's New was a bad suggestion on my part. I wonder if there's a better location for such news though.
> Some use cases are slower (by >10%!)
> Can you still reproduce this with PR 28475?
Yes that number is *with* PR28475. Without that PR it was worse. The second pyperformance comparison in this file is 3.10a7 vs PR28475 https://bugs.python.org/file50293/PR28475_vs_310rc2_vs_310a7.txt. Omitting python_startup (unstable on Windows) and unpack_sequence (microbenchmark):
At this point I don't know if we have any quick fixes left. So maybe we should open another issue for 3.11 and consider factoring out uncommon opcodes into functions like Victor and Mark suggested. We could make use of the opcode stats the faster-cpython folks have collected https://github.com/faster-cpython/tools.
Sadly the MSVC team are claiming that this isn't a bug in their compiler. Not sure how we convince them that it is. The website rejects any attempt to reopen the issue.
How feasible would it be to use Clang or GCC on Windows?
> How feasible would it be to use Clang or GCC on Windows?
clang seems to have good Windows support and tries to be ABI-compatible with MSVC, which is a must-have to keep wheel package support (especially for the stable ABI, used by PyQt on Windows, for example).
Moreover, there are ways to cross-build Python from another platform to Windows which can be convenient ;-)
I don't know the Windows ecosystem. Do people want to get VS debugger for example? Is clang compatible with the VS debugger?
See the discussion of 2014: "Status of C compilers for Python on Windows" https://mail.python.org/archives/list/python-dev@python.org/thread/SYWDJ23AQDPWQN7HD6M6YCSGXERCHWA2/
I would very much appreciate any new compiler be compatible with the standard Windows debuggers (windbg primarily, but I imagine most contributors would like it to keep working from VS).
Last I heard, clang is fine as a compiler for debugging if you use the MSVC linker to generate debug info, though it still isn't as complete as MSVC (ultimately by definition, since MSVC is the standard-by-implementation for this stuff). And I've got no idea how/whether link-time optimisation works when you mix tools, but I'd have to assume it doesn't.
Switching compiler may prevent me from being able to analyse crash reports (and by me, I mean the automated internal tools that do it for me), and certainly parts of the Windows build rely on MSVC-specific functionality right now (not in the main DLL) so we'd end up needing both for a full build.
Also, just to put it out there, I'm not volunteering to rewrite the build system :) If the steering council signs off on switching, I won't block it, but I have more interesting things to work on.
If we know which parts of the function are critical, perhaps we should be designing a PGO profile that actually hits them all? The current profile is very arbitrary, basically just waiting for someone motivated enough to figure out a better one.
Today I tested with the msvc2022 preview; the __forceinline attribute no longer hangs the build.
64-bit PGO builds:

    28d28e0~1,   vc2022 : baseline
    28d28e0~1+F, vc2022 : 1.02x slower <1>
    28d28e0,     vc2022 : 1.03x slower <2>
    28d28e0+F,   vc2022 : 1.03x slower
    3.10 final,  vc2022 : 1.03x slower
    3.10 final+F,vc2022 : 1.03x slower
    28d28e0~1,   vc2019 : 1.00x slower <3>

28d28e0~1 is the last fast commit; 28d28e0 is the first slow commit. +F means adding the __forceinline attribute to all inline functions in object.h. vc2019 and vc2022 are the latest versions.

<1> Forcing inline is slower.
<2> 28d28e0 is still slow, but not by as much.
<3> Normally, msvc2019 and msvc2022 have the same performance.
Is it possible to write a PGO profile for 28d28e0? https://github.com/python/cpython/commit/28d28e053db6b69d91c2dfd579207cd8ccbc39e7
msvc2022 will be released in November this year, and maybe subsequent versions can be built with msvc2022.
PR28475 is not in the official source archive. https://www.python.org/ftp/python/3.10.0/Python-3.10.0.tar.xz
I'll check later whether official binary has the fix.
3.10.0 official binary is as slow as rc2.
Many files are not updated in the source archive or b494f5935c92951e75597bfe1c8b1f3112fec270, so I'm not sure if the delay is intentional or not.
We have no choice except waiting for 3.10.1.
Someone whose name I don't recognize (MagzoB Mall) just changed the issue type to "crash" without explaining why. That user was created today, and has no triage permissions. Mind if I change it back? It feels like vandalism.
According to the suggested stats and pgomgr.exe, I experimentally moved the LOAD_FAST and LOAD_CONST cases out of the switch, as below:

    if (opcode == LOAD_FAST) {
        ...
        DISPATCH();
    }
    if (opcode == LOAD_CONST) {
        ...
        DISPATCH();
    }
    switch (opcode) {
x64 performance results after the patch (msvc2019):

Good inliner versions:

    3.10.0+   : 1.03x faster than before
    28d28e0~1 : 1.04x faster
    3.8.12    : 1.03x faster

Bad inliner versions (eval function too big; has msvc2022 increased the capacity?):

    3.10.0/rc2 : 1.00x faster
    3.11a1+    : 1.02x faster

It seems to me that, since quite a while ago, the optimizer has been stopping at some point after successful inlining. So performance may be sensitive to code changes, and it may be possible to detect where the optimization is aborted.
(Benchmarks: switch-case_unarranged_bench.txt)
The total size of the main interpreter loop was recently reduced somewhat by an unrelated change:
https://github.com/python/cpython/commit/9178f533ff5ea7462a2ca22cfa67afd78dad433b
I wonder if this issue still exists?
I still have the issue in current main and PR29565 with msvc2022 (v142 or v143 toolset).
Hm. If removing 26 opcodes didn't fix this, then maybe the size of _PyEval_EvalFrameDefault isn't really the issue?
I'd like to know how to reproduce this. @neonene can you write down the steps I should do to get the results you get? I have VS 2019, if I need VS 2022 I can install that.
Here are the 3 steps to reproduce with minimal PGO training (vs2019):

Download the source archive of PR29565 and extract it: https://github.com/python/cpython/archive/6a84d61c55f2e543cf5fa84522d8781a795bba33.zip

Apply the following patch:
    --- PCbuild/build.bat
    +++ PCbuild/build.bat
    @@ -66 +66 @@
    -set pgo_job=-m test --pgo
    +set pgo_job=-c"pass"

    --- PCbuild/pyproject.props
    +++ PCbuild/pyproject.props
    @@ -47,2 +47,3 @@
         <AdditionalOptions>/utf-8 %(AdditionalOptions)</AdditionalOptions>
    +    <AdditionalOptions Condition="$(SupportPGO) and $(Configuration) == 'PGUpdate'">/d2inlinelogfull:_PyEval_EvalFrameDefault %(AdditionalOptions)</AdditionalOptions>
     </ClCompile>
Build (add -r to rebuild):

    PCbuild\build --no-tkinter --pgo > build.log
According to the inlining section in the log, any function that has one or more conditional expressions gets a "reject" from the inliner.
> Inlinee for function _PyEval_EvalFrameDefault
> -_Py_EnsureFuncTstateNotNULL (pgo hard reject)
> ...
> _Py_INCREF (pgu decision)
> _Py_INCREF (pgu decision)
> -_Py_XDECREF (pgo hard reject)
> -_Py_XDECREF (pgo hard reject)
> -_Py_DECREF (pgo hard reject)
> -_Py_DECREF (pgo hard reject)
> ...
Profiling scores can be shown in the VS2019 Command Prompt:

    pgomgr PCbuild\amd64\python311.pgd /summary [/detail] > largefile.txt
Unused opcodes in this training
ROT_THREE, DUP_TOP_TWO, UNARY_POSITIVE, UNARY_NEGATIVE, BINARY_OP_ADD_FLOAT, UNARY_INVERT, BINARY_OP_MULTIPLY_INT, BINARY_OP_MULTIPLY_FLOAT, GET_LEN, MATCH_MAPPING, MATCH_SEQUENCE, MATCH_KEYS, LOAD_ATTR_SLOT, LOAD_METHOD_CLASS, GET_AITER, GET_ANEXT, BEFORE_ASYNC_WITH, END_ASYNC_FOR, STORE_ATTR_SLOT, STORE_ATTR_WITH_HINT, GET_YIELD_FROM_ITER, PRINT_EXPR, YIELD_FROM, GET_AWAITABLE, LOAD_ASSERTION_ERROR, SETUP_ANNOTATIONS, UNPACK_EX, DELETE_ATTR, DELETE_GLOBAL, ROT_N, COPY, DELETE_DEREF, LOAD_CLASSDEREF, MATCH_CLASS, SET_UPDATE, DO_TRACING
I managed to activate the inliner experimentally by removing those 36 op-cases from the switch and merging/removing many macros.
Static instruction counts of _PyEval_EvalFrameDefault():

    PR29565   : 6882 (down to 4400 with the above change)
    PR29482   : 7035
    PR29482~1 : 7742
    3.10.0+   : 3980 (well inlined, sharing the DISPATCH macro)
    3.10.0    : 5559
    3.10b1    : 5680
    3.10a7    : 4117 (well inlined)
> -set pgo_job=-m test --pgo
> +set pgo_job=-c"pass"
This essentially disables PGO. You won't get anything valid or useful from analysing its results if you don't give it a somewhat reasonable profile (preferably one that exercises the interpreter loop, which "pass" does not).
@neonene what's the importance of PR29565?
> This essentially disables PGO.
Thank you for the suggestion. I'll take another experimental approach to reduce the size of the 3.11 eval function for stronger validation.
> @neonene what's the importance of PR29565?
While we are talking about function size, I would like to use commits around PR29565 for consistent reporting. I think any commit is okay for reproducing the issue.
And please ignore the patch to build.bat.
In the eval loop of PR29565, inlining seems to be enabled within about 70 op-branches when trained with 44 tests.
log & source: ceval_PR29565_split_func.c (not for performance)
I requested the MSVC team to reconsider the inlining issues, including __forceinline. https://developercommunity.visualstudio.com/t/1595341
The hang at link time due to __forceinline can be avoided by completing the _Py_DECREF optimization outside _PyEval_EvalFrameDefault:

    static inline void  // no __forceinline
    _Py_DECREF_impl(...) {
        ...
    }

    static __forceinline void
    _Py_DECREF(...) {  // no conditional branch in this function
        _Py_DECREF_impl(...);
    }

In _PyEval_EvalFrameDefault, wrapping the callees like this seems better for performance than just specifying __forceinline under the current MSVC.
I can't yet confirm a regression in 3.11 (the main branch, currently) compared to 3.10. See my adventures in https://github.com/faster-cpython/ideas/discussions/315.
> -_Py_DECREF (pgo hard reject)
What exactly does "pgo hard reject" mean? I Googled it and found no hits besides this very issue.
I am trying to redefine the top three from this error log as macros, but since I still don't have stable benchmark results it's hard to know if this has any effect.
> What exactly does "pgo hard reject" mean?
In my understanding, "pgo hard reject" is based on the PGOptimizer's heuristics, while "reject" is related to the probe counts (hot/cold).
https://developercommunity.visualstudio.com/t/1531987#T-N1535774
And there was a reply from MSVC team, closing the issue. MSVC won't be fixed in the near future.
https://developercommunity.visualstudio.com/t/1595341#T-N1695626
From the reply and my investigation, 3.11 would need the following:
- Some callsites, such as calls through tp_* pointers, should not have their fast paths inlined into the eval switch; they often conflict. Each pointer needs to be wrapped with a function, or maybe _PyEval_EvalFrameDefault needs to be enclosed in an "inline_depth(0)" pragma.
- __assume(0) should be replaced with some other function, both inside the eval switch and in the inlined paths of callees. This is critical with PGO.
- For inlining, use __forceinline / a macro / a const function pointer.
The MSVC hang can be avoided in many ways when force-inlining a ton of Py_DECREF()s in the eval loop, unless tp_dealloc does not create an inlined callsite:

    void
    _Py_Dealloc(PyObject *op)
    {
        ...
    #pragma inline_depth(0)  // effective from here; PGO accepts only 0.
        (*dealloc)(op);      // conflicts when inlined.
    }
    #pragma inline_depth()   // can be reset only outside the function.
Virtual Call Speculation: https://docs.microsoft.com/en-us/cpp/build/profile-guided-optimizations?view=msvc-170#optimizations-performed-by-pgo
The profiler runs under the /GENPROFILE:PATH option, but for the big ceval function the optimizer merges the profiles into one, as in /GENPROFILE:NOPATH mode. https://docs.microsoft.com/en-us/cpp/build/reference/genprofile-fastgenprofile-generate-profiling-instrumented-build?view=msvc-170#arguments
__assume(0) (Py_UNREACHABLE): https://devblogs.microsoft.com/cppblog/visual-studio-2017-throughput-improvements-and-advice/#remove-usages-of-__assume
> __assume(0) should be replaced with other function, inside the eval switch-case or in the inlined paths of callees. This is critical with PGO.
Out of interest, have you done other experiments confirming this? The reference linked is talking about compiler throughput (i.e. how long it takes to compile), and while it hints that using __assume(0) may interfere with other optimisations, that isn't supported with any detail or analysis in the post.
> have you done other experiments confirming this?
My benchmark results are left in https://github.com/faster-cpython/ideas/issues/321#issuecomment-1094129130.
__assume(0) is problematic only where the substitute function is inlined.
Correction of my previous post:

> The MSVC hang can be avoided in many ways, ... unless tp_dealloc does not create an inlined callsite

    -unless
    +if
For Py_UNREACHABLE(), maybe we should just remove these two lines from pymacro.h?

    #elif defined(_MSC_VER)
    #  define Py_UNREACHABLE() __assume(0)

Then the code will fall back to:

    #else
    #  define Py_UNREACHABLE() \
           Py_FatalError("Unreachable C code path reached")
    #endif
> __assume(0) is problematic only where the substitute function is inlined.
Can you elaborate? What is the "substitute function"? The macro definition is

    # define Py_UNREACHABLE() __assume(0)

so there is no inlined function. Are you referring to the code containing the call to Py_UNREACHABLE()? That wouldn't affect the ceval.c main loop in _PyEval_EvalFrameDefault, because that function is definitely too large to be inlined. :-)

What am I missing?
Sorry for the lack of explanation. I encountered a measurable slowdown several months ago when the Py_RETURN_RICHCOMPARE macro was inlined in the eval loop. However, that may be x86-only. If I understand correctly, the official x86 binaries are non-PGO builds. Then, a Py_FatalError() only for the TARGET(CACHE) branch would be enough for now.
When I change the current version as below:

1. Substitute void Py_UNREACHABLE(void) {} or Py_FatalError() for __assume(0) in pymacro.h.
2. Make PyObject_RichCompare() called through a function pointer, adding this above _PyEval_EvalFrameDefault():

        static const richcmpfunc PyObject_RichCompare_PTR = PyObject_RichCompare;
        #define PyObject_RichCompare PyObject_RichCompare_PTR

Then PGO decides to inline PyObject_RichCompare(), based on its profile.
This seems to affect "Function Layout optimization" even if it is not inlined. (under verification)
    PyObject_RichCompare (pgu decision)
    _PyThreadState_GET (pgu decision)
    _PyRuntimeState_GetThreadState (pgu decision)
    _PyErr_Occurred (pgu decision)
    -_PyErr_BadInternalCall (pgo hard reject)
    _Py_EnterRecursiveCall (pgu decision)
    _Py_MakeRecCheck (pgu decision)
    -_Py_CheckRecursiveCall (pgo hard reject)
    do_richcompare (pgu decision)
    PyType_IsSubtype (pgu decision)
    -type_is_subtype_base_chain (pgo hard reject)
    Py_DECREF (pgu decision)
    _Py_Dealloc (pgu decision)
    long_richcompare (pgu decision)
    ...
    >>>>> Py_UNREACHABLE (pgu decision) // or _Py_FatalErrorFunc (pgo hard reject)
As for other places (_Py_FatalErrorFunc() never gets inlined anywhere):
    dict_get (pgu decision)
    -_PyArg_CheckPositional (pgo hard reject)
    dict_get_impl (pgu decision)
    unicode_get_hash (pgu decision)
    PyObject_Hash (pgu decision)
    _Py_HashPointer (pgu decision)
    _Py_HashPointerRaw (pgu decision)
    -PyType_Ready (pgo hard reject)
    -PyObject_HashNotImplemented (pgo hard reject)
    _Py_dict_lookup (pgu decision)
    -unicodekeys_lookup_unicode (pgo hard reject)
    -unicodekeys_lookup_generic (pgo hard reject)
    dictkeys_generic_lookup (pgu decision)
    dictkeys_get_index (pgu decision)
    Py_INCREF (pgu decision)
    -PyObject_RichCompareBool (pgo hard reject)
    Py_DECREF (pgu decision)
    _Py_Dealloc (pgu decision)
    >>>>> -Py_UNREACHABLE (pgo hard reject) // (no harm)
    Py_DECREF (force inline)
    -_Py_Dealloc (initial scan: soft depth exceeded)
    -_Py_Specialize_BinaryOp (pgo hard reject)
    >>> Py_UNREACHABLE (pgu decision) // @TARGET(CACHE) inlined
    _PyFrame_SetStackPointer (pgu decision)
    -trace_function_entry (pgo hard reject)
    _PyFrame_GetStackPointer (pgu decision)
    PyDTrace_FUNCTION_ENTRY_ENABLED (pgu decision)
    -dtrace_function_entry (pgo hard reject)
    PyDTrace_LINE_ENABLED (pgu decision)
    -maybe_dtrace_line (pgo hard reject)
    _PyFrame_SetStackPointer (pgu decision)
    -maybe_call_line_trace (pgo hard reject)
    _PyFrame_GetStackPointer (pgu decision)
    -_PyInterpreterFrame_GetLine (pgo hard reject)
    -fprintf (initial scan: parameter mismatch, varargs, not eligible)
    -_PyErr_SetString (pgo hard reject)
    >>> -Py_UNREACHABLE (pgo hard reject) // out of switch (no harm)
Benchmark after removal of the TARGET(CACHE) branch:

| Py_UNREACHABLE at long_richcompare() | x64 PGO | x86 PGO |
|---|---|---|
| __assume(0) | 1.00 | 1.00 |
| Py_FatalError | 1.02x slower | 1.03x~ faster |
| void foo(void) {} | 1.02x slower | 1.04x~ faster |

__assume(0) works well in the hot section on x64.
EDIT: The gap on x86 can be increased depending on the amount of optimization.
> Are you referring to the code containing the call to Py_UNREACHABLE()? That wouldn't affect the ceval.c main loop in _PyEval_EvalFrameDefault because that function is definitely too large to be inlined. :-)
Here is MSVC's inlining decision on current Python 3.10: https://bugs.python.org/file50291/PR28475_inline.log

Weird, but faster than when only tiny functions are inlined. In the log, Py_DECREF (static) is expanded until Py_Dealloc (extern) stops its recursion. That expansion looks excessive to me.
It looks like we may be looking at different builds? I'm only looking at x64 builds for 3.11.
> If I understand correctly, x86 official binaries are non-PGO builds.
@zooba Is that so?
> If I understand correctly, x86 official binaries are non-PGO builds.
Yeah, this is correct. We're more likely to deprecate and drop the 32-bit binaries before we make any major effort to optimise them - they run under an emulation layer in the OS (practically all supported OS installs are 64-bit native), so aren't really going to be recommended for people who care about performance anyway.
I think this issue can be closed. (I can't do it myself after the migration.)

Most of my experiments are invalid after Guido's #91718 corrected the quirks of MSVC. Another reasonable fix would be a good test that makes the specialized sections hotter.
Thanks.
Closing as requested by OP. Thanks for your investigations @neonene ! Thanks to Guido too for the fix.
Thank you @neonene for your gentle pushes and encouragement and help to get this fixed!
@neonene:
> Most of my experiments are invalid after Guido's https://github.com/python/cpython/pull/91718 corrected the quirks of MSVC.
Do you mean that this merged change https://github.com/python/cpython/commit/2f233fceae9a0c5e66e439bc0169b36547ba47c3 is now useless?
No, they are complementary.
> Do you mean that this merged change 2f233fc is now useless?
No. What I said was about the optimization, not the (force) inlining. And what I suggested before has already been fixed by f8dc618 (and 2f233fc): a tp_* or cfunc pointer in the eval loop can inline multiple callees without conflict.

Moving LOAD_FAST out of the switch according to the scores below has no advantage now.

Top 3 entries with the current 44 tests:

    case 124 132522464 // LOAD_FAST
    case 100  48956231 // LOAD_CONST
    case 45   48318813 // LOAD_FAST__LOAD_FAST
What I understand is that PGO build of Python 3.11 on Windows will be faster thanks to these changes, and the Windows python.org binaries only use PGO for 64-bit, not for 32-bit.
You can read a few more of the posts and links above, since this thread's title has been changed several times.
Can someone please try to write a summary of this long and complex issue? Different but related topics have been discussed, and it's hard to get an overview. I'm confused: sometimes someone said that a change fixed the issue, and then later wrote that no, it didn't really fix it.
Let me give it a quick try.
Originally, @neonene observed a Windows-specific performance regression in 3.10 between the a7 and b1 releases. This was eventually shown to be caused by the function _PyEval_EvalFrameDefault getting so long that the MSVC LTO gave up on inlining many things there. IIUC in 3.10 this was eventually fixed by making the function a bit smaller (https://github.com/python/cpython/pull/28475).
Of course, the same issue was then observed in the main branch (3.11). We then went back and forth trying various approaches to fix it. This wasn't easy because (a) the code kept changing (because the "Faster CPython" team was very active -- mostly growing the function), and (b) we had no good hardware or strategy to run reliable benchmarks. The latter problem spawned https://github.com/faster-cpython/ideas/issues/321.
Eventually I settled on a fix which consisted of turning a few inline functions back into macros, but only in ceval.c, and not in debug mode (and one only for MSVC). This was PR https://github.com/python/cpython/issues/89279, commit https://github.com/python/cpython/commit/2f233fceae9a0c5e66e439bc0169b36547ba47c3.
Somewhat relatedly, I also figured out how to get MSVC to generate slightly faster switch code: if you switch on a one-byte value and all 256 cases exist, it skips a memory load. This was PR https://github.com/python/cpython/issues/91719, commit https://github.com/python/cpython/commit/f8dc6186d1857a19edd182277a9d78e6d6cc3787.
Finally, I figured out how to get stable benchmark numbers (see https://github.com/faster-cpython/ideas/issues/321#issuecomment-1107072776 and following comments) and showed that the macrofied inline functions gave us 10% performance back and the improved switch code gave 3%.
That's it.
Thanks for the summary. I would add that marking performance-critical functions with __forceinline (Py_ALWAYS_INLINE) was tested, but it didn't work.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['interpreter-core', '3.10', 'performance', 'expert-C-API', '3.11', 'OS-windows']
title = 'Performance regression 3.10b1: inlining issue in the big _PyEval_EvalFrameDefault() function with Visual Studio (MSC)'
updated_at =
user = 'https://github.com/neonene'
```
bugs.python.org fields:
```python
activity =
actor = 'steve.dower'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core', 'Windows', 'C API']
creation =
creator = 'neonene'
dependencies = []
files = ['50263', '50264', '50271', '50272', '50273', '50274', '50275', '50276', '50280', '50286', '50291', '50293', '50296', '50315', '50363', '50452']
hgrepos = []
issue_num = 45116
keywords = ['patch']
message_count = 82.0
messages = ['401143', '401152', '401154', '401182', '401183', '401319', '401329', '401346', '401364', '401623', '401624', '401628', '401743', '401964', '401970', '401972', '402025', '402040', '402043', '402044', '402063', '402064', '402065', '402067', '402068', '402071', '402090', '402091', '402092', '402098', '402099', '402117', '402135', '402143', '402189', '402190', '402217', '402229', '402230', '402287', '402289', '402296', '402307', '402308', '402320', '402480', '402856', '402857', '402858', '402864', '402867', '402871', '402878', '402886', '402891', '402893', '402928', '402930', '402954', '403403', '403409', '403430', '403432', '403464', '403559', '403587', '403609', '404089', '406354', '406386', '406407', '406416', '406471', '406474', '406479', '406487', '406613', '407188', '415378', '416911', '416950', '416977']
nosy_count = 15.0
nosy_names = ['lemburg', 'gvanrossum', 'rhettinger', 'paul.moore', 'vstinner', 'tim.golden', 'Mark.Shannon', 'zach.ware', 'steve.dower', 'malin', 'pablogsal', 'brandtbucher', 'neonene', 'erlendaasland', 'kj']
pr_nums = ['28390', '28419', '28427', '28475', '28630', '28631', '31436', '31459', '32387']
priority = None
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue45116'
versions = ['Python 3.10', 'Python 3.11']
```