skrah commented 8 months ago

Bug report

Bug description:

The pi benchmark in Modules/_decimal/tests/bench.py is up to 31% slower with --enable-gil and up to 45% slower with --disable-gil compared to Python 3.9 as the baseline.

Change the number of iterations to 100000 for more stable benchmarks.

This is heavily OS and hardware dependent. The worst case cited above was observed on a six core i5-8600K.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Linked PRs

gh-115196
gh-115401

sunmy2019 commented 8 months ago

Can you try --disable-gil with current main?

skrah commented 8 months ago

The figures are almost the same, around 30% (enable) and 43% (disable).

skrah commented 8 months ago

The base slowdown of 30% is of course due to the module state changes. For comparison, and to see that the i5-8600K is not inherently underperforming with threads:

The ideal situation is implemented in libmpdec++: It has a C++11 thread local context, which is also called for every inline operation. The C++11 TLS adds just 4% overhead compared to pure C (where the context is passed directly to the functions).

This of course would be quite difficult to achieve in CPython. But the thread local context may not even be the primary culprit. In Martin von Loewis' first module state implementation the basic slowdown for _decimal was 20%, and a good part of that was due to the heap types.

skrah commented 8 months ago

The ideal situation is implemented in libmpdec++: It has a C++11 thread local context, which is also called for every inline operation. The C++11 TLS adds just 4% overhead compared to pure C (where the context is passed directly to the functions).

This of course would be quite difficult to achieve in CPython.

The wording was not ideal: The key point is that C++ achieves this without a cached context, which is of course now in the module state.

ericsnowcurrently commented 8 months ago

@encukou, any thoughts on this?

neonene commented 8 months ago

~PyModule_GetState()~ Getting a module state via PyType_GetModuleByDef() seems to be expensive for heavy use. It would be nice if a heap type could own the module state's address at its creation time to recover more than 10%.

erlend-aasland commented 8 months ago

PyModule_GetState() via PyType_GetModuleByDef() seems to be expensive for heavy use. It would be nice if a heap type could own the module state's address at its creation time to recover more than 10%.

_PyModule_GetState() is used, not PyModule_GetState(); I'd be surprised if this is the culprit:

https://github.com/python/cpython/blob/1aec0644447e69e981d582449849761b23702ec8/Include/internal/pycore_moduleobject.h#L32-L35

OTOH, PyType_GetModuleByDef() can be more expensive. Storing the state pointer in the type struct is a possibility.

neonene commented 8 months ago

@ericsnowcurrently Can a module state be accessed correctly with the following?

_Py_thread_local module_state *thread_local_state; // set at the exec phase

If not invalid, it will be useful for the function which currently has no way to get the module state. A _Py_thread_local pointer may not be so fast to access, but it can also bypass a state struct?

ericsnowcurrently commented 8 months ago

Multiple interpreters can run in the same thread, so a thread-local may be ambiguous in that case.

As to storing a reference to the module state on the type object, that shouldn't be necessary. Every heap type as a ht_module field that points to the module, accessible via PyType_GetModule(). From there you call PyModule_GetState(). In fact, we already have PyType_GetModuleState() that combines those two calls. That's fairly efficient, though obviously not as efficient as a static global.

The only caveat is that PyType_GetModule() (and thus PyType_GetModuleState()) operates relative to the given type. Thus subclasses make things trickier. IIRC, when they're involved, you end up having to walk the MRO one way or another, which is what PyType_GetModuleByDef() does.

FWIW, the common case where PyType_GetModuleByDef() is used is either in "tp slot" functions or intermediate static functions that aren't doing the right thing. In both cases the problem lies in throwing away the expensive information we already have. For tp slot functions the API doesn't pass the module in, even though it could. We're stuck with backward compatibility issues there. For intermediate static functions, I expect it's all too common to figure out the module (or module state) at some "leaf node" function but not pass it through the intermediate functions. Thus the relatively expensive PyType_GetModuleByDef() gets called more than it needs to be.

I suppose another problem case is with callbacks that don't provide a way to pass the module through. I'm not sure that's a common issue though.

Back to the _decimal module, I haven't looked at what usage patterns are resulting in such a significant slowdown.

erlend-aasland commented 8 months ago

Thus subclasses make things trickier. IIRC, when they're involved, you end up having to walk the MRO one way or another, which is what PyType_GetModuleByDef() does.

Yes; heap types that are not base classes can go the fast route via PyType_GetModuleState() (and _PyType_GetModuleState()).

encukou commented 8 months ago

On February 2, 2024 11:10:49 PM GMT+01:00, Eric Snow @.***> wrote:

Multiple interpreters can run in the same thread, so a thread-local may be ambiguous in that case.

As to storing a reference to the module state on the type object, that shouldn't be necessary. Every heap type as a ht_module field that points to the module, accessible via PyType_GetModule(). From there you call PyModule_GetState(). In fact, we already have PyType_GetModuleState() that combines those two calls. That's fairly efficient, though obviously not as efficient as a static global.

The only caveat is that PyType_GetModule() (and thus PyType_GetModuleState()) operates relative to the given type. Thus subclasses make things trickier. IIRC, when they're involved, you end up having to walk the MRO one way or another, which is what PyType_GetModuleByDef() does.

FWIW, the common case where PyType_GetModuleByDef() is used is either in "tp slot" functions or intermediate static functions that aren't doing the right thing. In both cases the problem lies in throwing away the expensive information we already have. For tp slot functions the API doesn't pass the module in, even though it could. We're stuck with backward compatibility issues there. For intermediate static functions, I expect it's all too common to figure out the module (or module state) at some "leaf node" function but not pass it through the intermediate functions. Thus the relatively expensive PyType_GetModuleByDef() gets called more than it needs to be.

I suppose another problem case is with callbacks that don't provide a way to pass the module through. I'm not sure that's a common issue though.

Back to the _decimal module, I haven't looked at what usage patterns are resulting in such a significant slowdown.

-- Reply to this email directly or view it on GitHub: https://github.com/python/cpython/issues/114682#issuecomment-1924776487 You are receiving this because you were mentioned.

Message ID: @.***> -- Sent from my Android device with K-9 Mail. Please excuse my brevity.

hauntsaninja commented 8 months ago

@encukou I think your comment might have gotten lost? All I see is quote text

skrah commented 8 months ago

The float object gets the luxury of its own freelist and avoids the slow PyObject_GC_* functions altogether. The Decimal constructor is a pure function (the context is only used for recording errors) and Decimals are immutable, just like floats.

It is likely that 15 percentage points can be shaved off just by doing the same as in floatobject.c and then focus on the thread local context (indeed, PyType_GetModuleByDef() is called at dozens of places).

erlend-aasland commented 8 months ago

AFAICS, we can trim away a lot of the PyType_GetModuleByDef calls. For example, a lot of them are happening in methods; we can use the METH_METHOD calling convention on practically all of these, so we can access the module state using _PyType_GetModuleState:

https://github.com/python/cpython/blob/15f6f048a6ecdf0f6f4fc076d013be3d110f8ed6/Include/internal/pycore_typeobject.h#L100-L109

Also (but not performance relevant), the find_state_left_or_right() calls in the Py_tp_richcompare slots can be replaced with get_module_state_by_def() calls.

erlend-aasland commented 8 months ago

FTR: I did a quick proof-of-concept for some low-hanging fruit, and I already got a 4% ~~10%~~ 12% speedup.

erlend-aasland commented 8 months ago

So, here's my suggested approach. Let's move the relevant parts of bench.py over to pyperformance. That way, it is easier to keep an eye on performance regressions like this; the faster-cpython team is alert to sudden changes in a particular benchmark.

I'll try to tear out as many PyType_GetModuleByDef as possible and create a PR, hopefully within this week. The diff is going to be huge, but OTOH it is a mechanical change.

encukou commented 8 months ago

Thank you for getting to this so fast! Let me know if you need any help, and feel free to assign me to reviews.

erlend-aasland commented 8 months ago

Let me know if you need any help, and feel free to assign me to reviews.

Perhaps you could migrate bench.py over to pyperformance?

For now, I've used this patch for local benchmarking with pyperf[^1]:

```diff diff --git a/Modules/_decimal/tests/bench.py b/Modules/_decimal/tests/bench.py index 640290f2ec..6b6117f108 100644 --- a/Modules/_decimal/tests/bench.py +++ b/Modules/_decimal/tests/bench.py @@ -11,9 +11,14 @@ from functools import wraps from test.support.import_helper import import_fresh_module +import pyperf + + C = import_fresh_module('decimal', fresh=['_decimal']) P = import_fresh_module('decimal', blocked=['_decimal']) +runner = pyperf.Runner() + # # NOTE: This is the pi function from the decimal documentation, modified # for benchmarking purposes. Since floats do not have a context, the higher @@ -81,33 +86,20 @@ def wrapper(*args, **kwargs): return _increase_int_max_str_digits def test_calc_pi(): - print("\n# ======================================================================") - print("# Calculating pi, 10000 iterations") - print("# ======================================================================\n") - to_benchmark = [pi_float, pi_decimal] if C is not None: to_benchmark.insert(1, pi_cdecimal) for prec in [9, 19]: - print("\nPrecision: %d decimal digits\n" % prec) for func in to_benchmark: - start = time.time() if C is not None: C.getcontext().prec = prec P.getcontext().prec = prec - for i in range(10000): - x = func() - print("%s:" % func.__name__.replace("pi_", "")) - print("result: %s" % str(x)) - print("time: %fs\n" % (time.time()-start)) + name = f"{func.__name__}, precision: {prec} decimal digits" + runner.bench_func(name, func) @increase_int_max_str_digits(maxdigits=10000000) def test_factorial(): - print("\n# ======================================================================") - print("# Factorial") - print("# ======================================================================\n") - if C is not None: c = C.getcontext() c.prec = C.MAX_PREC @@ -147,4 +139,4 @@ def test_factorial(): if __name__ == "__main__": test_calc_pi() - test_factorial() + #test_factorial() ```

[^1]: for the pi benchmark only!

neonene commented 8 months ago

@skrah Do you have a significant regression test case where subclasses are frequently used? 3.12 (sigle-phase init/static types) was almost as slow as main in the given test for me once D = C.Decimal was replaced with class D(C.Decimal).

erlend-aasland commented 8 months ago

@skrah Do you have a significant regression test case where subclasses are frequently used? 3.12 (sigle-phase init/static types) was almost as slow as main in the given test for me once D = C.Decimal was replaced with class D(C.Decimal).

PyType_GetModuleByDef is slower for subclasses. Reducing the number of PyType_GetModuleByDef will have an impact.

skrah commented 8 months ago

@neonene Your test is reasonable! Subclasses are so slow that within 3.9 thatfloat is slowed down by 3.3 times and _decimal by 2.2 times.

The slowdown for subclasses within main is something like 4.5 times for float and 1.9 times for decimal. That seems to suggest, as I mentioned earlier, that certain things are better optimized for base class floats than for base class Decimals.

But in general, compared to 3.9, subclasses themselves seem to have gotten faster in main, so all comparisons are a bit messy.

neonene commented 8 months ago

Regarding the module-state-access after METH_METHOD (PEP 573) is applied to _decimal's methods, the concerns will move to hot tp_* functions, if any. A possible optimization would be:

PyObject *module = PyType_GetModule(type);
if (!module || _PyModule_GetDef(module) != module_def) {
    PyErr_Clear();
    // Subclasses can ignore the overhead for now? Otherwise,
    // the type should cache the module state (refcount-free) with the def.
    module = PyType_GetModuleByDef(type, module_def);
}
module_state *state = _PyModule_GetState(module);

Currently, a faster version of PyType_GetModule() is not offered, right?

vstinner commented 8 months ago

Regarding the module-state-access after METH_METHOD (PEP 573) is applied to decimal's methods, the concerns will move to hot tp* functions, if any.

In 2018, I wrote a change to use FASTCALL calling convention which made the telco benchmark 22% faster. But Stefan Krah was the module maintainer and was against the idea, so I gave up. See issue gh-73487.

vstinner commented 8 months ago

PyType_GetModuleByDef is slower for subclasses. Reducing the number of PyType_GetModuleByDef will have an impact.

If PyType_GetModuleByDef() overhead (compared to the previous version which didn't use it) is too high, we can cache directly important data (such as decimal_state *state) in instances (ex: PyDecObject), rather than getting it from the instance type.

erlend-aasland commented 8 months ago

[...] we can cache directly important data (such as decimal_state *state) in instances (ex: PyDecObject), rather than getting it from the instance type.

Yes, I already suggested it; we do this in the sqlite3 extension module with great success.

erlend-aasland commented 8 months ago

In 2018, I wrote a change to use FASTCALL calling convention which made the telco benchmark 22% faster. But Stefan Krah was the module maintainer and was against the idea, so I gave up. See issue gh-73487.

I see Serhiy's post point to a bpo post where Stefan disapproved of applying Argument Clinic only. Well, I used Argument Clinic extensively in my patch for speeding up _decimal; it did not break a single test ;)

encukou commented 8 months ago

Currently, a faster version of PyType_GetModule() is not offered, right?

That would be using PyHeapTypeObject.ht_module directly. If we know it's heap type, the check for Py_TPFLAGS_HEAPTYPE can become an assert.

skrah commented 8 months ago

That is an extraordinary comment from Mr. Stinner. It is entirely irrelevant to this issue:

This issue is about the slowdown introduced by the module state changes.
If AC speeds up _decimal in the fix, that speedup just camouflages the module state slowdown and makes honest benchmarks for this issue impossible!
In the linked FASTCALL issue, both @rhettinger and ultimately Mr. Stinner himself concurred with me or at least acquiesced.
At that time the new FASTCALL convention was already on the horizon and would have required another change!
At that time I planned to use the new FASTCALL convention directly.
I was not the "module maintainer", but the author. Big difference.

In short, bringing up a heavily editorialized version of the actual events just distracts from the problems at hand.

erlend-aasland commented 8 months ago

I withdraw myself from this discussion; I am no longer working on this issue.

neonene commented 8 months ago

If PyType_GetModuleByDef() overhead (compared to the previous version which didn't use it) is too high, we can cache directly important data (such as decimal_state *state) in instances (ex: PyDecObject), rather than getting it from the instance type.

The cache in a known object is mainly used to reduce function arguments in the call chains, right? Indeed, many PyType_GetModuleByDef() can be removed there.

I think a cache in the type instance can be used in the tp slot functions where PyType_GetModuleByDef() needs to be placed to check(exact) the given object's type through a module state, which seems not to be beneficial for the current very slow subclasses.

I think we can measure/consider the boost by FASTCALL separately.

vstinner commented 8 months ago

That is an extraordinary comment from Mr. Stinner. It is entirely irrelevant to this issue

I reacted to a comment about METH_METHOD. For me, a common way to use METH_METHOD is to use Argument Clinic with cls: defining_class.

neonene commented 8 months ago

@skrah METH_METHOD|METH_FASTCALL might be (worse than) neutral on the pi test. I'm leaving an experimental PR #115196, which was my first AC experience.

skrah commented 8 months ago

@neonene Thanks for doing this! AC does not seem to affect the number methods (nb_add etc.) used in the pi benchmark. So it is indeed entirely irrelevant to this issue (as you hinted at).

I cannot reproduce the stated 22% improvement for the telco benchmark (telco.py full) either. I get something around 10%.

If you want to work on a patch without AC, I can review it. The strategy suggested earlier of caching the state in the decimal object seems to be the most fruitful (intuitively, the actual patch will require some experimentation).

Generally, it would be good to have a reference module in Python core that does not use AC. AC is unused/not available in the vast majority of C-extensions. Many modules in the scientific ecosystem may want to use module state and vectorcall but not AC.

neonene commented 8 months ago

~I hit a crash at the METH_METHOD method whose defining_class argument was NULL: https://github.com/python/cpython/pull/115196#discussion_r1484911702. Does PEP 573 require an argument check before PyType_GetModuleState(cls)?~

neonene commented 8 months ago

I hit a crash at the METH_METHOD method whose defining_class argument was NULL

I take back. *_impl(self, cls) has in the module several callers which previously passed a NULL as a dummy.

neonene commented 8 months ago

I got no performance change by switching to _PyType_GetModuleState() in the *_impl functions: 1eeb531. As to PEP-573, applying METH_METHOD only to hot methods would be enough, if any.

neonene commented 8 months ago

def pi():
    import _decimal
    D = _decimal.Decimal
    for i in range(10000):
        lasts, t, s, n, na, d, da = D(0), D(3), D(3), D(1), D(0), D(0), D(24)
        while s != lasts:
            lasts = s
            n, na = n+na, na+8
            d, da = d+da, da+32
            t = (t * n) / d
            s += t

Module state access related? functions:

``` * python313.dll / pi (Windows) entry static dynamic % run Function Name count instr instr total total PyType_GetModuleByDef 17520026 28 350400520 13.2 29.7 PyContextVar_Get 3440000 66 106639996 4.0 37.8 PyObject_IsTrue 431188 44 3058140 0.1 96.0 PyBool_FromLong 432423 6 2174167 0.1 96.8 PyArg_ParseTupleAndKeywords 70160 33 1683905 0.1 97.2 PyObject_TypeCheck 70845 10 425126 0.0 98.8 * python313.dll / test_decimal entry static dynamic % run Function Name count instr instr total total PyType_GetModuleByDef 758275 28 18817245 0.1 74.5 PyObject_IsTrue 1241005 44 17488210 0.1 76.3 * _decimal.pyd / pi entry static dynamic % run Function Name count instr instr total total convert_op 5880000 46 119280000 4.8 10.9 nm_mpd_qadd 2100000 66 105000000 4.3 19.6 PyDecType_New 3850000 28 84700000 3.4 34.8 get_module_state_by_def 14150005 5 70750025 2.9 40.6 dec_addstatus 3850000 43 53900000 2.2 55.4 current_context 3440000 16 41280002 1.7 66.8 PyObject_TypeCheck 6380000 10 41010000 1.7 68.4 PyDecType_FromLongExact 910000 54 35490000 1.4 74.3 find_state_left_or_right 3370000 13 30330000 1.2 82.4 dec_from_long 910000 52 29640000 1.2 83.6 nm_mpd_qdiv 420000 66 21000000 0.9 87.8 nm_mpd_qmul 420000 66 21000000 0.9 88.6 dec_richcompare 430000 104 19780000 0.8 91.9 convert_op_cmp 430000 148 13330000 0.5 94.5 _dec_settriple 910000 12 10920000 0.4 97.4 dec_new 70000 46 2380000 0.1 99.8 PyDecType_FromObjectExact 70000 79 1890000 0.1 99.9 * _decimal.pyd / test_decimal entry static dynamic % run Function Name count instr instr total total exception_as_flag 389224 16 14121748 1.9 59.6 numeric_as_ascii 27009 80 4834675 0.7 74.8 context_getattr 216906 30 4254322 0.6 77.8 signaldict_getitem 199787 31 3864061 0.5 78.9 get_module_state_by_def 730003 5 3650015 0.5 79.9 list_as_flags 31123 23 1991184 0.3 89.7 PyDecType_New 86927 28 1912610 0.3 90.5 dec_new 54158 46 1793903 0.2 91.0 signaldict_setitem 65001 44 1624971 0.2 91.5 context_setattrs 16192 106 1387404 0.2 92.6 PyDecType_FromObjectExact 54156 79 1357695 0.2 92.8 signals_as_list 15564 26 1353918 0.2 93.0 dec_as_integer_ratio 12005 149 1304050 0.2 93.4 dec_addstatus 91578 43 1285473 0.2 93.5 getround 17391 37 1280117 0.2 93.9 current_context 92795 16 1113546 0.2 94.6 PyDecType_FromLongExact 28166 54 1098474 0.2 94.7 dec_from_long 28178 52 929611 0.1 95.1 PyDecType_FromCStringExact 26629 43 905482 0.1 95.3 PyObject_TypeCheck 119689 10 884052 0.1 95.5 dec_as_long 14946 96 881027 0.1 95.6 context_init 16130 52 806500 0.1 95.8 dec_dealloc 86927 9 782343 0.1 95.9 convert_op 37315 46 712770 0.1 96.1 context_new 16130 63 677475 0.1 96.2 dec_richcompare 14168 104 652109 0.1 96.4 nm_mpd_qdiv 12017 66 600801 0.1 96.7 PyDecType_FromUnicodeExactWS 26599 20 531980 0.1 96.9 convert_op_cmp 14168 148 441960 0.1 97.3 ```

skrah commented 8 months ago

I'm getting a smaller speedup by using PyType_GetModuleState() in the number methods and a larger one by using PyObject_New for the decimal constructor.

The _threadmodule itself uses PyObject_New in new_thread_handle.

@ericsnowcurrently, is there a reason why PyObject_New cannot be used for the immutable Decimal type?

vstinner commented 8 months ago

@erlend-aasland:

I withdraw myself from this discussion; I am no longer working on this issue.

@skrah:

That is an extraordinary comment from Mr. Stinner. It is entirely irrelevant to this issue

Well, same than Erlend for me: I unsubscribe. I'm not interested to work on an issue with such tone in the discussion.

ericsnowcurrently commented 8 months ago

is there a reason why PyObject_New cannot be used for the immutable Decimal type?

I'm not aware of any reason. It isn't a GC type, right?

skrah commented 8 months ago

I'm not aware of any reason. It isn't a GC type, right?

Yes, up to 3.12 it wasn't a GC type. It has been made a GC type in main during the module state changes.

I now found some explanations in PEP-630, but I still do not understand it:

Instances of heap types hold a reference to their type.
This ensures that the type isn't destroyed before all its instances are,
but may result in reference cycles that need to be broken by the
garbage collector.

How would one create a cycle?

>>> x = Decimal("3.222")
>>> x.y = x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    x.y = x
    ^^^
AttributeError: 'decimal.Decimal' object has no attribute 'y' and no __dict__ for setting new attributes

>>> x = Decimal("3.222")
>>> t = type(x)
>>> t.y = x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    t.y = x
    ^^^
TypeError: cannot set 'y' attribute of immutable type 'decimal.Decimal'

>>> x.adjusted = 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    x.adjusted = 10
    ^^^^^^^^^^
AttributeError: 'decimal.Decimal' object attribute 'adjusted' is read-only

neonene commented 8 months ago

The strategy suggested earlier of caching the state in the decimal object seems to be the most fruitful (intuitively, the actual patch will require some experimentation).

@skrah I have posted a draft: gh-115401

ericsnowcurrently commented 8 months ago

Yes, up to 3.12 it wasn't a GC type. It has been made a GC type in main during the module state changes.

Ah, there are definitely some subtleties involved with GC types, especially when switching from a non-GC type. I don't recall the specifics, but I'm sure @encukou does.

encukou commented 8 months ago

It might be more interesting to start with static (non-GC) types: these are “immortal” (statically allocated), but they do have a bunch of dynamically allocated, interpreter-specific data (e.g. subclasses, weakrefs). Getting this data properly cleaned up requires rather special hacks, appropriate for core types that really are created/destroyed with the interpreter itself, rather than extensions loaded on demand (see code around static_types[] in Objects/object.c for details). On the other hand, GC types (heap types) use reference counting like any other Python object. Not saying that getting rid of all the assumptions of static-ness/immortality is/was smooth, but at least the model is much clearer: we have a good idea of what the details should be, even though getting there is a bumpy road full of backcompat issues.

Instances of heap types hold a reference to their type. This ensures that the type isn't destroyed before all its instances are, but may result in reference cycles that need to be broken by the garbage collector.

This is a generic statement. A class with no __dict__ nor user-settable attributes probably can't create a reference cycle. While we could omit the instance→type reference only in cases where one can prove this, I don't think such a special case would be worth it.

neonene commented 7 months ago

How would one create a cycle?

PEP 573 and type_traverse() says:

Usually, creating a class with ht_module set will create
a reference cycle involving the class and the module.

https://github.com/python/cpython/blob/20eaf4d5dff7fa20f8a745450fef760f0923eb52/Objects/typeobject.c#L5526-L5534

To make Decimal type a non-GC heap type, at least the module arg needs to be NULL in PyType_FromMetaclass(), like thread_handle_type in _threadmodule?

I agree that PyObject_GC_*() has some overhead, but I'm not sure how to hold a module safely outside Decimal type.

gpshead commented 7 months ago

FYI - skrah has been banned from python spaces, quoting details from my message posted on a duplicate-ish issue: https://github.com/python/cpython/issues/114922#issuecomment-1989726811

""" Procedural comment: For all involved on this issue, @skrah was banned from Python spaces per a Conduct Working Group recommendation to the Steering Council in 2020. A configuration oversight on our part in 2020 made that not so on Github. The intended ban of @skrah was reinstated on 2024-02-22. Our apologies to all for the consequences of this mixup (it was the first core dev ban done on our modern systems, we now understand how and why mistakes were made).

Stefan was informed of this privately via email and re-offered the opportunity to apologize for past actions and to accept our code of conduct going forwards per the original 2020 ban's terms and notification. We have not received a reply.

I'm posting this here as we view replying in the same forum where issues recently surfaced a reasonable way to notify past, present, and potential future participants of an incident response.

This says nothing for or against the validity of this issue. A new issue should be opened if people wish to proceed. I'm locking the comments on this one. For information on what the Github concept of a ban means, read the Github org ban docs.

-- The Python Steering Council, 2024 edition ""

python / cpython

_decimal: pi benchmark 31-45% slower compared to 3.9 #114682

Bug report

Bug description:

CPython versions tested on:

Operating systems tested on:

Linked PRs