Open markshannon opened 1 year ago
Has there been any progress in documenting the changes made in Python 3.12? (https://github.com/python/cpython/pull/101292#issuecomment-1618077570)
Maybe @markshannon can answer that? AFAICT all the commits linked above are his. I know we had someone who was interested in pursuing this further but she had to bow out.
@markshannon Where are we with documentation for this? If it's not documented, do we need to start working to roll this back? I'm not comfortable with this change in rc1 if it's not documented.
Mark is at EuroPython. If you are there too you can talk to him. We will get it documented.
I did a little research. It looks like there are two key changes. First, struct _longobject (defined in Include/cpython/longintrepr.h but considered an internal implementation detail) has changed. It used to be:
struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};
I.e., there was an array ob_digit whose length was abs(ob_size), where sign(ob_size) gave the sign of the overall value.
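To make the old convention concrete, here is a minimal, self-contained sketch of how code reading that layout would recover the sign and magnitude. It does not use the real CPython struct: old_long_model, model_digit and the model_* helpers are hypothetical names invented for illustration, and the sketch assumes 30-bit digits stored least-significant first (the usual configuration on 64-bit builds).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the pre-3.12 layout; NOT the real CPython struct.
   Assumption: 30-bit digits, least-significant digit first. */
#define MODEL_SHIFT 30
typedef uint32_t model_digit;

typedef struct {
    ptrdiff_t ob_size;        /* sign = sign of the value, abs = digit count */
    model_digit ob_digit[4];  /* fixed capacity keeps the sketch simple */
} old_long_model;

/* Number of digits in use: abs(ob_size). */
static size_t model_ndigits(const old_long_model *v)
{
    return (size_t)(v->ob_size < 0 ? -v->ob_size : v->ob_size);
}

/* Sign of the value: -1, 0 or 1, taken from sign(ob_size). */
static int model_sign(const old_long_model *v)
{
    return (v->ob_size > 0) - (v->ob_size < 0);
}

/* Convert to int64_t. Returns 0 on success, or -1 if the value uses more
   than two digits (this sketch only handles magnitudes below 2**60). */
static int model_to_int64(const old_long_model *v, int64_t *out)
{
    size_t n = model_ndigits(v);
    if (n > 2) {
        return -1;
    }
    uint64_t magnitude = 0;
    /* Walk from the most significant digit down to the least. */
    for (size_t i = n; i-- > 0; ) {
        magnitude = (magnitude << MODEL_SHIFT) | v->ob_digit[i];
    }
    *out = model_sign(v) < 0 ? -(int64_t)magnitude : (int64_t)magnitude;
    return 0;
}
```

Extension code that followed this pattern against the real struct is exactly what stops working in 3.12, since ob_size no longer carries the digit count and sign.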
The new (still internal) representation is as follows:
typedef struct _PyLongValue {
    uintptr_t lv_tag; /* Number of digits, sign and flags */
    digit ob_digit[1];
} _PyLongValue;
struct _longobject {
    PyObject_HEAD
    _PyLongValue long_value;
};
and there are new internal macros to determine the number of digits and the sign, and a bunch of internal macros to handle "compact" values (which fit in 1-2 "digits").
There are two new public, unstable APIs to support the concept of "compact" values: PyUnstable_Long_IsCompact and PyUnstable_Long_CompactValue. See https://docs.python.org/3.12/c-api/long.html#c.PyUnstable_Long_IsCompact. (In reality these are implemented as macros, and not intended to be part of any ABI.) Everything else that digs through the internals is defined in Include/internal/pycore_long.h, and requires defining Py_BUILD_CORE.
Details of what the bits in lv_tag mean are intentionally not published -- these are meant to be opaque. Applications that used to dig through ob_digit using ob_size as guidance will break, and have two options: Switch to calling the Python-level APIs int.to_bytes() and int.from_bytes() via PyObject_CallMethod() (see note at https://docs.python.org/3.12/c-api/long.html#c.PyLong_FromString). Or go hard-core, defining Py_BUILD_CORE and including pycore_long.h. Or, I guess, an intermediate path is to use the new unstable public APIs for dealing with "compact" values and use the slower arbitrary-precision API for non-compact values.
I think in the What's New in 3.12 we should at least mention the change in the struct (calling out that using ob_size and ob_digit is no longer supported) and the new unstable public APIs (and what they're for). I don't think we need to call out the hard-core option, but maybe a reminder about to_bytes() and from_bytes() would be useful (even though that's been in the docs at least since 3.10).
@Yhg1s @markshannon What do you think of this? I volunteer to make a PR for what's new 3.12 along the lines of what I wrote above.
@Yhg1s Assuming the changes Mark made to what's new in 3.12 are what you wanted?
Yep, that's adequate.
In Python 2 ints and longs were different objects, and the design of each was tailored to the different size and use cases. In Python 3 we dropped the distinction, but we also dropped the design for ints that fit into a single word. We have added various fast paths for "medium" integers (e.g. https://github.com/python/cpython/issues/89109) but the underlying data structure gets in the way.
We should lay out the int/long object so that it supports fast operations for most integers.
See https://github.com/faster-cpython/ideas/issues/548 for a fuller discussion