There are good use cases for both – reading the signed (actual) value of a PyLong and reading its `abs()` value. Given that the internal representation makes the latter very efficient, it seems tempting to pass that efficiency on to the users.

Maybe the unsigned function could have a second return argument pointer that gives the sign as 1 and -1? Users can pass NULL if they know the sign, and look at the returned sign if they don't. Raising an exception when the sign and the expectation diverge is easy enough to leave to the users.
> reading its `abs()` value
This is yet another case that nobody has mentioned previously, though I agree it's temptingly cheap to expose (at least for large values - anything less than Py_ssize_t is going to involve a comparison and negation - and of course performance characteristics can change over time, which is why we're exposing an API that isn't based on the internal representation in the first place).
But `(unsigned)-2147467259` is not the same as `abs(-2147467259)`, which is why having the C-style conversion is useful.
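To make the distinction concrete, here is a small standalone C illustration using the same constant as above (nothing here is CPython API; it only shows how the two interpretations diverge):

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    long v = -2147467259;                        /* the value from the comment above */
    unsigned int as_unsigned = (unsigned int)v;  /* two's-complement reinterpretation */
    long magnitude = labs(v);                    /* abs() value */

    printf("(unsigned)%ld = %u (0x%08X)\n", v, as_unsigned, as_unsigned);
    printf("abs(%ld)      = %ld\n", v, magnitude);
    /* Prints 2147500037 (0x80004005) for the cast, but 2147467259 for abs(). */
    return 0;
}
```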
What's really the problem here is extracting values that require all the available bits to provide full fidelity. So one solution could be an optional out parameter that returns true iff a zero sign bit was the only bit that couldn't be copied. So then the checks become:

- `res < 0` for when an exception was set
- `res <= sizeof(target)` for when the entire value fit (the usual case)
- `res > sizeof(target) && sign_bit_overflow_only == 1` for when a positive value happened to leave the MSB set (e.g. [128, 256) into a single byte)[^1][^2]
- `res > sizeof(target) && sign_bit_overflow_only == 0` for when the value was truncated to fit

[^1]: Note that negative values cannot overflow by just the sign bit. At least one leading 1 bit has to remain, and so `(signed)0x...FFFF_7FFF` cannot fit into 16 bits with a sign bit overflow. `(signed)0x...FFFF_FFFF` can always fit into 16 bits - the problem is that `(unsigned)0xFFFF` can also fit into 16 bits but we need to know that the 17th bit would've been zero.

[^2]: You wouldn't really need to check `res > sizeof(target)` here, but we should specify that the flag is only set when a signed overflow occurred.
If you're going to treat the value as signed, you only need to check `res <= sizeof(target)`, as today. (If you want the sign, look at the MSB of the result.) If you're going to treat the result as unsigned, you also allow `sign_bit_overflow_only == 1` (noting that negative input values will never need to set this new flag).
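A minimal caller-side sketch of those checks. It assumes a hypothetical variant of the function, spelled `PyLong_AsNativeBytesEx` here, that exposes the proposed `sign_bit_overflow_only` out parameter; neither that name nor the parameter is real API, they only illustrate the logic above:

```c
#include <Python.h>
#include <stdint.h>

/* Hypothetical: like PyLong_AsNativeBytes, but with the proposed extra out
 * parameter, set to 1 iff only a zero sign bit failed to be copied. */
Py_ssize_t PyLong_AsNativeBytesEx(PyObject *obj, void *buffer, Py_ssize_t n_bytes,
                                  int flags, int *sign_bit_overflow_only);

static int
copy_to_uint32(PyObject *obj, uint32_t *out)
{
    uint32_t target;
    int sign_bit_overflow_only = 0;

    Py_ssize_t res = PyLong_AsNativeBytesEx(obj, &target, sizeof(target), -1,
                                            &sign_bit_overflow_only);
    if (res < 0) {
        return -1;                          /* an exception was set */
    }
    if (res <= (Py_ssize_t)sizeof(target)) {
        *out = target;                      /* the entire value fit (the usual case) */
        return 0;
    }
    if (sign_bit_overflow_only) {
        *out = target;                      /* positive value that merely set the MSB */
        return 0;
    }
    PyErr_SetString(PyExc_OverflowError,    /* the value was truncated to fit */
                    "Python int does not fit in 32 bits");
    return -1;
}
```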
> reading its `abs()` value
Without knowing the use cases my first reaction is that we need to draw the line somewhere, and I'd be OK with this joining exporting non-byte digits on the other side of the line.
> What's really the problem here is extracting values that require all the available bits to provide full fidelity.
Which is fairly important since C-ish APIs tend to smuggle unrelated information in high bits. If everyone used ints for actual counting, we wouldn't be here :)
> [scoder] Maybe the unsigned function could have a second return argument pointer that gives the sign as 1 and -1?

> [zooba] one solution could be an optional out parameter that returns true iff a zero sign bit was the only bit that couldn't be copied

Or a nullable `char *sign_out` argument:

- `NULL`, negative values cause `ValueError`
Again, it's not negative values that are the tricky problem - it's large positive values. By excluding negatives, all you're doing is annoying the caller.
The extra information that has to be returned is not "was the input positive or negative", it's "was the only information that was lost the sign bit". And this is only relevant for positive values because if you omit the sign bit from a negative value in two's complement, you change the value and so have always lost more than just the sign bit.
I updated #116053 last night to do the extra checks we need, and there are comments where that extra information needs to be returned when we figure out how best to do it (either by returning it, or by taking a flag that says to assume it).
Ah, I think I finally get it. There are 3 cases:
Is that right?
In my suggestion, I was thinking about a new function -- `PyLong_AsUnsignedNativeBytes(..., char *sign_out)` -- to cover the last two.
If we have a new function, that's signal enough that we don't need to give it an extra argument. When you're calling that function, if the input was positive but the resulting MSB is set, we don't care (provided nothing higher than the MSB needed to be set).
I don't honestly see the benefit in rejecting negatives. The same rule applies - if the input was negative, provided the MSB is set (no information loss) and everything higher than the MSB would be set (sign extension), we can return success (which is the same as for the signed case). I'd rather just add a function for getting the sign from the PyLongObject so that people who want to reject negatives can do it, but I wouldn't want to conflate it with choosing between `AsNativeBytes` and `AsUnsignedNativeBytes`.
The range check (sign check) is to do with business logic, not with the binary representation.
So, writing 255 and -1 into a 1-byte buffer would have the same result -- all bits set, which `PyLong_FromUnsignedNativeBytes` would turn into 255. Right?
IMO, accepting negatives in `AsUnsignedNativeBytes` is a footgun that at least needs a prominent note in the docs. I see it as perpetuating C's mistakes. But, I can see where you're coming from, and I can live with our difference in opinions.
> all bits set, which `PyLong_FromUnsignedNativeBytes` would turn into 255. Right?

and `PyLong_FromNativeBytes` would turn into -1. Right.
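A short sketch of that round trip, assuming the `PyLong_AsNativeBytes` / `PyLong_FromNativeBytes` / `PyLong_FromUnsignedNativeBytes` API as it eventually landed in 3.13 (error handling omitted, and the commented return values assume the default flags, i.e. passing -1):

```c
#include <Python.h>

static void
round_trip_demo(void)
{
    unsigned char buf = 0;

    PyObject *pos = PyLong_FromLong(255);
    PyObject *neg = PyLong_FromLong(-1);

    /* Both calls leave the single byte with all bits set (0xFF). */
    PyLong_AsNativeBytes(pos, &buf, 1, -1);   /* returns 2: 255 needs a sign byte */
    PyLong_AsNativeBytes(neg, &buf, 1, -1);   /* returns 1: -1 fits exactly */

    /* The same 0xFF reads back differently depending on the reader. */
    PyObject *as_unsigned = PyLong_FromUnsignedNativeBytes(&buf, 1, -1);  /* 255 */
    PyObject *as_signed   = PyLong_FromNativeBytes(&buf, 1, -1);          /* -1  */

    Py_DECREF(pos);
    Py_DECREF(neg);
    Py_DECREF(as_unsigned);
    Py_DECREF(as_signed);
}
```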
> IMO, accepting negatives in `AsUnsignedNativeBytes` is a footgun that at least needs a prominent note in the docs.
This is fine, but my counterpoint is that there's no other way to do it in our C API (and the way to do it in Python is to `& 0xFFFFF....`, which is a pain to do dynamically). So if we cut it off, we force users into complex workarounds, whereas if we allow it then it becomes possible.

And I think the documentation for this makes the most sense framed as "behaves like `AsNativeBytes` but assumes the result will be used as unsigned, and so does not require positive input values to leave the most significant bit clear. This may result in large positive inputs being indistinguishable from some negative inputs. To exclude negative inputs, first test the sign with \<new API>".
I like the direction this is going, yes, that is the way I was hoping an `Unsigned` API variant would behave. I do think it is useful to have a way to return that the value was negative. Petr's `char *sign_out` idea makes sense to me there: always fill that in with 0 or -1 if it is non-NULL.
The interface seems complete and usable now. Is this done now or is there anything left for this ticket to stay open?
Looking things over, I like the C API that was settled upon. It seems to address all of the needs from our earlier discussions.
Feature or enhancement
The private `_PyLong_AsByteArray()` and `_PyLong_FromByteArray()` functions were removed in Python 3.13: see PR #108429. @scoder asked what is the intended replacement for `_PyLong_FromByteArray()`.

The replacement for `_PyLong_FromByteArray()` is `PyObject_CallMethod((PyObject*)&PyLong_Type, "from_bytes", "y#s", str, len, "big")`, but I'm not sure what the easy way is to set the signed parameter to True (default: `signed=False`).

The replacement for `_PyLong_AsByteArray()` is `PyObject_CallMethod(my_int, "to_bytes", "ns", length, "big")`. Same, I'm not sure how to easily set the signed parameter to True (default: `signed=False`).

I propose to add public PyLong_AsByteArray() and PyLong_FromByteArray() functions to the C API.
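For illustration, one possible workaround sketch for the `signed=True` case (the helper name is just for this example): it builds the positional arguments and a keyword dict by hand and calls `int.from_bytes` directly, which shows why a dedicated C function would be simpler.

```c
#include <Python.h>

/* Hypothetical helper: int.from_bytes(str[:len], "big", signed=True) from C. */
static PyObject *
long_from_byte_array_signed(const char *str, Py_ssize_t len)
{
    PyObject *result = NULL;
    PyObject *args = Py_BuildValue("(y#s)", str, len, "big");
    PyObject *kwargs = Py_BuildValue("{s:O}", "signed", Py_True);
    PyObject *from_bytes = PyObject_GetAttrString((PyObject *)&PyLong_Type,
                                                  "from_bytes");
    if (args != NULL && kwargs != NULL && from_bytes != NULL) {
        result = PyObject_Call(from_bytes, args, kwargs);
    }
    Py_XDECREF(from_bytes);
    Py_XDECREF(kwargs);
    Py_XDECREF(args);
    return result;
}
```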
Python 3.12 modified PyLongObject: it's no longer a simple array of digits, but a less straightforward `_PyLongValue` structure which requires using unstable functions to access small "compact" values. So having a reliable and simple way to import/export a Python int object as bytes became even more important.
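The access pattern referred to here looks roughly like the sketch below, which assumes the `PyUnstable_Long_IsCompact()` / `PyUnstable_Long_CompactValue()` helpers added in 3.12; large values still need some byte-oriented export path.

```c
#include <Python.h>

/* Sketch: read a small ("compact") int directly, and report large values to
 * the caller so they can fall back to a bytes-based export. */
static int
try_read_compact(PyObject *obj, Py_ssize_t *out)
{
    PyLongObject *lobj = (PyLongObject *)obj;   /* obj must be an int */
    if (PyUnstable_Long_IsCompact(lobj)) {
        *out = PyUnstable_Long_CompactValue(lobj);
        return 1;   /* value fit in a single machine word */
    }
    return 0;       /* multi-digit value: needs a different (bytes) path */
}
```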
A code search for `_PyLong_AsByteArray` in the PyPI top 5,000 projects found 12 projects using it.

Linked PRs