
C API: Consider adding public PyLong_AsByteArray() and PyLong_FromByteArray() functions #111140

Closed. vstinner closed this issue 6 months ago.

vstinner commented 1 year ago

Feature or enhancement

The private _PyLong_AsByteArray() and _PyLong_FromByteArray() functions were removed in Python 3.13: see PR #108429.

@scoder asked what is the intended replacement for _PyLong_FromByteArray().

The replacement for _PyLong_FromByteArray() is PyObject_CallMethod((PyObject*)&PyLong_Type, "from_bytes", "y#s", str, len, "big"), but I'm not sure what the easy way is to set the signed parameter to True (default: signed=False).

The replacement for _PyLong_AsByteArray() is PyObject_CallMethod(my_int, "to_bytes", "ns", length, "big"). Again, I'm not sure of an easy way to set the signed parameter to True (default: signed=False).
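
For illustration (a sketch, not from the original report): one way to pass signed=True through the abstract call API is to fetch the bound method and use PyObject_Call() with a keyword dict:

PyObject *meth = PyObject_GetAttrString(my_int, "to_bytes");
PyObject *args = Py_BuildValue("(ns)", length, "big");
PyObject *kwargs = Py_BuildValue("{s:O}", "signed", Py_True);
PyObject *result = NULL;
if (meth != NULL && args != NULL && kwargs != NULL) {
    /* equivalent to my_int.to_bytes(length, "big", signed=True) */
    result = PyObject_Call(meth, args, kwargs);
}
Py_XDECREF(meth);
Py_XDECREF(args);
Py_XDECREF(kwargs);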

I propose to add public PyLong_AsByteArray() and PyLong_FromByteArray() functions to the C API.

Python 3.12 modified PyLongObject: it is no longer a simple array of digits, but a less straightforward _PyLongValue structure, which requires using unstable functions to access small "compact" values.

So having a reliable and simple way to import/export a Python int object as bytes became even more important.
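
For context, a minimal sketch of the 3.12 unstable accessors in question:

PyLongObject *op = (PyLongObject *)obj;
if (PyUnstable_Long_IsCompact(op)) {
    /* "compact" int: the whole value fits in one machine word */
    Py_ssize_t value = PyUnstable_Long_CompactValue(op);
}
else {
    /* multi-digit int: no stable way to read the digits directly */
}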


A code search for _PyLong_AsByteArray in the PyPI top 5,000 projects found 12 projects using it.

scoder commented 1 year ago

Thanks for creating the issue. I agree that the functions should be added. The current replacements seem awful for this kind of basic functionality. Going through an expensive Python call like this for converting between PyLong and large C integer types (int128_t) seems excessive.

Note that at least a couple of the projects you list use Cython-implemented parts and thus probably just mention the function there. I'm sure something like line_profiler would never end up calling it.

serhiy-storchaka commented 1 year ago

It has already been discussed several times. This API lacks some features which would be needed for general use. You need to know the size of the resulting bytes array, and calculating it is not trivial, especially for negative integers. Also, it would be more convenient to support "native" endianness, not just "big"/"little".

I have been thinking about implementing a similar API to mpz_import/mpz_export in GMP (https://gmplib.org/manual/Integer-Import-and-Export) or mp_pack/mp_unpack in libtommath (https://github.com/libtom/libtommath). It is powerful and a kind of standard. It is general enough to support the internal PyLong representation and the marshal format used for long integers (15 bits packed in 16-bit words). But it is only for unsigned integers; you are supposed to store the sign separately. It should be extended to support negative integers in several formats, for convenience and for performance.
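
For reference, a minimal illustration of the GMP export API mentioned above; mpz_export allocates the result itself when passed NULL, and the sign has to be recovered separately with mpz_sgn:

#include <gmp.h>

size_t count;
/* export the magnitude of z as 1-byte "words", least significant first */
unsigned char *buf = mpz_export(NULL, &count, -1, 1, 0, 0, z);
int sign = mpz_sgn(z);  /* -1, 0 or +1: must be stored separately */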

vstinner commented 1 year ago

I'm not sure that passing the endian as a string is efficient if this function is part of hot code.

serhiy-storchaka commented 1 year ago

Not as a string. Just 3-variant value native/little/big instead of boolean little/big.

vstinner commented 1 year ago

Not as a string. Just 3-variant value native/little/big instead of boolean little/big.

Sorry, I was confused between C API (int for the endian) and the Python API (string for the endian).

scoder commented 1 year ago

It was already discussed several times. This API lacks some features which would be needed for general use.

Ok, then please put the existing function back in until there is a public replacement.

scoder commented 1 year ago

I created PR https://github.com/python/cpython/pull/111162

vstinner commented 1 year ago

This API lacks some features which would be needed for general use.

Which features are missing?

Do you have links to these previous discussions?

vstinner commented 1 year ago

You need to know the size of the resulting bytes array.

wcstombs() can be called with a NULL buffer to compute the required buffer size. That avoids having to provide a second API to "calculate the buffer size".
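
For reference, the standard two-call pattern being referred to, given a wchar_t *wstr:

#include <stdlib.h>

size_t n = wcstombs(NULL, wstr, 0);   /* 1st call: compute required size */
if (n != (size_t)-1) {
    char *buf = malloc(n + 1);        /* +1 for the terminating NUL */
    if (buf != NULL) {
        wcstombs(buf, wstr, n + 1);   /* 2nd call: do the conversion */
    }
}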

I suppose that a common use case is also to convert a Python int object to a C integer type for which there is no C API, like int128_t.

scoder commented 1 year ago

I suppose that a common use case is also to convert a Python int object to a C integer type for which there is no C API, like int128_t.

Or something like 256 bytes for a crypto key, hash value or similar data type. Probably a known size, so that users can live with an exception if it doesn't fit (because it's an error or can't-handle-that situation).

That said, a function to ask for the bit length of the integer value could be helpful in order to find a suitable integer/storage size. And also more generally to estimate the size of an integer value.

Both together would probably cover a lot of use cases.

serhiy-storchaka commented 1 year ago

https://mail.python.org/archives/list/python-ideas@python.org/thread/V2EKXMKSQV25BMRPMDH47IM2OYCLY2TF/

encukou commented 1 year ago

I'm partial to an API like mpz_export that accepts a buffer and length, and can:

This allows handling the common cases with a minimum of function calls:

But, IMO we also need general API for exporting/importing Python objects to/from buffers (see e.g. #15 in capi-workgroup/problems), and it would be good to make this consistent.

I'd prefer adding the original functions back until we design a proper replacement.

scoder commented 1 year ago

mpz_export() looks like a good blueprint. It's based on "word data", though, not bytes. I'm not sure if that design is something important to copy. There are certainly use cases for this (it resembles SIMD-like operations, for example). However, I'm not convinced that reordering bytes in native/little/big-endian words is an important feature for a PyLong C-API. Whoever needs this kind of special functionality can probably implement it in a separate pass on their side. A simple byte array export in (overall) native/little/big ordering seems sufficient to get the data in and out.

Note that it goes together with an mpz_sgn() function to query the sign since that is not part of the export. That's reasonable since the sign is unlikely to become part of the internal PyLong digits representation even in the future. Given that Py_SIZE() cannot be used for the sign detection any more, a stable function and a fast inline function would both be helpful for this.

So, basically, the proposal is to add

IMO we also need general API for exporting/importing Python objects to/from buffers, and it would be good to make this consistent.

It's not strictly related, though. I think a PyLong number is sufficiently different from an arbitrary Python object array to not require a highly similar interface. If it can be made similar, fine. I wouldn't hold up one for the other, though.

Regarding Serhiy's concerns about missing ABI-stability of enum flags and arguments: we've used C macro names for this for ages, and they turn into simple integers that can be passed as C int arguments. Just #define some names and people will use them.
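
Illustrative only (hypothetical names, not CPython's): such macros could look like

#define PYLONG_ENDIAN_NATIVE  (-1)
#define PYLONG_ENDIAN_BIG     0
#define PYLONG_ENDIAN_LITTLE  1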

vstinner commented 1 year ago

See also comments about removed _PyLong_New(): https://github.com/python/cpython/pull/108604#issuecomment-1774326866

vstinner commented 12 months ago

_PyLong_FromByteArray(), _PyLong_AsByteArray() and _PyLong_GCD() functions were restored by commit https://github.com/python/cpython/commit/a8a89fcd1ff03bb2f10126e0973faa74871874c3.

zooba commented 9 months ago

Reopening, because I think at a minimum we should have the two functions mentioned in the title.

My proposed API (I have an implementation, but the PR isn't ready yet, and there's one open question) is basically the one Petr liked, but simpler:

/* PyLong_AsByteArray: Copy the integer value to a native address.
   n is the number of bytes available in the buffer.
   Uses the current build's default endianness, and sign extends the value
   to fit the buffer.

   Returns 0 on success or -1 with an exception set, but if the buffer is
   not big enough, returns the desired buffer size without setting an
   exception. Note that the desired size may be larger than strictly
   necessary to avoid precise calculations. */
PyAPI_FUNC(int) PyLong_AsByteArray(PyObject* v, void* buffer, size_t n);

/* PyLong_FromByteArray: Create an integer value containing the number from
   a native buffer.
   n is the number of bytes to read from the buffer.
   Uses the current build's default endianness, and assumes the value was
   sign extended to 'n' bytes.

   Returns the int object, or NULL with an exception set. */
PyAPI_FUNC(PyObject *) PyLong_FromByteArray(void* buffer, size_t n);

I'm comfortable making these only do default endianness, because they're really intended as an extension of all the other int conversions we have that also do default endianness. Alternate endianness is a specialised formatting or bit packing operation.

The "designed size may be larger than strictly necessary" is to allow returning sizeof(Py_ssize_t) for a "compact" int, rather than calculating the exact number of bits, and potentially doing the same for a larger int if we come up with a faster calculation.

I envision the use here to be like this (note the EAFP):

int value;
int n = PyLong_AsByteArray(o, &value, sizeof(value));
if (n < 0) goto abort; // exception, nothing useful we can do
if (n == 0) {
    // use value
} else {
    // buffer was too small: n is the required size
    void *new_memory = malloc(n);
    n = PyLong_AsByteArray(o, new_memory, n);
    // use new_memory
    free(new_memory);
}

Similarly for FromByteArray - that should work efficiently against pointers to locals, so you don't have to choose the right API name for the type.

However, the bit I'm wavering on is what to do with unsigned values with the MSB set. Right now, you need to allow an extra byte to "prove" that 2**64-1 is not -1, which is just a pain when you'd rather PyLong_AsByteArray(o, &uint64_value, sizeof(uint64_t)). With C++ templates this wouldn't even come up, but I can't decide on my own between:

I don't think we have perf concerns at the point where this matters, as we're already at the extremes of a 64-bit integer (for most platforms). That's too big for a "compact" int, so we're on the slow path already. But I do want to get the usability right. I'm leaning towards a PyLong_AsUnsignedByteArray function that basically differs only in the sign extension part.
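
A sketch of the "extra byte" workaround described above, assuming the draft PyLong_AsByteArray semantics from the previous comment and a little-endian build:

#include <stdint.h>
#include <string.h>

unsigned char buf[sizeof(uint64_t) + 1];   /* one extra byte for the sign */
int n = PyLong_AsByteArray(o, buf, sizeof(buf));
if (n == 0) {
    /* buf[8] is 0x00 for non-negative values and 0xFF for negative ones,
       so 2**64-1 and -1 are now distinguishable */
    uint64_t value;
    memcpy(&value, buf, sizeof(value));
}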

encukou commented 9 months ago

I'd prefer exposing both endiannness and signedness as arguments. As I see it, the functions should be intended for serialization too, not just for converting to native ints -- and in that case, it's best to be explicit.

Perhaps we should use named flags, like:

int n = PyLong_AsByteArray(o, &value, sizeof(value), Py_AS_NATIVE_ENDIAN | Py_AS_SIGNED);

zooba commented 9 months ago

As I see it, the functions should be intended for serialization too

All the scenarios I've seen have just been about converting to native ints (in contexts where serialization may happen next, but has to happen via a native int). Can you/anyone show me some where the caller doesn't want the int value, but just wants to store the bytes? (And doesn't want to/can't use the struct module, which is intended for this case.)

FWIW, non-default endianness is inevitably a slow path. We can make this very fast for normal sized, native endian values, which are the vast majority of cases, but forcing an endianness has to slow things down.

zooba commented 9 months ago

How about this as a proposed API:

PyAPI_FUNC(int) PyLong_AsByteArray(PyObject* v, void* buffer, size_t n);
PyAPI_FUNC(int) PyLong_AsUnsignedByteArray(PyObject* v, void* buffer, size_t n);
PyAPI_FUNC(int) PyLong_AsByteArrayWithOptions(PyObject* v, void* buffer, size_t n, int flags);

Where the first two are essentially exported aliases that make it easier to read/write code without having to remember/write a set of flags every time?

encukou commented 9 months ago

Makes sense. This even allows us to add PYLONG_ASBYTEARRAY_EXACT_SIZE later, if there's interest in that.

Can you/anyone show me some where the caller doesn't want the int value, but just wants to store the bytes?

Sure, look in _pickle.c, marshal.c or multibytecodec.c. Generally: when saving to disk (or sending over network), native-endian is a bad default.

struct is good for normal-sized values, if you can afford the overhead of Python & format strings.

FWIW, non-default endianness is inevitably a slow path.

That's fine. But sometimes it's the correct path.

IMO, endianness is something you should think about when converting an int to/from bytes. That suggests it should be an explicit argument. The correct place to hide the detail is in wrappers like PyLong_AsLong or PyLong_FromUint64_t.

Maybe PyLong_AsSizedInt would be a better name for the arg-less functions?

zooba commented 9 months ago

Sure, look in _pickle.c, marshal.c or multibytecodec.c.

I'm aware of these - you can see that I touched them already - but they are in a slightly different category because they're bypassing the "convert Python int to C int" step (alternatively, conflating conversion and packing). This is the privilege of internal APIs, but I don't believe it means we need to conflate conversion and packing for external callers (and I'd also be fine with our internal ones converting to a native value first and then serializing).

If you're using say grpc or some other native protocol library, do you need to provide pre-swapped values? Or do you provide native values and then the serialization library handles the swapping?

IMO, endiannes is something you should think about when converting an int to/from bytes.

Except the model we are being asked for here is "convert to arbitrary sized native int". PyO3, Cython, etc. don't want to think about endianness - they just want our special int type to become a native type. So I'd rather design an API that caters to that need rather than the more specialised need of serialization algorithms (which need to handle native ints anyway).

Maybe PyLong_AsSizedInt would be a better name for the arg-less functions?

I'm not enamored with the current name, but I don't really like "sized int" as a concept either. PyLong_AsNative[Unsigned]Integer maybe? But then, sticking with ByteArray is fine and using the docs/examples to lead people towards using it on static locals before allocated buffers is probably sufficient.

zooba commented 9 months ago

Thinking some more about naming, and about the signed/unsigned option (which isn't really that, for AsByteArray), I'm leaning towards making it a single PyLong_CopyBits(PyObject *v, void *buffer, size_t n_bytes, int endianness).

"Copy bits" is literally what we're doing, and while I dislike then taking the number of bytes instead of bits, I even more dislike trying to do partial bytes (and losing the use of sizeof()).

If we guarantee that n_bytes bytes are written and then return the actual number of bytes required to fit the full precision value (including sign), then simply ignoring the (positive) return value is the same as what the "unsigned" flag currently does. So callers who just want a C-style cast into a value can check for negative result (exception), and those who want a checked cast can also check for result greater than n_bytes.

That gets us down to one API, and while I still dislike having endianness in there, I can live with it being -1 = native, 0 = big, 1 = little such that passing PY_LITTLE_ENDIAN will work, or passing -1 means "just do the right thing".

Hopefully with a tweak to the current (private) _AsByteArray I can make it satisfy the "write all bytes" case without needing the allocation.

zooba commented 9 months ago

Okay, I'm much happier with those APIs. Copying the prototypes and comments from my draft PR here for reference (these aren't full docs, but should cover enough detail to understand how to use them):

/* PyLong_CopyBits: Copy the integer value to a native buffer.
   n_bytes is the number of bytes available in the buffer. Pass 0 to request
   the required size for the value.
   endianness is -1 for native endian, 0 for big endian or 1 for little.
   If an exception is raised, returns a negative value.
   Otherwise, returns the number of bytes that are required to store the value.
   To check that the full value is represented, ensure that the return value
   is less than or equal to n_bytes.
   All n_bytes are guaranteed to be written (unless an exception occurs), and
   so ignoring a positive return value is the equivalent of a downcast in C.
   In cases where the full value could not be represented, the returned value
   may be larger than necessary - this function is not an accurate way to
   calculate the bit length of an integer object.
   */
PyAPI_FUNC(int) PyLong_CopyBits(PyObject* v, void* buffer, size_t n_bytes,
    int endianness);

/* PyLong_FromBits: Create an int value from a native integer
   n_bytes is the number of bytes to read from the buffer. Passing 0 will
   always produce the zero int.
   PyLong_FromUnsignedBits always produces a positive int.
   endianness is -1 for native endian, 0 for big endian or 1 for little.
   Returns the int object, or NULL with an exception set. */
PyAPI_FUNC(PyObject*) PyLong_FromBits(void* buffer, size_t n_bytes,
    int endianness);
PyAPI_FUNC(PyObject*) PyLong_FromUnsignedBits(void* buffer, size_t n_bytes,
    int endianness);
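
For illustration, a sketch (not from the draft PR) of the "checked cast" pattern the PyLong_CopyBits return value enables:

#include <stdint.h>

int64_t value;
int n = PyLong_CopyBits(obj, &value, sizeof(value), -1 /* native */);
if (n < 0) {
    /* exception set */
}
else if ((size_t)n > sizeof(value)) {
    /* all bytes were written anyway (a C-style downcast), but the full
       value did not fit: treat as overflow for a checked cast */
}
else {
    /* full value represented in 'value' */
}
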
encukou commented 9 months ago

OK. I thought you were going for serialization/packing -- an analog of the Python to_bytes. The old name, AsByteArray, sounded that way :)

I agree that for conversion to fixed-size native int, an endianness argument isn't really needed. (I imagine libraries like grpc need to take native ints. Maybe in some cases we could avoid swapping back and forth, but where that's a real performance concern, IMO we'd need to go much deeper.)

That said, if we do provide an endianness argument, we're covering most serialization/packing cases as well. IMO, it's worth it to cover this even if we're not specifically asked for it in this issue.

I'm happy with the latest iteration of the API, but I'll add some comments for your consideration:

zooba commented 9 months ago

  • we could go back to the AsByteArray name, which might be more discoverable.

I worry a little that people would expect it to return a PyObject bytearray, or maybe it will allocate the array for you. As it does neither of those things, I like that the name doesn't suggest it.

  • IMO, native endianness can be resolved at compile time. We could always pass PY_LITTLE_ENDIAN rather than -1, and avoid the runtime check.

You can call it with that if you like (I deliberately picked the values to allow it), but it doesn't avoid much. And you can't call it with that from a non-C language, so hard-coding -1 for anyone not using C is a way to do-what-I-mean.

It might be worth benchmarking without the option, which would avoid a few branches and reduce the overall length of the code, but I suspect it won't make a huge difference. I'm honestly more concerned about getting the big endian paths wrong as I don't have anything handy to test them on!

  • PyLong_FromUnsignedBits always produces a non-negative int.

Maybe it always rounds up to one 🙃 But yeah, noted.

skirpichev commented 9 months ago

Can we consider more generic import/export capabilities, better resembling mpz_import/export: i.e., import/export into an array of "digits", where the 1-byte width is just one special case? See https://github.com/python/cpython/issues/102471#issuecomment-1620352561 for a variant of the API.

This is something that could be useful for interaction with other MP libraries, like the GMP. BTW, the CPython itself has _PyLong_FromDigits() private function just for the decimal module.

encukou commented 9 months ago

@zooba: Nothing here hinges on this detail, but for your consideration:

you can't call it with [a named constant] from a non-C language, so hard coding -1 for anyone not using C is a way to do-what-I mean.

Any non-C wrapper will need to expose CPython's named constants. It's not too easy to do now, but I don't think that's a good reason to avoid names. (And some wrappers might want to make this an enum. It's better if CPython picks the names, rather than individual wrappers.)


Can we consider more generic import/export capabilities

Yes, but in addition to this. Having digit size and bits per digit as arguments complicates the common case. (And unlike endianness, they have a good obvious default.)

zooba commented 9 months ago

Any non-C wrapper will need to expose CPython's named constants.

PY_LITTLE_ENDIAN is conditionally defined, though. Exposing our constants that are our constants is one thing, but having to detect things that are figured out by our configure script is quite a different ask. Having a fixed value that means "use whatever was detected at compile time" handles this well.

If I add names, PY_LITTLE_ENDIAN wouldn't be it. It'd be PYLONG_COPYBITS_ALWAYS_LITTLE_ENDIAN or something equally specific. And we'd still document that passing PY_LITTLE_ENDIAN is a suitable (C) alternative to passing PYLONG_COPYBITS_NATIVE_ENDIAN. Frankly, this pushes me towards making it native only and letting people swap later if they need it.

some wrappers might want to make this an enum.

I'm totally not concerned with wrappers who are exposing our C API again. I'm thinking about wrappers like PyO3, PyBind, Cython, etc. who just need to convert into a regular integer type. If someone (pyapi_compat?) wants to export their own version of PyLong_CopyBits, then they can do whatever they like.

Can we consider more generic import/export capabilities

Yes, but in addition to this.

Agreed, they are out of scope here. Converting to native integers is enough for a high-level interface on our side, and repacking those into non-native integers is left as an exercise to the caller, or to people who want to take a dependency on our internals.

We're not a numerics library, we're an embeddable language runtime.

skirpichev commented 9 months ago

Yes, but in addition to this.

Do you mean a new set of functions, like PyInt_Import/Export from #102471, even if their functionality would partially overlap with what is discussed in the current issue?

Converting to native integers is enough for a high-level interface on our side

Sorry, perhaps I misunderstood the purpose of the proposed API as a from_bytes/to_bytes alternative, which covers not just native integers (i.e. C types, IIUIC).

people who want to take a dependency on our internals

Huh. I doubt the Sage/gmpy2 people really want to have such a dependency :) The reason for using private stuff (gmpy2 has ifdefs for 3 different CPython versions so far) is the absence of a public C API that can replace it.

zooba commented 9 months ago

Sorry, perhaps I misunderstood the purpose of proposed API as from_bytes/to_bytes alternative, whose cover not just native integers (i.e. C types, IIUIC).

Well those functions have essentially the same support as the new APIs, so it's a reasonable inference. Suggesting non-byte digits would be a new addition to both the to_bytes API and the new native one (and would be rejected from both as "too specific").

Huh. I doubt Sage/gmpy2 people really want to have such dependency:)

Understandable :) But they should be able to extract a Python int to byte-sized digits and then convert from that, if that's what they need to do. Going directly from our arbitrary sized digits to their arbitrary sized digits is more complex, and neither side wants to be responsible for it. So translating via a common representation is the workaround.
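
For illustration, the "common representation" route could look like this on the GMP side, assuming buf/len hold the magnitude as little-endian bytes obtained from the CPython API:

#include <gmp.h>

mpz_t z;
mpz_init(z);
/* import len 1-byte words, least significant word first, nails = 0 */
mpz_import(z, len, -1, 1, 0, 0, buf);
/* the sign travels separately; negate if needed: mpz_neg(z, z) */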

skirpichev commented 9 months ago

Suggesting non-byte digits would be a new addition to both the to_bytes API

I'm not proposing to also change the pure-Python API.

they should be able to extract a Python int to byte-sized digits and then convert from that, if that's what they need to do.

It's possible. Unfortunately, this solution is slower.

neither side wants to be responsible for it

Actually, right now the GMP library side is responsible. But this requires access to the internals of the CPython long representation (the ob_digit array) and the use of private C API functions (to allocate a PyLong object, test for negativity, get/set the number of digits, etc.; see #111415).

BTW, maybe another option to have this on the GMP side is to offer an mpz_limbs_read/write-like API to access a PyLong's digits.

zooba commented 9 months ago

neither side wants to be responsible for it

Actually, right now - the GMP library side is responsible.

These are not contradictory statements! 😄 GMP is fine to do that, they just have to understand that we won't necessarily consider them when changing the internal implementation of our integers. And the point of having a stable API is so that we don't break most people when we do change them (e.g. we could trivially keep PyLong_AsNativeBytes - new name - exactly the same if we switch to tagged pointers).

But if we make public APIs that expose our implementation, then we are locked in and cannot change it. So perf improvements for the majority of users get lost, because we have to maintain GMP's interface. We don't want to do that.

Maintaining the code ourselves to do the conversion to arbitrary bit-sized words would be the ideal, if we had unlimited resources. But we don't, so we have to balance what we believe we can responsibly promise to maintain for the next 10+ years against what would be ideal. Right now, I don't see how we would cover that kind of transformation, so I won't sign us up for it. (Other core devs are more cavalier than I am though, so may be convinced...)

zooba commented 9 months ago

The PR #114886 is ready for merge, though I'd really like another positive review or two before then (trading off against getting into alpha 4 next week - there's time for fixes before RC).

If you haven't yet taken a look at the changes, but think this may be relevant to you, at least check out the docs and let me know if they make sense or not.

skirpichev commented 9 months ago

Maintaining the code ourselves to do the conversion to arbitrary bit-sized words would be the ideal, if we had unlimited resources.

Ok, it seems that proposal for PyInt_Import/Export from https://github.com/python/cpython/issues/102471 is out of the question.

I realize that the proposal to have an mpz_limbs_read/write-like API is probably offtopic here. But let me expand on it a bit to see whether it makes sense to discuss in detail somewhere else.

But if we make public APIs that expose our implementation, then we are locked in and cannot change it.

I'm not suggesting just to document the _PyLongValue :) The new function, say:

digit* PyLong_AsDigits(PyObject* obj, Py_ssize_t size)

will offer an mpz_limbs_write()-like API (access to the unsigned bigint as an array of "digits") and might be much more complex internally than just returning obj->long_value.ob_digit (with possible reallocation).

Such a view, I believe, is common to all arbitrary-precision C libraries (modulo differences in the size of a "digit"). Could you suggest a scenario where this view of "big enough" PyLongs might break in the future? (It's ok if users have to handle "small enough" values separately with PyUnstable_Long_CompactValue(), etc.)

With this kind of API (probably much less complex than PyInt_Import/Export) on the CPython side, Sage/gmpy2/python-flint and friends could do fast conversions to/from PyLongs using the mpz_import/export functions.

encukou commented 9 months ago

@skirpichev, even if Export or limbs_write are added, the byte-based API proposed in this issue will still be useful. Could you open a new issue, or better, a thread on Discourse? I'm not saying that to drive you away: if you're interested, let's find a way. (IMO: while we're an embeddable language runtime, we do have a bigint type, and I think Python owes a lot of its success to allowing low-level access to basic types. People do traditionally use fragile hacks for that, and that's a problem to fix.) But proper bigint export/import is out of scope here.

zooba commented 9 months ago

Thanks all for the discussion and contributions!

scoder commented 8 months ago

I tried using the new PyLong_AsNativeBytes() function in Cython, but it's unclear to me how to handle signed values. The documentation seems to say, "well, then you get 17 bytes instead of the suitable 16", but the code example in the docs says "if you get more bytes than the buffer size, it's an overflow case". How can I distinguish between the two cases?

The net effect is that one of our tests is failing with the new API: https://github.com/cython/cython/pull/5997#issuecomment-1952218183

gpshead commented 8 months ago

The 3.13alpha4 PyLong_AsNativeBytes API leaves it up to the caller to determine what counts as an error. If a user wants bounds-checking errors, they always need to introspect the value near possible boundary conditions. At best, this API allows users to avoid those checks when there is no possible overflow condition (the half of possible values that are unambiguous).

If you don't want to allocate temporary space for the extra byte to check the value of the high bit in that to determine the underlying sign of the PyLongObject, you basically need to do a Python v < 0 check. Which seems quite annoying to do from the C API alone.

/* XXX - old version, see later PR comment.
PyObject *zero = PyLong_FromLong(0);  // cache this in your module level and add module cleanup decref code instead?
bool is_negative = PyObject_RichCompareBool(v, zero, Py_LT);
Py_DECREF(zero);
*/

int is_negative = PyObject_RichCompareBool(v, Py_False, Py_LT);  // Py_False is a convenient 0 value; returns -1 on error

For small fixed sizes (16 bytes is small) I'd just use temporary stack space, inspect the excess high byte of that when required and copy the 16 bytes into the actual destination. That work is entirely within an L1 cache line and avoids expensive comparison calls.

For huge bignums - I'd not want temporary space and make the conditional C API call tradeoff instead.

If everything fits in n_bytes, the value must be in the $[0, 2^{8n-1})$ range, because the signed/unsigned ambiguity means an additional byte is allocated no matter what for both (Pos) $[2^{8n-1}, 2^{8n}-1]$ and (Neg) $[-2^{8n-1}, -1]$.
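
A sketch of the stack-space approach described above, assuming a little-endian build, a 16-byte destination dest, and the alpha-4 PyLong_AsNativeBytes(pylong, buffer, n_bytes, endianness) signature:

#include <string.h>

unsigned char tmp[16 + 1];   /* target size plus one excess high byte */
Py_ssize_t n = PyLong_AsNativeBytes(v, tmp, sizeof(tmp), -1);
if (n < 0) {
    /* exception set */
}
else if (n <= (Py_ssize_t)sizeof(tmp)) {
    /* tmp[16] is the sign-extension byte: 0x00 means the 16 low bytes
       hold a valid unsigned value, 0xFF means the value was negative */
    int fits_unsigned = (tmp[16] == 0x00);
    memcpy(dest, tmp, 16);
}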

I can't say that I really like this. Most users of this API will want to repeat some form of this checking logic. That leaves significant space for every user to make honest mistakes. :/

The way to do otherwise would be changing this new API: add an additional boolean parameter to control which of those two (Pos) or (Neg) options it should allocate the extra sign extension byte for - effectively making such checks internal and making the return value of n_bytes+1 mean "actual overflow" rather than "maybe an overflow depending on what you wanted vs the actual value that you don't have yet unless you provided extra temporary space". Something like:

Py_ssize_t PyLong_AsNativeBytes(PyObject *pylong, void *buffer, Py_ssize_t n_bytes, int is_signed, int endianness)

We're in the alpha phase, we could still do that. (IIRC earlier proposals made it two API calls - Signed and Unsigned? - instead of a parameter; that's also an option.)

scoder commented 8 months ago

Py_ssize_t PyLong_AsNativeBytes(PyObject *pylong, void *buffer, Py_ssize_t n_bytes, int is_signed, int endianness)

We won't need the endianness flag since we're requesting "native bytes" as per the function name, but being explicit about the expected sign bit makes perfect sense to me.

I'd rather suggest having two functions though, one signed, one unsigned. That's how all the other conversion functions currently work.

scoder commented 8 months ago

PyObject *zero = PyLong_FromLong(0);  // cache this in your module level and add module cleanup decref code instead?
bool is_negative = PyObject_RichCompareBool(v, zero, Py_LT);
Py_DECREF(zero);

It's a bit quicker to hackishly compare against Py_False as zero, but it's still a quirk that needlessly wastes time. We're designing a new API here, we should make sure it's helpful and fast.

zooba commented 8 months ago

Putting -(2**127) into a 128-bit signed value should work fine. The topmost bit will be set and the rest are zero, which is correct. -(2**127 + 1) requires 129 bits or else you lose the sign.

Values in [2**127, 2**128) into a 128-bit unsigned value are the edge case. Note that all of them are positive, so your check needs to be that the number is in this range - not trivial to do in any language, and certainly not as simple as checking the sign.

An additional flag could be used to make the API return 16 rather than 17 in this case. But that's all it would be - it isn't really checking signed/unsigned, it's just not requiring room for a sign bit for large positive integers that would otherwise have lost their 0 sign bit.

(It would probably also imply "negative inputs are treated as positive with infinite leading 1 bits, but we only need room for at least one of them" which is actually the normal definition of two's complement, which is why I didn't bother making a fuss about it in the docs. Adding more details sometimes just makes things more confusing.)

zooba commented 8 months ago

The most efficient way I can think of to handle this particular case would be to extract into 128-bits, and if the returned value is 17 and the destination is unsigned then do the more complex comparison. Most of the time, the return value will be less than 17 and you don't have to do any extra work at all, and if it's more than 17 you don't have to do the work either.

At that sized value, we're already relying on a very complex function that doesn't give us enough information to know whether it only needs the sign bit or not. Maybe that could be added, but it would be more destabilising than I want to risk myself (and I don't know who the expert would be for it at this point). We'd probably end up doing an internal overallocation ourselves to implement a flag to make it transparent, which is going to be a penalty on every single conversion, not just those that are close to a boundary.

[Update] I meant 16/17 bytes, of course.

zooba commented 8 months ago

Putting -(2**127) into a 128-bit signed value should work fine.

Okay, I see the problem here is that _PyLong_NumBits (which already existed) doesn't handle negative values, and so _PyLong_NumBits(-2**127) == _PyLong_NumBits(2**127) despite this not being strictly true. We can adjust the expected size in this case to handle it.

This probably means that a new argument could be added here for the "don't require a sign bit if it would be 0" case.

scoder commented 8 months ago

This probably means that a new argument could be added here for the "don't require a sign bit if it would be 0" case.

As I wrote, all other conversion functions have signed and unsigned variants. Why design this differently? Users would probably know whether they expect a (possibly) negative result or not.

gpshead commented 8 months ago

I think a PyLong_AsUnsignedNativeBytes addition to complement PyLong_AsNativeBytes makes the most sense. It offers round-trippable symmetry with the new PyLong_FromNativeBytes and PyLong_FromUnsignedNativeBytes pair.

This way the API reports overflowing values accurately via the returned size instead of forcing API callers to reinvent how to determine that on their own.
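
For concreteness, the resulting family would pair up roughly like this (prototypes sketched from the discussion using the alpha-4 endianness convention; PyLong_AsUnsignedNativeBytes is the proposed addition and its exact signature is an assumption):

PyAPI_FUNC(PyObject *) PyLong_FromNativeBytes(const void *buffer,
    size_t n_bytes, int endianness);
PyAPI_FUNC(PyObject *) PyLong_FromUnsignedNativeBytes(const void *buffer,
    size_t n_bytes, int endianness);

PyAPI_FUNC(Py_ssize_t) PyLong_AsNativeBytes(PyObject *pylong, void *buffer,
    Py_ssize_t n_bytes, int endianness);
PyAPI_FUNC(Py_ssize_t) PyLong_AsUnsignedNativeBytes(PyObject *pylong,
    void *buffer, Py_ssize_t n_bytes, int endianness);  /* proposed */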

zooba commented 8 months ago

What will PyLong_AsUnsignedNativeBytes do if passed a negative int? Will it still sign extend? That's important behaviour to me, and I'd hate to lose it in favour of an arbitrary exception.

FWIW, the underlying function that does the conversion has an "unsigned" mode that just rejects negative values, but doesn't otherwise change its behaviour. What is needed here is something different from that.

encukou commented 8 months ago

What will PyLong_AsUnsignedNativeBytes do if passed a negative int?

A negative value cannot be represented as unsigned native bytes, so I think it should reject them: set ValueError and return -1.

zooba commented 8 months ago

The problem is that conversions already exist that interpret unsigned native bytes as signed, and so you can get negative values that were never meant to be (e.g. many Windows error codes show up on OSError.winerror as negative, when they weren't meant to).

In C, if you cast those negative values to unsigned then it will sign extend. Why shouldn't we do the same in C?

encukou commented 8 months ago

IMO, that's a C language issue, and one that you can solve by careful typing. Regrettably, C is overly eager to cast between signed and unsigned. Recent compilers have warnings for some of that, but it's not really something C can change at this point. Kinda like str/unicode in Python 2: you can convert between them so freely that they can look like the same thing. But they're not.

PyLong is strongly signed. If users want sign extension, they should use PyLong_AsNativeBytes, not the unsigned variant.

zooba commented 8 months ago

As long as the existing behaviour/test for -2147467259 doesn't change, then sure.

I'd like to see the new unsigned API be clearly marked as only working on positive input values, but I'm also happy to stay out of the way of it.