micropython / micropython

MicroPython - a lean and efficient Python implementation for microcontrollers and constrained systems
https://micropython.org
Other
19.23k stars 7.7k forks source link

Do we need mp_obj_int_get? #779

Closed dpgeorge closed 9 years ago

dpgeorge commented 10 years ago

To convert an object to a number, one should use mp_obj_get_int, which raises an exception if the value doesn't fit in a machine word (it also checks if the object is a number, or boolean, etc).

mp_obj_int_get is a shortcut that doesn't check the type of the argument (assumes it's an int), and is supposed to truncate the result to fit in a machine word. Is this function necessary?

@pfalcon, I ask you specifically because this function is only used in modffi, modsocket (unix) and moductypes. In these cases it can probably be replaced with either mp_obj_get_int, or mp_obj_get_uint (a function that needs to be added to support full machine word range for, eg, address pointers in moductypes).

pfalcon commented 10 years ago

Well, mp_obj_int_get() and mp_obj_get_int() are on different API levels. The former is "objint" type accessor (cf. mp_obj_list_get() ). The latter is runtime function which coerces an arbitrary Python object to machine int.

Do we need accessor functions for a particular type? Answer apparently depends on whether we need accessors for various types at all.

mp_obj_int_get() specifically used: when object type is checked previously, and it's known that it's an int type. As an exception, it's also used in performance-critical and unsafe code, like ffi or ctypes. This last usage is not ideal, but the idea is that those modules already offer good ways to crash system on non-careful usage, so following GIGO principle there for extra performance bits makes sense.

dpgeorge commented 10 years ago

Actually, the main thing I'm interested in is what the actual specs of mp_obj_int_get are. What if the number doesn't fit in a machine word? Raise an exception, or silently truncate the number? For the latter, you then need to specify the truncation method, probably 2's complement.

mp_obj_int_get() specifically used: when object type is checked previously, and it's known that it's an int type.

In this case you probably should be using mp_obj_int_get_checked which checks for overflow (because that's what mp_obj_get_int calls when it knows the object is of type int).

pfalcon commented 10 years ago

Actually, the main thing I'm interested in is what the actual specs of mp_obj_int_get are. What if the number doesn't fit in a machine word? Raise an exception, or silently truncate the number? For the latter, you then need to specify the truncation method, probably 2's complement.

There's mp_obj_int_get_checked() which checks for overflow and raises exception (except that at least long long impl doesn't have checks implemented). mp_obj_int_get() apparently behaves the same way as C's "(int)" cast.

In this case you probably should be using mp_obj_int_get_checked which checks for overflow (because that's what mp_obj_get_int calls when it knows the object is of type int).

As I mentioned, I wanted to squeeze all cycles possible from FFI/FDI, bu avoiding extra checks, and gave argument why I think that's not complete insane - generally, calls to C funcs and access to C data behaves as in C itself - for example, accessing long long by int* will get truncated result, and accessing float will return garbage.

If you really think there're good reasons to change this schedule, feel free to. (Otherwise, now that we have uctypes, I have TODO to elaborate modffi a bit, not sure I would touch this issue though).

dpgeorge commented 10 years ago

Currently, the checked/unchecked versions of the conversion functions are not properly implemented for longlong and mpz, and I wanted to see what we need them for (if they are needed) before implementing them. Eg, implementing mpz_as_int_unchecked would add some very rarely used code.

I agree it can be useful to have an unchecked version of the convertor. But I don't think it's for speed reasons that you used it in line 220 of modsocket.c.

pfalcon commented 10 years ago

Well, if you think we should merge checked and unchecked versions into single checked function, maybe that makes sense.

dpgeorge commented 10 years ago

Let's focus on the question: do we need both mp_obj_int_get and mp_obj_int_get_checked?

Most of the time, mpz is enabled, so let's discuss that case. Then, if your argument is a small int, both the functions (checked and not checked) have the same cost and are functionally equivalent. If your argument is a bignum (an mpz) then both functions take a bit of time to do the conversion. If we wrote a special non-checked mpz convertor, that could be quicker (ie stop converting and return when it starts to overflow), but is it worth it for this small case?

The real question is whether you want/need to allow overflow on the conversion.

pfalcon commented 10 years ago

When I introduced those 2 accessor versions, I thought about case of not wanting to check for overflow, i.e. when too big integer wouldn't be reasonably passed in. It was also coupled with MICROPY_LONGINT_IMPL_LONGLONG , which is simple and low-overhead, so there was reasonable difference between checking and not checking for overflow.

I still think that long long long int impl is viable choice for size-optimized builds. Actually, arbitrary-precision implementation is of same utility as unicode: it's cool to have for language completeness, but you need to go a long way to find a case when it's really needed.

So, with mpz in mind, there's indeed not a big difference in having unchecked vs checked accessors, but for long long for me it feels a bit different, but you looking at it from a different angle may see it more clearly.

dpgeorge commented 10 years ago

Yes, long long is useful for cases where you need full 32 bits range, but don't want mpz.

I'd be happy to keep mp_obj_int_get if it were renamed mp_obj_int_get_truncated, and was defined to take either a small int or a bignum, and return val(obj) & 0xffffffff (where 0xffffffff is the machine word size mask).

dpgeorge commented 9 years ago

All accessor functions have been kept, and mp_obj_int_get renamed to mp_obj_int_get_truncated to reflect its behaviour.