Open JaroslavTulach opened 10 months ago
Numpy defines __int__
and __float__
on numeric type - can a presence of these methods be treated as a sign that the Python object represents a number?
Yes. I think we didn't do that because we can't tell in advance if the integer number fits into any of the fixed size type. Now that interop supports BigInteger, we could report fitsInBigInteger
iff the object has __index__
or __int__
and fitsInDouble
iff it has __float__
. @timfel any concerns?
Note that numpy integers would still always report fitstInInt
false, even if it's numpy.int32
. Unfortunately numpy integers are not a subclass of python integers, so we have no way to tell if the number fits in 32 bits without calling the conversion method and looking at the result. Which we cannot do because the interop API forbids side-effects.
Could we assume/expect __int__
doesn't have side effects? It seems pretty bad to return the wrong answer for fitsInInt(). E.g. some int / numpy.int32 might give 0 because it says it doesn't fit an int, depending on how / is defined.
CC @chumer
Correction: for integers, we should only consider __index__
, not __int__
. That's how python internally converts objects for builtins that expect integers. __int__
is permitted to be a lossy conversion (so it also converts floats etc) and is basically only used by the int
builtin.
Could we assume/expect
__int__
doesn't have side effects?
Not sure. In most normal libraries I wouldn't expect visible side-effects, but there's no rule saying that there can't be any. For example unittest.mock.MagicMock
records all calls to special methods, including __index__
. Also it could degrade performance if the interop client calls those methods all the time expecting them to be cheap, i.e. a debugger trying them on everything it sees. For numpy ints it would be downcall to C and that's not cheap at all.
@eregon how does ruby approach this?
For purposes of Enso we just need to know the object is a number. Then we'll treat the object as number and not as a generic object with members/hash entries. How big/precise the number is then not important. We support long
, double
and BigInteger
and one of them will match (I hope).
@eregon how does ruby approach this?
We don't have a way to override fitsInInt() in Ruby currently, but we have similar things for array and hash predicates such as isArrayElementReadable()
which is also "Invoking this message does not cause any observable side-effects".
A user can define polyglot_array_element_readable?
on any Ruby class/object and that will get called on isArrayElementReadable()
. So then it is the responsibility of the one implementing polyglot_array_element_readable?
to avoid side effects.
You could do the same, e.g. polyglot_fits_in_int
or so, which you would either define for relevant numpy numbers (seems easier, could be extra methods defined when loading numpy) or treat numpy numbers specially in fitsInInt() (that seems harder). It doesn't need to be a method but could be some (possibly fully internal) attribute or whatever on the class probably (but that's probably not feasible e.g. for 64-bit ints which might or not fit in int
, that needs a method or similar).
In general I guess __index__
is already expected to have no observable side effects in regular Python, so then it might just be fine to call it.
treat
numpy
numbers specially infitsInInt()
Yes, having a special "numpy integration module" (and other integration modules for other popular Python libraries) would deliver more reliable results faster than trying to extract a general purpose solution based on random coding patterns that appear in such libraries and are often considered fragile ("can it side-effect or not?") to rely on.
Just my 2 Kč from a user of GraalVM that seeks solutions and ignores elegance.
Actually, asBigInteger
cannot have side-effects either :man_facepalming:
I've been thinking about the comments here and I've come up with a proposal:
We could have a mechanism to register custom interop behvior for arbitrary types. Most interop message calls would first lookup the type in the registry and if a method is found it would be called with the object instance. It would be up to the method implementor to ensure that interop contracts, like having no visible side-effects, are met. It's quite similar to what @eregon mentioned for ruby, except in python you cannot directly add methods or anything else to existing C-defined types, so we need a different mechanism of binding the method to the type.
Example of how it could work for numpy.int32
:
import polyglot
import numpy
@polyglot.register_polyglot_behavior(numpy.int32)
class NumpyInt32Behavior:
@staticmethod
def fits_in_int(obj):
return True
@staticmethod
def as_int(obj):
return obj.__index__()
...
Passing in the instance would let you do additional checks that cannot be infered based on the type, i.e. to implement fits_in_int
on numpy.int64
where you would have to check the bounds. The return types would always be python builtins. We would also likely remove the current fuzzy behavior of looking up special methods to guess if an object is an array etc. So this would also address #353, pandas dataframes wouldn't be arrays nor hashes until you define they are.
We would ship our own definitions for standard library types like datetime
. We could probably ship definitions for numpy
, since it's so extremely common, but we probably wouldn't want to provide definitions for other libraries, that's a slippery slope and creates problems with versioning. That would be up to you, users like you could publish such definitions for other libraries as packages on PyPI for others to reuse.
The tricky part is - when exactly would the definitions would get loaded? In the example, I have a dependency on numpy
, so graalpy cannot load it eagerly on startup. The simplest approach would be to require the users to import it explicitly. Is that something that would work for you? Either your users literally do import numpy_graalpy_interop
after they do import numpy
or you do it for them somewhere if you know they have numpy installed.
The tricky part is - when exactly would the definitions would get loaded?
I'd like to get a callback when certain module (like numpy
) gets loaded. Then I would load my numpy
(or any other) registrations.
Python's import system is very customizable, it already has extension points to do that. There's a package importhook that gives you a friendly interface to do exactly what you described.
Please let me know when the system is ready for experimenting. Thank you.
Hi @JaroslavTulach! We have just merged an api that allows you to specify the interop behavior for external types. It consists of the polyglog.register_interop_behavior
function and polyglot.get_registered_interop_behavior
(for introspection). The API is easy to use and is documented under docs/user/Interoperability.md.
Usage is rather simple: pass in the type as the first argument and pure python functions as kwargs for the messages you want to redefine. Almost all interop messages are supported (with some exceptions), and the naming convention for the kw arg is snake_casing i.e., isNumber
becomes is_number
etc.
import polyglot
class MyType(object):
data = 10
polyglot.register_interop_behavior(MyType, is_string=True, as_string=lambda t: f"MyType({t.data})")
We plan on providing default behavior extensions for numpy
and pandas
rather soon so stay tuned :) !
Thanks a lot @cosminbasca for letting me know. What's the best way to test latest Python bits? We are using following Maven co-ordinates to download the GraalPy packages.
Are there any co-ordinates to download latest snapshots?
numpy
defines its own numeric types. However GraalPy doesn't report them asInteropLibrary.isNumber
. That causes interoperability confusion. To reproduce store following file asNumpy.java
:and execute following commands:
this is causing us problems when sorting as the content of NumPy array isn't recognized as numbers, but treated as complex objects with members.