modularml / mojo

The Mojo Programming Language
https://docs.modular.com/mojo/manual/
Other
23.35k stars 2.6k forks source link

[BUG] PythonObject overides Python dundermethods where PythonObject itself implements those methods #3648

Open MadAlex1997 opened 1 month ago

MadAlex1997 commented 1 month ago

Bug description

When using == __eq__ or != __ne__ on numpy arrays from mojo like the following.

 from python import Python
 def main(): 

    var np = Python.import_module("numpy") 

    var a = np.ones((3, 2, 4), dtype=np.int32) 

    var b = np.ones((3, 2, 4), dtype=np.int32) 

    print(a) 
    print(b) 
    print(a.__eq__(b))
    print(a.__ne__(b)) 
    _ = a
    _ = b

It prints the arrays and then false twice (if it doesn't segfault), whereas in native Python, the results are numpy arrays of boolean types. This means that PythonObject is calling its own dunder methods for these two checks, even though the Python code has valid dunder methods for this.

Steps to reproduce

Run the following

 from python import Python
 def main(): 

    var np = Python.import_module("numpy") 

    var a = np.ones((3, 2, 4), dtype=np.int32) 

    var b = np.ones((3, 2, 4), dtype=np.int32) 

    print(a) 
    print(b) 
    print(a.__eq__(b))
    print(a.__ne__(b)) 
    _ = a
    _ = b

System information

- Observed on Mac and WSL ubuntu 22.04.
- mojo 24.5.0 (e8aacb95)
- magic 0.2.3
mikowals commented 1 month ago

This means that PythonObject is calling its own dunder methods

Yes, it is overriding them but I think that is a design decision to allow PythonObject to conform to EqualityComparable. Not having that conformance means not being able to use PythonObjects as Dict keys( as done here ). The override of __eq__ is done here and specifies a Mojo Bool result rather than just returning a PythonObject.

Serving both the Dict key and the Numpy use cases automatically would be ideal but it also seems like a sharp edge where Mojo strict type system requires return type information from Python to allow duck typing of PythonObjects.

In the meantime if you want to get back a PythonObject when comparing Numpy objects you could use:

def main(): 

    var np = Python.import_module("numpy") 

    var a = np.ones((3, 2, 4), dtype=np.int32) 

    var b = np.ones((3, 2, 4), dtype=np.int32) 

    print(a) 
    print(b) 
    print(a.__getattr__("__eq__")(b))
    print(a.__getattr__("__ne__")(b)) 
    _ = a
    _ = b

I think maybe this issues should be renamed to "PythonObject.__eq__() should return a PythonObject rather than a Bool" since that is what I think you are asking for. And this specific issue mentioning Numpy probably needs to be in the sharp edges of the road map.

MadAlex1997 commented 1 month ago

@mikowals I already have other ways around it, I only use numpy in unit tests and you can use np.equals(a1,a2).

There are many cases where a custom dunder for equal (and other comparison ops) could be used common examples are speed (comparing hashes), and library types referring to an FFI pointer. It would seem that in the case of the last option, which is what is happening with NumPy and likely would happen with torch, TensorFlow, and others, represents a place where someone who is coming from Python and trying to use libraries that they know well would receive incorrect answers.

I can see how it would be a sharp edge to change it and break EqualityComparible on PythonObject, but I think it is currently a sharp edge regardless. Perhaps, some conditional behavior could be applied to objects from outside of the standard library.