oracle / graalpython

GraalPy – A high-performance embeddable Python 3 runtime for Java
https://www.graalvm.org/python/
Other
1.24k stars 108 forks source link

Interop Types Not Detected Properly by GraalPython #205

Open iamrecursion opened 3 years ago

iamrecursion commented 3 years ago

The InteropLibrary defines the notion of interop types for various truffle languages to communicate. At Enso, we've integrated with GraalPython but run into a few bugs because it seems to make assumptions about native type representations of basic primitives such as numbers and strings.

The following program handily reproduces the issue, using only the truffle API.

import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Value;

public class PythonTypesBug {
  public static void main(String[] args) {
    var ctx = Context.newBuilder().allowAllAccess(true).build();

    Value chrFunc = ctx.eval("python", "lambda a: chr(a)");
    Value typeFunc = ctx.eval("python", "lambda a: type(a)");

    long longInput = 67;
    int intInput = 67;

    // Both of the inputs have the same type when we ask python what it is.
    if (typeFunc.canExecute()) {
      System.out.println("The type of 67L is " + typeFunc.execute(longInput));
      System.out.println("The type of 67 is " + typeFunc.execute(intInput));
    }

    // But GraalPython seems to make some strange assumptions.
    if (chrFunc.canExecute()) {
      System.out.println("Executing on an integer gives the result: " + chrFunc.execute(intInput));
      System.out.println(
          "But executing on a long causes the following crash, despite the above types being the same");

      chrFunc.execute(longInput);
    }
  }
}

We see a similar effect with our Text implementation, which conforms to the isString portion of the protocol, when calling ord(...) on it.

Tested on graalpython-21.0.0.2.

msimacek commented 3 years ago

Hi @iamrecursion, thank you for the report. Yes, we have a bug - the chr problem is actually not in interop, it in our implementation of chr that doesn't accept long for some reason. I'll fix it shortly.

The Text issue is different. In Python, only str and its subclasses are considered strings for most functions. There is a __str__ special method used to perform conversions to strings, which is used by functions like str or print that accept such string-convertibles, but the vast majority of functions just expect str type. Interop values that respond to isString behave like objects with __str__ conversion method defined, but they aren't considered subclasses of the str type. Currently, you either need to convert the value to a Java string on the Java side, or use str builtin to convert it to Python string on the Python side. In general, interop objects hook into the python special methods, like __index__, __str__, __len__ etc, but they don't become subclasses of Python builtin types. If this is problematic for you, could your describe your use case in more detail?

iamrecursion commented 3 years ago

I wouldn't say that it's expressly problematic, but it provides a much worse UX than we'd ideally like.

The promise of interop as far as we're concerned is the seamless passing of data between languages; we treat all isString == true interop objects as first class Text in Enso and I must admit I expected that most languages would do so. The same is true of all types that the InteropLibrary can talk about (arrays, numbers, exceptions, nulls, booleans, and so on). Given that the docs for InteropLibrary states that it expects that the "target language embeds the interop protocol semantics as part of their existing language semantics", this seems like the correct approach.

Indeed, GraalJS and FastR both do this. This means that an Enso user can seamlessly hand off data between Enso and these languages without having to ever think about wrappers, calling special methods, and so on.

From a more philosophical perspective, it seems quite odd to me that GraalPython treats isString as meaning that it defines __str__ (i.e. a toString conversion). In Enso we interpreted toDisplayString as meaning equivalent to toString or __str__, whilst isString == true means that "this interop value is actually a text object". Similarly, we treat hasArrayElements == true to mean "is an array object", isBoolean == true to mean "is a boolean object" and so on. This is only our understanding of the InteropLibrary protocol, but it's backed up by other truffle language implementations at least! Along similar lines, GraalPython doesn't treat isException == true objects as first-class exceptions for try ... except ..., unlike the other truffle languages!

I apologise that this is a touch rambling, but I hope that it helps clarify our thinking a bit!

timfel commented 3 years ago

That is indeed an interesting question, and one we've been considering for some time. However, there are few intricacies that are not clear to me, yet.

For example, say you ask type(obj) where obj is a foreign object that says true to isString, but also true to fitsInByte or hasHashEntries? Python bool (for example) inherits from Python int, so naturally these overlap. Python supports multiple inheritance, so should we manufacture "foreign" classes for all possible combinations of foreign objects with these kinds of overlaps? How should we resolve conflicts in the method definitions between those Python (super)types? Python code just does it via the order of base classes, but InteropLibrary does not offer such disambiguation. Furthermore, what about getMetaObject? Shouldn't the Python code type(obj) return the foreign meta object? How then can it also return str or a subclass of str, when InteropLibrary does not expose any inheritance information?

As for your example, toDisplayString maps more to Python's __repr__ function. The contract for __str__ is precisely that objects that are "like" strings have this method to convert themselves into a proper str (using asString in the interop case). Similarly, the method __int__ is available exactly to declare that an object can be represented as a Python int (using asLong in the interop case).

timfel commented 3 years ago

All that being said, we do try to support all interop objects in Python builtins to do the right thing, it's just that I'm less sure about actually making them "instances of" the Python builtin types. So we really need to fix both of the issues you describe above (and many more) :-)

As an example, I mentioned that bool inherits from int in Python - what that leads us to do is to consider foreign objects that return true for isBoolean as integers 1 or 0 in arithmetic expressions, even if the foreign language might not allow booleans in such expressions. (And here, too, we probably have many places that are missing the proper specializations for this)

iamrecursion commented 3 years ago

I think the most reliable way to deal with these questions is to make GraalPython's semantics a "superset" of cpython's. Given that cpython hasn't got any idea about interop I would imagine that GraalPython can basically do whatever is "most sensible". That's not to discount the difficulty of this problem, however. In Enso the type system works very differently which makes it much simpler for us to work with our native types as first-class interop types.

For example, say you ask type(obj) where obj is a foreign object that says true to isString, but also true to fitsInByte or hasHashEntries?

I don't know enough about the Python object model to really comment on this kind of thing. The way we deal with it in Enso is that any primitive operation that expects a X is made to work with any type for which interop isX returns true. As far as "fitting them into" the type system they're effectively treated like instances of Polyglot a where a is a list of behaviours that they can perform (e.g. Polyglot [As_Boolean, As_Integer, As_Array]). It's relatively flexible, but not easy to formalise statically so it's all handled dynamically at the moment.

All that being said, we do try to support all interop objects in Python builtins to do the right thing, it's just that I'm less sure about actually making them "instances of" the Python builtin types. So we really need to fix both of the issues you describe above (and many more) :-)

Perhaps a similar approach to ours would work for GraalPython? Rather than considering making polyglot objects actual subtypes of the relevant python types, why not instead extend the language's semantics to (where relevant) treat polyglot objects defining certain behaviours as equivalent to the builtin types. That's a fairly imprecise description of the behaviour, but I think it's what makes most sense from the perspective of a user.

As for your example, toDisplayString maps more to Python's repr function. The contract for str is precisely that objects that are "like" strings have this method to convert themselves into a proper str (using asString in the interop case). Similarly, the method int is available exactly to declare that an object can be represented as a Python int (using asLong in the interop case).

My mistake! Sorry for the confusion. In that case I'd expect an object that returns true from isString to have asString called on it before being handed to the relevant parts of the GraalPython codebase. I don't think interop expects that calling asX is a lossless operation, so this would definitely make the most sense to me. As far as I understand it (and please correct me if I'm wrong), this would only be an extension of Python's semantics, rather than a breaking change.

timfel commented 3 years ago

why not instead extend the language's semantics to (where relevant) treat polyglot objects defining certain behaviours as equivalent to the builtin types

That's what we already do in parts - most built-in functions in Python do not check the type, they call appropriate conversion function. In the places where they don't, we have to handle interop objects directly. This is not a very systematical approach, that's why we're missing that handling in a number of places :)

iamrecursion commented 3 years ago

That makes perfect sense! Thanks for the detailed discussion and I look forward to any improvements in this area!