Open iamrecursion opened 3 years ago
Hi @iamrecursion, thank you for the report. Yes, we have a bug - the chr
problem is actually not in interop, it in our implementation of chr
that doesn't accept long
for some reason. I'll fix it shortly.
The Text
issue is different. In Python, only str
and its subclasses are considered strings for most functions. There is a __str__
special method used to perform conversions to strings, which is used by functions like str
or print
that accept such string-convertibles, but the vast majority of functions just expect str
type. Interop values that respond to isString
behave like objects with __str__
conversion method defined, but they aren't considered subclasses of the str
type. Currently, you either need to convert the value to a Java string on the Java side, or use str
builtin to convert it to Python string on the Python side. In general, interop objects hook into the python special methods, like __index__
, __str__
, __len__
etc, but they don't become subclasses of Python builtin types. If this is problematic for you, could your describe your use case in more detail?
I wouldn't say that it's expressly problematic, but it provides a much worse UX than we'd ideally like.
The promise of interop as far as we're concerned is the seamless passing of data between languages; we treat all isString == true
interop objects as first class Text
in Enso and I must admit I expected that most languages would do so. The same is true of all types that the InteropLibrary
can talk about (arrays, numbers, exceptions, nulls, booleans, and so on). Given that the docs for InteropLibrary
states that it expects that the "target language embeds the interop protocol semantics as part of their existing language semantics", this seems like the correct approach.
Indeed, GraalJS and FastR both do this. This means that an Enso user can seamlessly hand off data between Enso and these languages without having to ever think about wrappers, calling special methods, and so on.
From a more philosophical perspective, it seems quite odd to me that GraalPython treats isString
as meaning that it defines __str__
(i.e. a toString
conversion). In Enso we interpreted toDisplayString
as meaning equivalent to toString
or __str__
, whilst isString == true
means that "this interop value is actually a text object". Similarly, we treat hasArrayElements == true
to mean "is an array object", isBoolean == true
to mean "is a boolean object" and so on. This is only our understanding of the InteropLibrary
protocol, but it's backed up by other truffle language implementations at least! Along similar lines, GraalPython doesn't treat isException == true
objects as first-class exceptions for try ... except ...
, unlike the other truffle languages!
I apologise that this is a touch rambling, but I hope that it helps clarify our thinking a bit!
That is indeed an interesting question, and one we've been considering for some time. However, there are few intricacies that are not clear to me, yet.
For example, say you ask type(obj)
where obj
is a foreign object that says true
to isString
, but also true to fitsInByte
or hasHashEntries
? Python bool
(for example) inherits from Python int
, so naturally these overlap. Python supports multiple inheritance, so should we manufacture "foreign" classes for all possible combinations of foreign objects with these kinds of overlaps? How should we resolve conflicts in the method definitions between those Python (super)types? Python code just does it via the order of base classes, but InteropLibrary does not offer such disambiguation. Furthermore, what about getMetaObject
? Shouldn't the Python code type(obj)
return the foreign meta object? How then can it also return str
or a subclass of str
, when InteropLibrary does not expose any inheritance information?
As for your example, toDisplayString
maps more to Python's __repr__
function. The contract for __str__
is precisely that objects that are "like" strings have this method to convert themselves into a proper str
(using asString
in the interop case). Similarly, the method __int__
is available exactly to declare that an object can be represented as a Python int
(using asLong
in the interop case).
All that being said, we do try to support all interop objects in Python builtins to do the right thing, it's just that I'm less sure about actually making them "instances of" the Python builtin types. So we really need to fix both of the issues you describe above (and many more) :-)
As an example, I mentioned that bool
inherits from int
in Python - what that leads us to do is to consider foreign objects that return true
for isBoolean
as integers 1
or 0
in arithmetic expressions, even if the foreign language might not allow booleans in such expressions. (And here, too, we probably have many places that are missing the proper specializations for this)
I think the most reliable way to deal with these questions is to make GraalPython's semantics a "superset" of cpython's. Given that cpython hasn't got any idea about interop I would imagine that GraalPython can basically do whatever is "most sensible". That's not to discount the difficulty of this problem, however. In Enso the type system works very differently which makes it much simpler for us to work with our native types as first-class interop types.
For example, say you ask type(obj) where obj is a foreign object that says true to isString, but also true to fitsInByte or hasHashEntries?
I don't know enough about the Python object model to really comment on this kind of thing. The way we deal with it in Enso is that any primitive operation that expects a X is made to work with any type for which interop isX
returns true. As far as "fitting them into" the type system they're effectively treated like instances of Polyglot a
where a
is a list of behaviours that they can perform (e.g. Polyglot [As_Boolean, As_Integer, As_Array]
). It's relatively flexible, but not easy to formalise statically so it's all handled dynamically at the moment.
All that being said, we do try to support all interop objects in Python builtins to do the right thing, it's just that I'm less sure about actually making them "instances of" the Python builtin types. So we really need to fix both of the issues you describe above (and many more) :-)
Perhaps a similar approach to ours would work for GraalPython? Rather than considering making polyglot objects actual subtypes of the relevant python types, why not instead extend the language's semantics to (where relevant) treat polyglot objects defining certain behaviours as equivalent to the builtin types. That's a fairly imprecise description of the behaviour, but I think it's what makes most sense from the perspective of a user.
As for your example, toDisplayString maps more to Python's repr function. The contract for str is precisely that objects that are "like" strings have this method to convert themselves into a proper str (using asString in the interop case). Similarly, the method int is available exactly to declare that an object can be represented as a Python int (using asLong in the interop case).
My mistake! Sorry for the confusion. In that case I'd expect an object that returns true
from isString
to have asString
called on it before being handed to the relevant parts of the GraalPython codebase. I don't think interop expects that calling asX
is a lossless operation, so this would definitely make the most sense to me. As far as I understand it (and please correct me if I'm wrong), this would only be an extension of Python's semantics, rather than a breaking change.
why not instead extend the language's semantics to (where relevant) treat polyglot objects defining certain behaviours as equivalent to the builtin types
That's what we already do in parts - most built-in functions in Python do not check the type, they call appropriate conversion function. In the places where they don't, we have to handle interop objects directly. This is not a very systematical approach, that's why we're missing that handling in a number of places :)
That makes perfect sense! Thanks for the detailed discussion and I look forward to any improvements in this area!
The
InteropLibrary
defines the notion of interop types for various truffle languages to communicate. At Enso, we've integrated with GraalPython but run into a few bugs because it seems to make assumptions about native type representations of basic primitives such as numbers and strings.The following program handily reproduces the issue, using only the truffle API.
We see a similar effect with our
Text
implementation, which conforms to theisString
portion of the protocol, when callingord(...)
on it.Tested on
graalpython-21.0.0.2
.