oracle / fastr

A high-performance implementation of the R programming language, built on GraalVM.

Can't assign class to polyglot value (UnsupportedSpecializationException) #124

Closed querenker closed 4 years ago

querenker commented 4 years ago

If I want to assign a class to a polyglot value (in this case a Pandas data frame), I get the following error:

❯ graalpython --polyglot --R.PrintErrorStacktracesToFile=true
Python 3.7.4 (Tue Oct 29 19:43:29 CET 2019)
[GraalVM CE, Java 1.8.0_232] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Please note: This Python implementation is in the very early stages, and can run little more than basic benchmarks at this point.
>>> import polyglot
>>> import pandas as pd
>>> df = pd.read_csv('test.csv')
>>> polyglot.export_value('df', df)
   x  y
0 12 42
1 15 46
2  8 35
>>> polyglot.eval(language='R', string="df <- import('df')")
   x  y
0 12 42
1 15 46
2  8 35
>>> polyglot.eval(language='R', string="class(df) <- 'test'")
An internal error occurred: "UnsupportedSpecializationException"
Please report an issue at https://github.com/oracle/fastr including the commands and the error log file '/Users/alexander/repositories/polyglot_ds/fastr_errors_pid48983.log'.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module '<expression>'>
NotImplementedError: Unexpected values provided for class<-: [<DataFrame object at 0x3b7e712d>, test], [PythonObject,String]

Interestingly, it works for simpler Python objects like lists:

>>> import polyglot
>>> polyglot.export_value('l', [1, 2, 3])
[1, 2, 3]
>>> polyglot.eval(language='R', string="l <- import('l')")
[1, 2, 3]
>>> polyglot.eval(language='R', string="class(l) <- 'test'")
test
>>> polyglot.eval(language='R', string="class(l)")
test
>>>

Log File

Thu Nov 28 10:56:13 CET 2019
com.oracle.truffle.api.dsl.UnsupportedSpecializationException: Unexpected values provided for class<-: [<DataFrame object at 0x3b7e712d>, test], [PythonObject,String]
    at com.oracle.truffle.r.nodes.builtin.base.UpdateClassNodeGen.executeAndSpecialize(UpdateClassNodeGen.java:354)
    at com.oracle.truffle.r.nodes.builtin.base.UpdateClassNodeGen.execute(UpdateClassNodeGen.java:147)
    at com.oracle.truffle.r.nodes.builtin.RBuiltinNode$Arg2.call(RBuiltinNode.java:187)
    at com.oracle.truffle.r.nodes.function.RCallNode$BuiltinCallNode.execute(RCallNode.java:1149)
    at com.oracle.truffle.r.nodes.function.RCallNode$FunctionDispatch.dispatch(RCallNode.java:911)
    at com.oracle.truffle.r.nodes.function.RCallNodeGen$FunctionDispatchNodeGen.executeAndSpecialize(RCallNodeGen.java:905)
    at com.oracle.truffle.r.nodes.function.RCallNodeGen$FunctionDispatchNodeGen.execute(RCallNodeGen.java:869)
    at com.oracle.truffle.r.nodes.function.RCallNode.call(RCallNode.java:289)
    at com.oracle.truffle.r.nodes.function.RCallNodeGen.executeAndSpecialize(RCallNodeGen.java:246)
    at com.oracle.truffle.r.nodes.function.RCallNodeGen.execute(RCallNodeGen.java:220)
    at com.oracle.truffle.r.nodes.access.WriteLocalFrameVariableNodeGen.execute_generic3(WriteLocalFrameVariableNodeGen.java:115)
    at com.oracle.truffle.r.nodes.access.WriteLocalFrameVariableNodeGen.execute(WriteLocalFrameVariableNodeGen.java:49)
    at com.oracle.truffle.r.nodes.control.ReplacementNode$GenericReplacementNode.executeReplacement(ReplacementNode.java:452)
    at com.oracle.truffle.r.nodes.control.ReplacementNode$ReplacementWithRhsNode.execute(ReplacementNode.java:219)
    at com.oracle.truffle.r.runtime.nodes.RNode.visibleExecute(RNode.java:74)
    at com.oracle.truffle.r.nodes.control.ReplacementDispatchNode.visibleExecute(ReplacementDispatchNode.java:92)
    at com.oracle.truffle.r.engine.REngine$AnonymousBodyNode.visibleExecute(REngine.java:634)
    at com.oracle.truffle.r.engine.REngine$AnonymousRootNode.execute(REngine.java:561)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callProxy(OptimizedCallTarget.java:348)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callRoot(OptimizedCallTarget.java:338)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callBoundary(OptimizedCallTarget.java:325)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.doInvoke(OptimizedCallTarget.java:307)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callDirect(OptimizedCallTarget.java:256)
    at org.graalvm.compiler.truffle.runtime.OptimizedDirectCallNode.call(OptimizedDirectCallNode.java:64)
    at com.oracle.truffle.r.engine.EngineRootNode$EngineBodyNode.execute(EngineRootNode.java:138)
    at com.oracle.truffle.r.engine.EngineRootNode.execute(EngineRootNode.java:85)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callProxy(OptimizedCallTarget.java:348)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callRoot(OptimizedCallTarget.java:338)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callBoundary(OptimizedCallTarget.java:325)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.doInvoke(OptimizedCallTarget.java:307)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callIndirect(OptimizedCallTarget.java:244)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.call(OptimizedCallTarget.java:231)
    at com.oracle.graal.python.builtins.modules.PolyglotModuleBuiltins$EvalInteropNode.evalString(PolyglotModuleBuiltins.java:158)
    at com.oracle.graal.python.builtins.modules.PolyglotModuleBuiltinsFactory$EvalInteropNodeFactory$EvalInteropNodeGen.execute(PolyglotModuleBuiltinsFactory.java:222)
    at com.oracle.graal.python.nodes.function.BuiltinFunctionRootNode$BuiltinAnyCallNode.execute(BuiltinFunctionRootNode.java:76)
    at com.oracle.graal.python.nodes.function.BuiltinFunctionRootNode.execute(BuiltinFunctionRootNode.java:353)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callProxy(OptimizedCallTarget.java:348)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callRoot(OptimizedCallTarget.java:338)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callBoundary(OptimizedCallTarget.java:325)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.doInvoke(OptimizedCallTarget.java:307)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callDirect(OptimizedCallTarget.java:256)
    at org.graalvm.compiler.truffle.runtime.OptimizedDirectCallNode.call(OptimizedDirectCallNode.java:64)
    at com.oracle.graal.python.nodes.call.InvokeNode.doDirect(InvokeNode.java:204)
    at com.oracle.graal.python.nodes.call.InvokeNodeGen.execute(InvokeNodeGen.java:21)
    at com.oracle.graal.python.nodes.call.CallDispatchNode.callBuiltinFunctionCached(CallDispatchNode.java:110)
    at com.oracle.graal.python.nodes.call.CallDispatchNodeGen.executeAndSpecialize(CallDispatchNodeGen.java:282)
    at com.oracle.graal.python.nodes.call.CallDispatchNodeGen.executeCall(CallDispatchNodeGen.java:76)
    at com.oracle.graal.python.nodes.call.CallNode$CachedCallNode.builtinMethodCallBuiltinDirectCached(CallNode.java:151)
    at com.oracle.graal.python.nodes.call.CallNodeFactory$CachedCallNodeGen.executeAndSpecialize(CallNodeFactory.java:212)
    at com.oracle.graal.python.nodes.call.CallNodeFactory$CachedCallNodeGen.execute(CallNodeFactory.java:128)
    at com.oracle.graal.python.nodes.call.PythonCallNode.call(PythonCallNode.java:283)
    at com.oracle.graal.python.nodes.call.PythonCallNodeGen.executeAndSpecialize(PythonCallNodeGen.java:91)
    at com.oracle.graal.python.nodes.call.PythonCallNodeGen.execute(PythonCallNodeGen.java:63)
    at com.oracle.graal.python.nodes.statement.PrintExpressionNode.execute(PrintExpressionNode.java:69)
    at com.oracle.graal.python.nodes.function.InnerRootNode.execute(InnerRootNode.java:66)
    at com.oracle.graal.python.nodes.ModuleRootNode.execute(ModuleRootNode.java:87)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callProxy(OptimizedCallTarget.java:348)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callRoot(OptimizedCallTarget.java:338)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callBoundary(OptimizedCallTarget.java:325)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.doInvoke(OptimizedCallTarget.java:307)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callIndirect(OptimizedCallTarget.java:244)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.call(OptimizedCallTarget.java:231)
    at com.oracle.graal.python.nodes.control.TopLevelExceptionHandler.run(TopLevelExceptionHandler.java:285)
    at com.oracle.graal.python.nodes.control.TopLevelExceptionHandler.execute(TopLevelExceptionHandler.java:132)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callProxy(OptimizedCallTarget.java:348)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callRoot(OptimizedCallTarget.java:338)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callBoundary(OptimizedCallTarget.java:325)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.doInvoke(OptimizedCallTarget.java:307)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callIndirect(OptimizedCallTarget.java:244)
    at org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.call(OptimizedCallTarget.java:231)
    at com.oracle.truffle.polyglot.PolyglotContextImpl.eval(PolyglotContextImpl.java:820)
    at org.graalvm.polyglot.Context.eval(Context.java:344)
    at com.oracle.graal.python.shell.GraalPythonMain.readEvalPrint(GraalPythonMain.java:634)
    at com.oracle.graal.python.shell.GraalPythonMain.launch(GraalPythonMain.java:421)
    at org.graalvm.launcher.AbstractLanguageLauncher.launch(AbstractLanguageLauncher.java:121)
    at org.graalvm.launcher.AbstractLanguageLauncher.launch(AbstractLanguageLauncher.java:70)
    at com.oracle.graal.python.shell.GraalPythonMain.main(GraalPythonMain.java:59)
Caused by: Attached Guest Language Frames (5)

tstupka commented 4 years ago

When a polyglot object that is an array is passed to a FastR builtin, it is automatically converted to (or handled as) a native R vector. In your example, that applies to the simple Python object (polyglot.export_value('l', [1, 2, 3])), but not to the Pandas data frame.

The same problem can be reproduced with Java interop from the R console:

> ja <- new('int[]', 2) # creates a java int array polyglot object
> sum(ja)
[1] 0
> matrix(ja)
     [,1]
[1,]    0
[2,]    0
> class(ja) <- 'test'

vs

> calendar <- new('java.util.GregorianCalendar')
> sum(calendar)
Error in sum(calendar) : invalid 'type' (polyglot.value) of argument
> class(calendar) <- 'test'
An internal error occurred: "UnsupportedSpecializationException"

Nevertheless, instead of the UnsupportedSpecializationException, we should provide a proper error message in the update class builtin.

For more information about interop in R, see the FastR Java Interoperability doc and the Rules for lazy conversion of foreign objects to R objects.

Thanks

querenker commented 4 years ago

Thank you for your insightful answer.

I am wondering whether it is intended that classes cannot be assigned to polyglot values. I am new to R, but as far as I understand, (nearly?) every object in R can get a class attribute (even data types like integers, which many languages consider primitive).

steve-s commented 4 years ago

I am wondering whether it is intended that classes cannot be assigned to polyglot values. I am new to R, but as far as I understand, (nearly?) every object in R can get a class attribute (even data types like integers, which many languages consider primitive).

The issue here is that, ideally, foreign objects should be able to "flow" through R as-is, for example:

function(a, one, other) {
   return(if (a == 1L) one else other) 
}

This function can be called from another language, like Python, and whatever was passed as one or other should be returned unchanged. If we allowed attributes (like class) to be added to foreign objects, what should we return to the other language? Returning the foreign object without the attributes may be confusing, and returning a wrapper with the attributes wouldn't keep this "identity" property. Moreover, returning the object without attributes wouldn't keep the "identity" property the other way around: if it found its way back to R, it would suddenly lack the attributes.
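The trade-off described above can be modelled in plain Python. This is only an illustrative sketch, not how FastR represents foreign objects: `AttributeWrapper` and `choose` are hypothetical names, and `object()` merely stands in for a foreign value such as a Pandas data frame.

```python
class AttributeWrapper:
    """Hypothetical wrapper pairing a foreign object with R-style attributes."""

    def __init__(self, target, attributes):
        self.target = target
        self.attributes = dict(attributes)


def choose(a, one, other):
    # Python counterpart of the R function above: arguments pass through untouched.
    return one if a == 1 else other


foreign = object()  # stands in for a foreign value such as a Pandas data frame

# Without attributes, the caller gets back exactly the object it passed in.
same = choose(1, foreign, None)

# Option 1: hand back a wrapper that carries the attributes. Identity is lost.
wrapped = AttributeWrapper(foreign, {"class": "test"})
returned_wrapper = choose(1, wrapped, None)

# Option 2: unwrap on the way out. Identity holds, but the attributes are gone
# if the object later finds its way back into R.
returned_unwrapped = returned_wrapper.target

print(same is foreign)                            # True
print(returned_wrapper is foreign)                # False
print(returned_unwrapped is foreign)              # True
print(hasattr(returned_unwrapped, "attributes"))  # False
```

Either option gives up something: the wrapper breaks the identity check on the caller's side, and unwrapping silently drops the class attribute.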

Currently, our philosophy is based on

The issue here is that we don't have any implicit conversion for your foreign object (Pandas data frame). If you used an ordinary Python list, for example, the class<- builtin would convert it to a vector, assign the class attribute to it, and return that as the result. (Note: class(x) <- "abc" is syntactic sugar for x <- `class<-`(x, "abc").)
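As a rough model of that behaviour, the sketch below mimics in plain Python what the replacement function does: it returns an updated value rather than mutating its argument, converts array-like input to a vector first, and rejects anything else. `RVector` and `update_class` are hypothetical names for illustration, and the `TypeError` stands for the kind of readable error message suggested earlier in the thread. None of this is FastR's actual code.

```python
class RVector:
    """Hypothetical stand-in for a native R vector with attributes."""

    def __init__(self, values):
        self.values = list(values)
        self.attributes = {}


def update_class(x, class_name):
    """Model of `class<-`: returns the updated value instead of mutating x."""
    if isinstance(x, RVector):
        # Copy, so the caller's original value is left untouched.
        result = RVector(x.values)
        result.attributes = dict(x.attributes)
    elif isinstance(x, (list, tuple)):
        # Array-like foreign values are converted to a vector first.
        result = RVector(x)
    else:
        # No conversion is known for this value, so fail with a readable message.
        raise TypeError(
            f"cannot set class on foreign object of type {type(x).__name__}"
        )
    result.attributes["class"] = class_name
    return result


l = [1, 2, 3]
# Mirrors what class(l) <- "test" desugars to: l <- `class<-`(l, "test")
l = update_class(l, "test")
print(type(l).__name__, l.values, l.attributes)
```

After the rebinding, `l` no longer refers to the original Python list but to the converted vector, which is why the list case works while an arbitrary object has nothing to be converted into.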

We should eventually add support for implicit and/or explicit conversion of Pandas DFs to R DFs. So far I think the explicit conversion can be a bit cleaner: as.data.frame(my_pandas_df). Maybe it can already be implemented in user code with Java interop and proxy objects, like in this example where a "data frame"-like Java object is exposed to FastR as an R data frame.
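Until such a conversion exists, one possible user-level workaround is to export each column as a plain Python list, since lists are treated as vectors, and assemble the data frame on the R side. The sketch below is untested against FastR and rests on several assumptions: the helper names `r_data_frame_source` and `export_columns` and the `col_` prefix are made up for illustration, column names are assumed to be valid R identifiers, and `polyglot.export_value` is called with the (name, value) argument order used in the report above. The column dictionary could come from `df.to_dict(orient="list")` in pandas.

```python
def r_data_frame_source(column_names, r_class=None, prefix="col_"):
    """Build R source that imports exported columns and assembles a data frame."""
    args = ", ".join(
        f"{name} = as.vector(import('{prefix}{name}'))" for name in column_names
    )
    lines = [f"df <- data.frame({args})"]
    if r_class is not None:
        # Prepend the new class so the result is still a data.frame.
        lines.append(f"class(df) <- c('{r_class}', class(df))")
    lines.append("df")
    return "\n".join(lines)


def export_columns(columns, prefix="col_"):
    """Export each column as a plain list (needs GraalPython with --polyglot)."""
    import polyglot  # only available on GraalPython, hence the local import

    for name, values in columns.items():
        # Argument order (name, value) follows the calls shown in the report.
        polyglot.export_value(prefix + name, list(values))


# Column data as plain lists, e.g. obtained via df.to_dict(orient="list").
columns = {"x": [12, 15, 8], "y": [42, 46, 35]}
source = r_data_frame_source(columns, r_class="test")
print(source)
```

On GraalPython, the intended usage would be `export_columns(columns)` followed by `polyglot.eval(language='R', string=source)`. The class is assigned to a native R data frame rather than to the foreign object, so the `class<-` builtin never sees the Pandas value.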

querenker commented 4 years ago

Thank you very much for your detailed explanation 👍 Now I understand why it is handled that way and will close this issue.