oracle / fastr

A high-performance implementation of the R programming language, built on GraalVM.

polyglot interaction [potential bug] #114

Open davidpham87 opened 4 years ago

davidpham87 commented 4 years ago

Hello,

How are R matrices represented in polyglot? For example, the lm object in R yields several matrices, but once converted to Java, these appear to be vectors. Is this expected?

In the following excerpt, eval-r is simply a wrapper around context (see my project here).

(ns exo.bug
  (:require
   [r-interop.core :as rc]))

(println (.toString (rc/eval-r "matrix(1:4, 2)")))
;; Outputs
;;       [,1] [,2]
;; [1,]    1    3
;; [2,]    2    4
(rc/value->clj (rc/eval-r "matrix(1:4, 2)")) ;; => [1 2 3 4]
(.getArraySize (rc/eval-r "matrix(1:4, 2)")) ;; => 4

Moreover, I have encountered several occasions where the method .hasMemberKey from polyglot.Value returns true, yet memberKeys is an empty set.

Finally, I have a simple question: I can't manage to make a round trip between R and Java while preserving the R class. For example, the lm R object could be represented as a hash-map in Java/Clojure, but whenever I convert it back as a proxy object, I do not get the same object. Is there a solution to this problem? I tried to set the class of the Java object, but without much success.

steve-s commented 4 years ago

Hello,

Thank you for your interest in FastR! Yes, R matrices are presented as flat arrays to Java. That is what they are in R as well: regular flat vectors that only have an additional attribute holding the dimensions. You can use R functions to query the dimensions; in Java it would be something along these lines:

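Value val = context.eval("R", "matrix(1:12, nrow=3)");
// dim is the R function that returns the dimensions attribute
Value valDims = context.eval("R", "dim").execute(val);
// getArrayElement returns a Value, so unbox it explicitly
int rows = valDims.getArrayElement(0).asInt();
int cols = valDims.getArrayElement(1).asInt();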

Presenting R matrices and R arrays as multidimensional arrays to Java and other languages could make sense, but it would be a breaking change...
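
Until then, the flat array can be reshaped on the Java side. A minimal sketch, assuming the values and the two dimensions have already been read out as in the snippet above. The class and method names are made up for illustration, and the only fact relied on is that R stores matrices in column-major order:

```java
import java.util.Arrays;

// Hypothetical helper, not part of FastR or the Graal SDK.
final class ColumnMajorReshape {

    /** Rebuilds a rows x cols matrix from R's flat, column-major data. */
    static int[][] toMatrix(int[] flat, int rows, int cols) {
        if (flat.length != rows * cols) {
            throw new IllegalArgumentException(
                "expected " + (rows * cols) + " elements, got " + flat.length);
        }
        int[][] m = new int[rows][cols];
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < rows; r++) {
                // Element (r, c) sits at offset c * rows + r in the flat vector.
                m[r][c] = flat[c * rows + r];
            }
        }
        return m;
    }

    public static void main(String[] args) {
        int[] flat = {1, 2, 3, 4}; // how matrix(1:4, 2) looks from Java
        System.out.println(Arrays.deepToString(toMatrix(flat, 2, 2)));
        // prints [[1, 3], [2, 4]]
    }
}
```

The result matches the layout R prints for matrix(1:4, 2): the first column is 1, 2 and the second is 3, 4.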

Could you share the example where hasMemberKey returns true, but memberKeys is empty?

For your last question: could you share some small example? Things like this should work:

Value val = context.eval("R", "structure(1:3, class='myclass')");
Value getClass = context.eval("R", "class");
System.out.println(getClass.execute(val).asString());

davidpham87 commented 4 years ago

Thank you for working on this really interesting project!

My goal was to write some facilities for full interop between Clojure and R. As such, I was aggressively converting all results from R into pure Clojure maps. I will probably have to switch strategy and live with polyglot objects.

However, I find this design choice less clean than having pure Clojure maps that can be converted back to R: in the matrix example above, the real information in R is a matrix, and in pure Java/Clojure it would be a representation of a matrix (e.g. arrays of arrays). The inconvenience with manipulating polyglot values is that they are alien to the languages and the functions in the languages cannot be used to manipulate the information. To illustrate this point, suppose that the matrix in the example above belongs to an S3 class. How can we modify the matrix on the JVM and send the result back to R so that R sees it as the same S3 class? Clojure might sound like an exotic language, but I think the problem will persist with the other targeted languages such as Python and JavaScript.

I might also note that R objects are usually meant to be read-only, so keeping the R object as a polyglot object is a reasonable solution for most use cases, but what about the objects that are meant to be modified?

I can't find the exact bug for members anymore, but I removed my internal check, so I should see the behavior fairly fast if it persists. I also remember that a polyglot value sometimes has both an array size > 0 and members, which was not intuitive and forced me to check for members first.

For your last example, can you build an lm object from Java (or create it in R and convert it to pure Java) and then use summary.lm or plot on it?

In Clojure it would be something like:

(require '[r-interop.core :as rc])
(require '[r-interop.packages.stats :as st])

(def iris-lm (rc/eval-r "lm(Petal.Width ~ Sepal.Length + Species, data=iris)"))
;; at this point iris-lm is a pure Clojure hash-map

(def set-class
  (rc/reify-ifn-polyglot (rc/eval-r "function(m, cl) {
  m <- as.list(m)
  class(m) <- cl
  m}")))

(def y-lm (set-class (rc/->proxy-object iris-lm) "lm"))
(st/summary-lm y-lm)

steve-s commented 4 years ago

> The inconvenience with manipulating polyglot values is that they are alien to the languages and the functions in the languages cannot be used to manipulate the information.

The Graal SDK API is a generic API shared among all Truffle languages and not specific to R, so there are certain limitations. I must admit that I do not know much about Clojure, but could you provide some proxy objects that wrap the R object (the Value instance) and provide a Clojure-friendly interface? I think that is the best approach for providing a truly "native" experience for a given language, much like wrapping a Java API for better usage in Scala. Rodrigo is doing this for Ruby: https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021.

Going back to the approach you suggest: that could be feasible too. At this point the problem is that you cannot "read" all the information about a given R object via the Graal SDK API. We do not expose the attributes ("class" and "dim" are among them). Right now you can use the R functions to read them when creating the Clojure mirror of the R object and to write them back when converting the Clojure mirror to R. Example:

// read all attributes (class, dim, names, ...) of the R object
Value get_attrs = context.eval("R", "attributes");
Value robj = context.eval("R", "lm(Petal.Width ~ Sepal.Length + Species, data=iris)");
Value robj_attrs = get_attrs.execute(robj);
// ...
// another_object stands for the value converted back from the mirror
Value set_attrs = context.eval("R", "`attributes<-`");
Value with_attrs = set_attrs.execute(another_object, robj_attrs);
// alternatively, restore the attributes inside an R function; x is then ready for further use
context.eval("R", "function(x, attrs) { attributes(x) <- attrs; x }").execute(another_object2, robj_attrs);

We will consider adding the attributes as "internal" slots to make them accessible via the Graal SDK.
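
On the host side, such a mirror could be as simple as the plain data plus a map of the attributes that were read via attributes(). The following is only a sketch under that assumption: RMirror and withData are hypothetical names, and the code does not talk to FastR at all. It only shows how the attributes can travel along while the data is modified:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical host-side mirror of an R object; not part of any real API.
final class RMirror {
    final List<Double> data;              // the flat vector contents
    final Map<String, Object> attributes; // e.g. "dim", "class", as read via attributes()

    RMirror(List<Double> data, Map<String, Object> attributes) {
        // Defensive, unmodifiable copies keep the mirror immutable.
        this.data = Collections.unmodifiableList(new ArrayList<>(data));
        this.attributes = Collections.unmodifiableMap(new LinkedHashMap<>(attributes));
    }

    /** Returns a new mirror with different data and the same attributes. */
    RMirror withData(List<Double> newData) {
        return new RMirror(newData, attributes);
    }

    public static void main(String[] args) {
        Map<String, Object> attrs = new LinkedHashMap<>();
        attrs.put("dim", List.of(2, 2));
        attrs.put("class", "myclass");

        RMirror original = new RMirror(List.of(1.0, 2.0, 3.0, 4.0), attrs);
        RMirror updated = original.withData(List.of(10.0, 2.0, 3.0, 4.0));

        System.out.println(updated.data);                    // [10.0, 2.0, 3.0, 4.0]
        System.out.println(updated.attributes.get("class")); // myclass
    }
}
```

When converting back, the data would be passed to R and the stored attributes reapplied with `attributes<-` as in the example above.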

> I might also note that R objects are usually meant to be read-only, so keeping the R object as a polyglot object is a reasonable solution for most use cases, but what about the objects that are meant to be modified?

They are also read-only in R itself; it just has syntactic sugar for modification: x[3] <- 42 is really a call to a function called [<-, which returns the updated vector (the whole expression is x <- `[<-`(x,3,42)). The original object stays the same (the runtime plays tricks to avoid copying where possible). So with the Context API you can do something like:

Value vector = context.eval("R", "c(1,2,3)");
Value subset_assign = context.eval("R", "`[<-`");
vector = subset_assign.execute(vector, 2, 42);
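
The same value semantics can be imitated in plain Java to see what the call above does conceptually. This is a simplified sketch, not how FastR implements `[<-`: the helper name subsetAssign is made up, it only handles in-range indices (R would extend the vector instead), and it always copies, whereas the R runtime may avoid the copy:

```java
import java.util.Arrays;

// Hypothetical illustration of R's copy-on-modify semantics; not FastR code.
final class SubsetAssignSketch {

    /** Mimics `[<-`(x, i, value) for a 1-based index i: returns an updated copy. */
    static double[] subsetAssign(double[] x, int index, double value) {
        if (index < 1 || index > x.length) {
            throw new IndexOutOfBoundsException("index " + index + " is outside 1.." + x.length);
        }
        double[] copy = Arrays.copyOf(x, x.length); // the input is never touched
        copy[index - 1] = value;                    // R indices start at 1
        return copy;
    }

    public static void main(String[] args) {
        double[] vector = {1, 2, 3};
        double[] updated = subsetAssign(vector, 2, 42);

        System.out.println(Arrays.toString(updated)); // [1.0, 42.0, 3.0]
        System.out.println(Arrays.toString(vector));  // [1.0, 2.0, 3.0]
    }
}
```

Rebinding the name to the returned value, as in vector = subset_assign.execute(vector, 2, 42), is what makes the update look like an in-place modification.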