sagemath / sage

Main repository of SageMath
https://www.sagemath.org

Wrapper for R graphics commands. #11266

Open d6c26cf6-9aec-430c-9f63-586ad566bb8c opened 13 years ago

d6c26cf6-9aec-430c-9f63-586ad566bb8c commented 13 years ago

Currently, using R graphics in sage is ugly. A typical session looks like ...

r.png()
r.boxplot(.......)
r.dev_off()

... for EVERY graphic you want to create. This is a shame as R has some very advanced graphics functions.

I would like sage to have a wrapper for R graphics, so that a typical session would look like:

graph = Rgraphic(R.Cairo arguments)
graph.boxplot(arguments)
graph.histogram(arguments)
graph.etc......
show(graph)

This would help with #8868, as it would allow us to hide the details of the graphics backend (Cairo on Linux and possibly Windows, and Quartz/Aqua on macOS).

A possible implementation might look like this:

The __init__ method accepts arguments eventually destined for R.Cairo and stores them in a variable.

Then, all method calls that aren't defined in the Rgraphic class could be stored in a list inside the instance to be called later.

Finally, the .__show__() method initialises the R.Cairo object with the stored __init__ arguments, runs all the non-class calls from the list in R, then closes the R.Cairo object, with the possibility of storing the returned image inside the object.

Some extra points to consider are covered in the following email snippet.


"As for trellis/lattice packages, this highlights what could be a big problem with R, as there are literally thousands of different packages, a good few that create plots. If we create a list of pre-approved methods, this risks leaving a lot out. (There's some cool kriging stuff I'd like this wrapper to support.)

I was considering (for my first code-bash this weekend) ignoring loading, storing any non-defined method call on this object, and attempting to run it in R. It's not secure, but it would have the benefit of ensuring that any loaded graphic function would run successfully as a method. If it's necessary that we have a list of well-defined methods on the Rgraphic object, would we be able to escape it using an extra check=False argument to the __init__ method?

So a typical session with my object would look like:

R.library("a graphics library")
R.library("another graphics library")

graph1 = Rgraphic(....................., check=False)
graph1.boxplot(...............)
graph1.someotherRfunction(..................)
graph1.somethingelse(..............)
show(graph1)

Still have to sort out how to store the image though. :("
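
For illustration, the check=False idea from the email could be sketched roughly as follows. This is only a sketch under assumptions: RecordingGraphic, ALLOWED and replay are made-up names (nothing here is part of Sage), the whitelist contents are arbitrary, and the "backend" passed to replay stands in for whatever object actually talks to R.

```python
class RecordingGraphic:
    """Record method calls for later replay, optionally limited to a whitelist."""

    # Hypothetical whitelist; a real one would have to be curated by hand.
    ALLOWED = frozenset({"boxplot", "hist", "plot", "lines", "points"})

    def __init__(self, check=True):
        self._check = check
        self._calls = []  # list of (name, args, kwargs) tuples

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        if self._check and name not in self.ALLOWED:
            raise AttributeError("%r is not an approved graphics call" % name)

        def record(*args, **kwargs):
            self._calls.append((name, args, kwargs))

        return record

    def replay(self, backend):
        """Run every recorded call, in order, against `backend`."""
        for name, args, kwargs in self._calls:
            getattr(backend, name)(*args, **kwargs)


# With check=False any method name is accepted and simply recorded.
loose = RecordingGraphic(check=False)
loose.krige(1, 2)

# With the default check=True, unknown names are rejected up front.
strict = RecordingGraphic()
strict.boxplot([1, 2, 3])
```

The trade-off is the one the email describes: check=False accepts any loaded R function at the cost of security, while the whitelist is safer but will always lag behind the available packages.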


See also #8868 and #11249.

CC: @kcrisman @jasongrout

Component: interfaces

Keywords: R, cairo, wrapper, r-project

Issue created by migration from https://trac.sagemath.org/ticket/11266

kcrisman commented 13 years ago

Description changed:

--- 
+++ 
@@ -1,5 +1,3 @@
-This ticket links to [https://github.com/sagemath/sage/issues/8868](https://github.com/sagemath/sage/issues/8868) and [https://github.com/sagemath/sage/issues/11249](https://github.com/sagemath/sage/issues/11249)
-
 Currently, using R graphics in sage is ugly. A typical session looks like ...

 ```python
@@ -29,7 +27,7 @@
 show(graph)

-This would help with https://github.com/sagemath/sage/issues/8868, as it would allow us to hide the details of the graphics backend. (Cairo in Linux(And possibly Windows), and Quartz/Aqua in MacOS.)
+This would help with #8868, as it would allow us to hide the details of the graphics backend. (Cairo in Linux(And possibly Windows), and Quartz/Aqua in MacOS.)

A possible implementation might sound like:

@@ -37,7 +35,7 @@

Then, all method calls that aren't defined in the Rgraphic class could be stored in a list inside the instance to be called later.

-Finally, the .show() method initialises the R.Cairo object with the stored __init__ arguments, runs all the non-class calls from the list in R, then closes the R.Cairo object, with the possibility of storing the returned image inside the object.
+Finally, the .__show__() method initialises the R.Cairo object with the stored __init__ arguments, runs all the non-class calls from the list in R, then closes the R.Cairo object, with the possibility of storing the returned image inside the object.

Some extra points to consider are covered in the following email snippet.

@@ -65,4 +63,6 @@


+See also #8868 and #11249.

+

kcrisman commented 13 years ago
comment:3

From a related thread:

This seems to happen in sagenb/notebook/cell.py in files_html(). The png is created by R, lives in the directory via r.py and the evaluation code in server/support.py (I think), and then is added to the output of the cell in files_html().

jasongrout commented 13 years ago
comment:4

As a possible interim step, you could just make a context manager that does the .png() and .dev_off() commands, so something like this would work:

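from contextlib import contextmanager

@contextmanager
def r_graphics(r):
    # Open a PNG device before the block runs.
    r.png()
    try:
        yield
    finally:
        # Close the device even if a plotting call raises.
        r.dev_off()

with r_graphics(r):
    r.boxplot()
    r.some_other_plot()

d6c26cf6-9aec-430c-9f63-586ad566bb8c commented 13 years ago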
comment:5

Man, I haven't done ANY serious programming in Python for ages. Things I learnt: the with keyword, that apply is deprecated, and how to use getattr.

I REALLY like Jason's context manager. Simple and elegant. It shouldn't be too difficult to extend it to do some checking of R capabilities and perhaps capture the png.
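
One way the "capturing of the png" part might look is sketched here: the context manager picks (or is given) a file name, hands it to the body, and always closes the device. Everything in it is an assumption made for illustration. r_png and FakeR are invented names, FakeR only logs calls so the sketch runs without R, and how Sage's r.png expects the file name to be passed or quoted is not addressed.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def r_png(r, filename=None):
    """Open a PNG device on `r`, yield the output path, always close the device."""
    if filename is None:
        # Reserve a temporary file name for the image.
        fd, filename = tempfile.mkstemp(suffix=".png")
        os.close(fd)
    r.png(filename=filename)  # assumption: the device accepts a `filename` keyword
    try:
        yield filename
    finally:
        r.dev_off()  # runs even if a plotting call raises


class FakeR:
    """Stand-in for Sage's `r` object; it only records which calls were made."""

    def __init__(self):
        self.log = []

    def png(self, filename=None):
        self.log.append("png")

    def boxplot(self, data):
        self.log.append("boxplot")

    def dev_off(self):
        self.log.append("dev_off")


fake = FakeR()
with r_png(fake) as path:
    fake.boxplot([1, 2, 3])
# `path` now names the file the image would have been written to.
```

Yielding the path means the caller can copy, cache or display the image afterwards without guessing where R put it.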

Maybe what we really need is an extra page in the documentation that shows all the R tricks people have found. I'm looking for a tutorial I saw some time back which shows how to keep Sage and R variables synchronised. Will post it once I find it.

In any case, the following achieves the same as Jason's suggestion, but as a class. It's missing a __repr__ method though. :(

class Rgraph:
    def __init__(self,*args,**kwargs):
        self.graph_args = args
        self.graph_kwargs = kwargs
        self.store = []
    def savefunction(self,*args,**kwargs):
        self.store.append((self.lastcall,args,kwargs))
    def __getattr__(self,function):
        self.lastcall = function
        return self.savefunction
    def show(self):
        r.png(*self.graph_args,**self.graph_kwargs)
        for function in self.store:
            getattr(r,function[0])(*function[1],**function[2])
        r.dev_off()

When called, it acts as follows:

gr = Rgraph()
gr.boxplot(someRdataIhadlyingaround)
show(gr)

It does have the advantage that the class can be called and "filled" in one cell, and then displayed using show(gr). Then it can be "topped up" with further r calls and the appended graph displayed using show(gr) again.

Interestingly, this seems to swallow the extra info (PNG 2) that R throws up for boxplot.

jasongrout commented 13 years ago
comment:6

I like your class!

jasongrout commented 13 years ago
comment:7

Some comments:

  1. A possible extension is to do tab completion of R graphics commands. I think you just need a function or attribute that gives the tab completions. Can't remember what it is off the top of my head, though.

  2. Why don't you put the single line of savefunction inside __getattr__? I'm sure you must have a good reason; I just can't figure it out.

d6c26cf6-9aec-430c-9f63-586ad566bb8c commented 13 years ago
comment:8

It's been a while since I programmed in Python, and I'm still learning some of the new features.

I was under the assumption that to use __getattr__ for dynamic methods, a function had to be returned, which was then called with arguments? This is why I have the __getattr__ store the function name in self.lastcall, so it could be passed to savefunction.
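
For what it's worth, __getattr__ does have to return something callable, but that callable can be a closure that remembers the name itself, so no self.lastcall is needed. A minimal sketch, with CallRecorder and replay as made-up names:

```python
class CallRecorder:
    """Record arbitrary method calls without keeping a shared `lastcall` attribute."""

    def __init__(self):
        self.store = []

    def __getattr__(self, name):
        # Let special-method lookups (copy, pickle, ...) fail normally.
        if name.startswith("__"):
            raise AttributeError(name)

        def save(*args, **kwargs):
            # `name` is captured by the closure, so each returned function
            # remembers which attribute it was created for.
            self.store.append((name, args, kwargs))

        return save

    def replay(self, target):
        """Call every recorded method, in order, on `target`."""
        for name, args, kwargs in self.store:
            getattr(target, name)(*args, **kwargs)


rec = CallRecorder()
box = rec.boxplot   # look up two names first ...
hist = rec.hist
box([1, 2])         # ... then call the first one
# rec.store == [("boxplot", ([1, 2],), {})]
```

With the lastcall approach the last line would have been recorded as "hist", because the name is overwritten by the second lookup before the first function is called.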

I'm well aware that my code will need a lot of work to streamline it, but I thought getting a basic structure up might help the more advanced coders.

I'll have a look to see what needs to be done to add tab completions.
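
In plain Python, the generic hook for this is __dir__, which dir() and most completers consult. Sage's interfaces have their own completion hook, so the following is only a sketch of the general mechanism; GraphicsNames and its KNOWN tuple are invented for the example.

```python
class GraphicsNames:
    """Advertise a fixed set of R graphics commands to dir() and tab completion."""

    # Hypothetical list; a real one might be queried from R instead.
    KNOWN = ("barplot", "boxplot", "hist", "plot")

    def __dir__(self):
        # Merge the normal attribute names with the advertised commands.
        return sorted(set(super().__dir__()) | set(self.KNOWN))


names = [n for n in dir(GraphicsNames()) if not n.startswith("_") and n.islower()]
print(names)  # ['barplot', 'boxplot', 'hist', 'plot']
```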

What I'm really focusing on now is image caching. From what I understand R graphics get output via the following process:

Sage notebook scans through the directory created by the cell, looking for any images that have been created. It then drops a html link referring to the image into the notebook output cell.

This suggests a method of caching image results as an optimisation to speed up multiple calls.

  1. The show method checks to see if a self.cache variable is defined.

  2a. If it isn't, the show method runs all the stored r graphics calls, and stores the name and cell directory location of the png file in the self.cache variable.

  2b. If the self.cache variable IS defined, show() COPIES the png file from the old cell directory into the new cell directory, and relies on sage notebook to take care of the loading.

  3. __getattr__ has an extra line added that deletes self.cache every time a new r graphic method is called.

This SHOULD result in the Rgraphic object copying the old png into the new cell, as long as no new method calls have been added to the Rgraphic.
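
A rough sketch of that caching scheme, independent of the notebook: CachedGraphic, add and the render callback are all invented for illustration, calls are added through an explicit add() rather than __getattr__ to keep it short, and the "cell directory" is just any directory path.

```python
import os
import shutil
import tempfile


class CachedGraphic:
    """Re-render only when calls were added since the last show()."""

    def __init__(self, render):
        # `render(calls, path)` is a hypothetical callback that draws the
        # stored calls into the file `path` (in practice, via R).
        self._render = render
        self._calls = []
        self._cache = None  # path of the last image, while it is still valid

    def add(self, name, *args, **kwargs):
        self._calls.append((name, args, kwargs))
        self._cache = None  # any new call invalidates the cached image

    def show(self, cell_dir, filename="graph.png"):
        target = os.path.join(cell_dir, filename)
        if self._cache is not None and os.path.exists(self._cache):
            # Cache hit: copy the old image into the new cell directory.
            if os.path.abspath(self._cache) != os.path.abspath(target):
                shutil.copy(self._cache, target)
        else:
            # Cache miss: run all stored calls again.
            self._render(self._calls, target)
        self._cache = target
        return target


renders = []

def fake_render(calls, path):
    """Stand-in renderer: writes a text file instead of calling R."""
    renders.append(len(calls))
    with open(path, "w") as fh:
        fh.write(repr(calls))

graphic = CachedGraphic(fake_render)
graphic.add("boxplot", [1, 2, 3])
first = graphic.show(tempfile.mkdtemp())   # renders
second = graphic.show(tempfile.mkdtemp())  # copies the cached file
```

Note that the sketch falls back to rendering if the cached file has disappeared, which matters if old cell directories get cleaned up.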

After that, if we go for my class method over your neat "with" method, we just need to come up with a nice way to control the list of r graphic calls - i.e. some append/delete methods.

Is there a way to add your "with" algorithm so that it's automatically and invisibly called from the sage notebook on any r function that requires graphics (maybe a decorator or something applied to the R. class)?

Because then we could strip out all the graphical stuff from my class, rename it Rcommandlist or something, and just have a session like this:

all_the_with_stuff_done_invisibly_by_sage(including, checking, image, capabilities)

class Rcommandlist:
    def __init__(self):
        self.store = []
    def savefunction(self,*args,**kwargs):
        self.store.append((self.lastcall,args,kwargs))
    def __getattr__(self,function):
        self.lastcall = function
        return self.savefunction
    def show(self):
        for function in self.store:
            getattr(r,function[0])(*function[1],**function[2])

gr = Rcommandlist()
gr.boxplot(arguments)
gr.lowlevelRgraphicfunctions()
show(gr)

Because when I look at my Rgraphic class method, based on lines saved, the only advantage it has over traditional r invocation is that it allows the r commands to be stored in an object. I'd much prefer invisible graphic calls, even if this loses the possibility of image caching (because frankly, how often does somebody create an identical graph __twice__ in the same worksheet?).
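
The "invisible" device handling could be approximated with a decorator that opens and closes the device around a plotting function. This is a sketch only: with_device and StubR are invented names, StubR just logs calls so the example runs without R, and deciding automatically which R functions produce graphics is not attempted.

```python
import functools


def with_device(r, open_device="png", close_device="dev_off"):
    """Decorator factory: run the wrapped function between device open and close."""

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            getattr(r, open_device)()
            try:
                return func(*args, **kwargs)
            finally:
                # Close the device even if the plotting code raises.
                getattr(r, close_device)()

        return wrapper

    return decorate


class StubR:
    """Stand-in for Sage's `r` object; it only records which calls were made."""

    def __init__(self):
        self.log = []

    def png(self):
        self.log.append("png")

    def dev_off(self):
        self.log.append("dev_off")

    def boxplot(self, data):
        self.log.append("boxplot")


stub = StubR()

@with_device(stub)
def my_plot(data):
    stub.boxplot(data)

my_plot([1, 2, 3])
# stub.log == ["png", "boxplot", "dev_off"]
```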

jasongrout commented 13 years ago
comment:9

We use rpy(2?) (http://rpy.sourceforge.net/rpy2.html) in order to interact with R. I wonder if there is an easy way to modify it to do what you suggest with saving graphics.

I agree with your last point; I wonder if the effort to implement the caching (plus its reliance on specific notebook behavior) is worth the benefits it provides. Of course, I'm not a heavy R user, but even as far as Sage graphics go, we don't do that sort of copying between cells---I think it would be practically impossible for us to tell if a graphic in one cell should be exactly like the graphic in another cell without pretty much generating the graphic anyway.

Your Rcommandlist class is turning into what looks like just a function. How is it better than something like defining a function, which is also a way of storing a sequence of commands:

def myplot(arguments):
    r.png()  # open the device that r.dev_off() closes below
    r.boxplot(arguments)
    r.lowlevelfunction()
    r.dev_off()

myplot(arguments)

jasongrout commented 13 years ago
comment:10

(I mention the above points to carry on design discussion, not to disparage the ideas. I really am curious how the class is better than just defining a new custom function, and if the caching effort is worth it.)

jasongrout commented 13 years ago
comment:11

Using the Google Summer of Code project, it may be very easy for us to have a Sage Graphics object that does R stuff. For example, see http://rpy2-gsoc.blogspot.com/2010/08/all-good-things.html, where he talks about having R draw onto a matplotlib canvas in a not-yet-released rpy2 version.

d6c26cf6-9aec-430c-9f63-586ad566bb8c commented 13 years ago
comment:12

I think you're right about using a function call rather than a class.

So is the final conclusion:

  1. Wait for rpy2

  2. Put the contextmanager solution into the Sage documentation

  3. Possibly put up some guides on how to do things in sage/R?

Should we change the ticket to a documentation ticket?

d6c26cf6-9aec-430c-9f63-586ad566bb8c commented 13 years ago
comment:13

Hah. Finally found the tutorial I was looking for.

Any chance this can be added to the documentation for R?

http://www.sagenb.org/home/pub/2232/

kcrisman commented 13 years ago
comment:14

Jason, are you sure we use rpy2 to communicate with R?

EXAMPLES:
            sage: r.eval('1+1')
            '[1] 2'
        """
        # TODO split code at ";" outside of quotes and send them as individual
        #      lines without ";".
        return Expect.eval(self, code, synchronize=synchronize, *args, **kwds)

and the R interface init method seems to agree that we are calling R directly. In fact,


sage: search_src('rpy')

only returns things that seem to have to do with trying to convert Sage numbers into rpy numbers, but nothing to do with the R interface.

jasongrout commented 13 years ago
comment:15

I'm not sure if we use rpy or rpy2. That's why I originally said "rpy(2?)". At one time, I looked at upgrading to rpy2, but I'm not sure if the work was ever finished.

kcrisman commented 13 years ago
comment:16

My point is that I don't think we use rpy OR rpy2 directly for r.eval or other things. It is an option, but I am pretty sure we don't actually use it except in some documentation where it shows how to use it. We discussed trying to switch once, but this seemed better (and I still think it's better to interact directly, as rpy2.classic or whatever was a pain to figure out).

jasongrout commented 13 years ago
comment:17

We don't use rpy? That's news to me. I was pretty sure we used rpy, but you're the expert here.

kcrisman commented 12 years ago

Changed keywords from R, cairo, wrapper to R, cairo, wrapper, r-project

kcrisman commented 9 years ago
comment:23

I believe William has this working without such things in SMC.

williamstein commented 9 years ago
comment:24

I'm happy to share my code for any use. This is the code I currently use in SMC for this purpose. The line "salvus.stdout('\n'); salvus.file(tmp, show=True); salvus.stdout('\n')" would have to change...

# Monkey patch the R interpreter interface to support graphics, when
# used as a decorator.

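import os  # needed for os.path.exists and os.unlink below
import sage.interfaces.r
# Note: `salvus` and `uuid()` are not defined in this snippet; presumably
# they are provided by the surrounding SMC code.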
def r_eval0(*args, **kwds):
    return sage.interfaces.r.R.eval(sage.interfaces.r.r, *args, **kwds).strip('\n')

r_dev_on = False
def r_eval(code, *args, **kwds):
    """
    Run a block of R code.

    EXAMPLES::

         sage: print r.eval("summary(c(1,2,3,111,2,3,2,3,2,5,4))")   # outputs a string
         Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
         1.00    2.00    3.00   12.55    3.50  111.00

    In the notebook, you can put %r at the top of a cell, or type "%default_mode r" into
    a cell to set the whole worksheet to r mode.

    NOTE: Any plots drawn using the plot command should "just work", without having
    to mess with special devices, etc.
    """
    # Only use special graphics support when using r as a cell decorator, since it has
    # a 10ms penalty (factor of 10 slowdown) -- which doesn't matter for interactive work, but matters
    # a lot if one had a loop with r.eval in it.
    if sage.interfaces.r.r not in salvus.code_decorators:
        return r_eval0(code, *args, **kwds)

    global r_dev_on
    if r_dev_on:
        return r_eval0(code, *args, **kwds)
    try:
        r_dev_on = True
        tmp = '/tmp/' + uuid() + '.svg'
        r_eval0("svg(filename='%s')"%tmp)
        s = r_eval0(code, *args, **kwds)
        r_eval0('dev.off()')
        return s
    finally:
        r_dev_on = False
        if os.path.exists(tmp):
            salvus.stdout('\n'); salvus.file(tmp, show=True); salvus.stdout('\n')
            os.unlink(tmp)

sage.interfaces.r.r.eval = r_eval