r-lib / svglite

A lightweight svg graphics device for R
https://svglite.r-lib.org
180 stars 39 forks source link

performance hit with svgstring() #58

Closed timelyportfolio closed 8 years ago

timelyportfolio commented 8 years ago

I decide to test for speed differences between svglite and svgstring. I was surprised that svglite actually is significantly quicker on my machine.

> system.time({
+   fl <- tempfile(fileext=".svg")
+   svglite( file = fl )
+   plot(runif(10000),1:10000)
+   dev.off()
+ })
   user  system elapsed 
   0.13    0.06    0.20 
> 
> unlink(fl)
> 
> system.time({
+   svgstring()
+   plot(runif(10000),1:10000)
+   dev.off()
+ })
   user  system elapsed 
  75.99    2.25   79.60 

I'm wondering now if we should revert inlineSVG back to svglite.

Changing to runif(1000) and using microbenchmark, I get the following.

image

library(microbenchmark)
library(svglite)

svg_lite_fun <- function(){
  fl <- tempfile(fileext=".svg")
  svglite( file = fl )
  plot(runif(1000),1:1000)
  dev.off()
  unlink(fl)
}

svg_string_fun <- function(){
  svgstring()
  plot(runif(1000),1:1000)
  dev.off()
}

mb <- microbenchmark( svg_lite_fun(), svg_string_fun(), times = 10)
boxplot(mb)
hadley commented 8 years ago

I'm really not surprised that the performance isn't as good.

timelyportfolio commented 8 years ago

Is it bad enough to revert back to svglite in inlineSVG? It is far worse than I expected, but it is always nice not to create a file.

yixuan commented 8 years ago

Probably it's because insertion in string stream requires constructing new strings repeatedly?

hadley commented 8 years ago

@yixuan it seems to be copying the string from C++ to R that's slow. If I rewrite flush() to be:

  void flush() {
    stream_.flush();
    std::string x = stream_.str();
    # env_["svg_string"] = x;
  }

I get

Unit: milliseconds
             expr  min   lq  mean median    uq  max neval cld
   svg_lite_fun() 9.52 9.88 11.07  10.14 13.19 14.0    10   b
 svg_string_fun() 6.71 6.80  8.33   8.08  9.86 10.3    10  a 

If I uncomment the last line I get:

Unit: milliseconds
             expr    min    lq  mean median    uq max neval cld
   svg_lite_fun()   9.63  11.2  12.4   11.5  11.8  22    10  a 
 svg_string_fun() 330.97 336.7 371.3  349.1 431.0 453    10   b
hadley commented 8 years ago

If I construct the STRSXP directly by hand with Rf_mkCharLenCE(&x[0], x.length(), CE_UTF8), I get the same performance. It's probably related to R's global string pool - hashing that giant string takes some time.

yixuan commented 8 years ago

This a good catch. Thanks Hadley.

One possible way to solve this is to cache the SVG string in C++ rather than in R, and we only do the copy when the string is requested in R. Some pseudo code may look like

svgstring <- function(width = 10, height = 8, bg = "white",
                      pointsize = 12, standalone = TRUE) {

  env <- new.env(parent = emptyenv())
  svgstring_(env, width = width, height = height, bg = bg,
    pointsize = pointsize, standalone = standalone)

  function() {
    .Call("get_string_from_cpp", env$svg_string_ptr)
  }
}

where env$svg_string_ptr saves the address of the cached C++ string. And in C++

class SvgStreamString : public SvgStream {
  std::stringstream stream_;
  Rcpp::Environment env_;
  std::string cached_string;

public:
  SvgStreamString(Rcpp::Environment env): env_(env) {
    stream_ << std::fixed << std::setprecision(2);
    env_["svg_string_ptr"] = somehow_return_the_pointer(&cached_string);
  }
  void flush() {
    stream_.flush();
    cached_string = stream_.str() + "</svg>";
  }
};

This way requires to take care of some other stuffs of course, for example returning proper string when device is closed.

hadley commented 8 years ago

I agree that that's a better interface - although it would be even better to return an Xptr from svgstring_, and then wrap with an accessor function. Then flush() wouldn't have to do anything - you'd call a special method from the accessor.

I didn't think it would solve the performance issue, but I guess flush() gets called multiple times so it might be a lot better.

yixuan commented 8 years ago

Oh yeah, you are right. flush() can be empty and we direct return string from stream_.

hadley commented 8 years ago

Do you want to have a go at a PR? Do you get what I mean about the external pointer?

yixuan commented 8 years ago

Yeah I think so. I can have a try maybe later this week.