oracle / fastr

A high-performance implementation of the R programming language, built on GraalVM.

Performance of as.raster and slam::as.simple_triplet_matrix in fastr #72

Open hsselman opened 5 years ago

hsselman commented 5 years ago

Hi,

I am using GraalVM and FastR rc-15. I was converting PDF objects in FastR using base functions and the slam package. The functions I want to use ran without errors, but I found as.raster and slam::as.simple_triplet_matrix extremely slow.

For example, I made this file (let's call it converting_pdf.R):

library(pdftools)
library(magick)
library(slam)
library(tictoc)

# Build the test objects step by step, timing each conversion
tic("Total time")
tic("Read a pdf page (using pdftools::pfd_render_page)")
img <- pdf_render_page(pdf = "https://cran.r-project.org/web/packages/purrr/purrr.pdf", page = 1)
toc()
tic("Convert a pdf page to ImageMagick object (using magick::image_read)")
img_magick <- image_read(img)
toc()
tic("Convert ImageMagick object to raster (using as.raster)")
img_raster <- as.raster(img_magick)
toc()
tic("Convert raster to simplet_triplet_matrix (using slam::as.simple_triplet_matrix)")
sparse <- as.simple_triplet_matrix(img_raster)
toc()
toc()

Sourcing the file in GNU R and in FastR (source("converting_pdf.R")) gives me:

# R
Read a pdf page (using pdftools::pfd_render_page): 0.416 sec elapsed
Convert a pdf page to ImageMagick object (using magick::image_read): 0.004 sec elapsed
Convert ImageMagick object to raster (using as.raster): 0.029 sec elapsed
Convert raster to simplet_triplet_matrix (using slam::as.simple_triplet_matrix): 0.075 sec elapsed
Total time: 0.524 sec elapsed

# FastR
Read a pdf page (using pdftools::pfd_render_page): 0.234 sec elapsed
Convert a pdf page to ImageMagick object (using magick::image_read): 0.014 sec elapsed
Convert ImageMagick object to raster (using as.raster): 2.455 sec elapsed
Convert raster to simplet_triplet_matrix (using slam::as.simple_triplet_matrix): 0.904 sec elapsed
Total time: 3.611 sec elapsed
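To make the gap easier to see, here is a small base-R sketch that divides the FastR timings by the GNU R timings for each step. The numbers are copied by hand from the single set of timings shown above; nothing is re-measured, so FastR warm-up is not accounted for.

```r
# Elapsed seconds copied from the GNU R and FastR runs reported above.
timings <- data.frame(
  step  = c("pdf_render_page", "image_read", "as.raster",
            "as.simple_triplet_matrix", "total"),
  gnu_r = c(0.416, 0.004, 0.029, 0.075, 0.524),
  fastr = c(0.234, 0.014, 2.455, 0.904, 3.611)
)

# Ratio > 1 means FastR was slower than GNU R for that step;
# ratio < 1 means FastR was faster.
timings$fastr_over_r <- round(timings$fastr / timings$gnu_r, 1)
print(timings)
```

The two problem steps stand out: as.raster takes roughly 85 times as long in FastR and as.simple_triplet_matrix roughly 12 times as long, while pdf_render_page is faster.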

Do you know what is causing this? Thanks in advance, and thanks for developing FastR!

PS: Good news: pdf_render_page is faster in FastR!

steve-s commented 5 years ago

Hello hsselman,

Thank you for all three reports and the feedback! We will take a closer look at them and get back to you when we have more details.

In general, FastR needs some time to warm up. You can try running the benchmark in a loop a few times and see whether the performance improves. If you would like to dig deeper, you can run with --vm.Dgraal.TraceTruffleCompilation=true, which prints a basic log of background compilation; the benchmark has reached peak performance once there is no more compiler activity. You can also use the built-in R-level profiler (--cpusampler) to see whether anything stands out.
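A minimal sketch of the warm-up loop suggested above, using only base R. `warmup_timings` is a hypothetical helper, not a FastR API, and the random matrix is an assumed stand-in for the `img_magick` object from converting_pdf.R; to time the real step you would pass `function() as.raster(img_magick)` instead.

```r
# Hypothetical helper (not part of FastR or any package): calls `step`
# repeatedly and returns the elapsed wall-clock seconds of each call.
warmup_timings <- function(step, iterations = 10L) {
  timings <- numeric(iterations)
  for (i in seq_len(iterations)) {
    timings[i] <- system.time(step())[["elapsed"]]
  }
  timings
}

# Stand-in workload using base R only: a 500 x 500 matrix of values in
# [0, 1], converted with grDevices::as.raster like the as.raster step
# in converting_pdf.R.
set.seed(1)
m <- matrix(runif(500 * 500), nrow = 500)
warm <- warmup_timings(function() grDevices::as.raster(m), iterations = 5L)

# If warm-up is the issue, later entries should be noticeably smaller
# than the first ones on a JIT-compiled runtime such as FastR.
print(round(warm, 3))
```

If the per-iteration timings stay flat, warm-up alone probably does not explain the gap, and the profiler is the next thing to try.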

Another thing to note is that FastR currently does not perform well on code that makes many transitions between the native code of packages and the R runtime, and unfortunately that is typical of some tidyverse packages (especially dplyr). In other words, we are much better at running R code than at running C code that calls back and forth between R and C. Recently we have invested most of our effort in compatibility with tidyverse and other important packages that use C/C++ extensively, and we are now planning to work on their performance.

hsselman commented 5 years ago

Hi steve-s,

Thanks for taking a look and for your answer! Hopefully the profilers you mentioned will help us troubleshoot future performance issues.

We'll eagerly await further performance improvements for FastR running tidyverse packages, as they are central to data science work. Until then, we'll try to work around the bottlenecks.

Keep up the good work!