Closed DavisVaughan closed 5 years ago
Is there a way to check R's garbage collector? We might not be freeing memory correctly, or performing some other weird side effect (that's the first thing that comes to mind). Since this behavior differs between rtensor and rarray, it might be related to the wrapping of the shape/stride storage in xtensor-r.
I think you might be right on the freeing memory idea.
Notice that here I am just creating that rarray<double>, but returning 1.
// [[Rcpp::depends(xtensor)]]
// [[Rcpp::plugins(cpp14)]]

#include <xtensor-r/rarray.hpp>
#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::IntegerVector add_cpp_noreturn(const xt::rarray<double>& x,
                                     const xt::rarray<double>& y) {
    xt::rarray<double> res = x + y;
    res = res + 1;
    Rcpp::IntegerVector z = Rcpp::IntegerVector::create(1);
    return z;
}
Here are some interesting memory results. Look especially at what happens when you pass in vectors that aren't bound to variables.
Rcpp::sourceCpp("~/Desktop/test.cpp")
library(tictoc)
library(pryr)
x <- as.double(1:1000)
y <- as.double(1:1000)
mem_used()
#> 33 MB
for(i in 1:2000) add_cpp_noreturn(x, y)
mem_used()
#> 37.8 MB
for(i in 1:2000) add_cpp_noreturn(x, y)
mem_used()
#> 38 MB
for(i in 1:2000) add_cpp_noreturn(x, y)
mem_used()
#> 38.2 MB
for(i in 1:2000) add_cpp_noreturn(x, y)
mem_used()
#> 38.4 MB
for(i in 1:2000) add_cpp_noreturn(as.double(1:1000), as.double(1:1000))
mem_used()
#> 72 MB
for(i in 1:2000) add_cpp_noreturn(as.double(1:1000), as.double(1:1000))
mem_used()
#> 105 MB
# R of course doesn't have this issue
for(i in 1:2000) x + y
mem_used()
#> 105 MB
# Call an explicit gc(), but the memory
# is still being held onto!
gc()
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 586906 31.4 1003967 53.7 NA 637081 34.1
#> Vcells 9077036 69.3 15662778 119.5 32768 11102034 84.8
mem_used()
#> 105 MB
With or without using variables, it always seems to slow down. I would guess the two issues are related, though.
Rcpp::sourceCpp("~/Desktop/test.cpp")
library(tictoc)
library(pryr)
x <- as.double(1:1000)
y <- as.double(1:1000)
tic()
for(i in 1:2000) add_cpp_noreturn(x, y)
toc()
#> 0.181 sec elapsed
tic()
for(i in 1:2000) add_cpp_noreturn(x, y)
toc()
#> 0.362 sec elapsed
tic()
for(i in 1:2000) add_cpp_noreturn(as.double(1:1000), as.double(1:1000))
toc()
#> 0.545 sec elapsed
tic()
for(i in 1:2000) add_cpp_noreturn(as.double(1:1000), as.double(1:1000))
toc()
#> 0.78 sec elapsed
Thanks for diving deeper into this. This means we need to decrement the reference count (or equivalent) somewhere. I think it might be the integer vector from the shape...
Is this wrong? https://github.com/QuantStack/xtensor-r/blob/master/include/xtensor-r/rcontainer.hpp#L190
I removed the if(m_owned) part, reran the above example, and got:
Rcpp::sourceCpp("~/Desktop/test.cpp")
library(tictoc)
library(pryr)
x <- as.double(1:1000)
y <- as.double(1:1000)
mem_used()
#> 33 MB
for(i in 1:2000) add_cpp_noreturn(x, y)
mem_used()
#> 37.5 MB
for(i in 1:2000) add_cpp_noreturn(x, y)
mem_used()
#> 37.5 MB
for(i in 1:2000) add_cpp_noreturn(as.double(1:1000), as.double(1:1000))
mem_used()
#> 37.6 MB
for(i in 1:2000) add_cpp_noreturn(as.double(1:1000), as.double(1:1000))
mem_used()
#> 37.6 MB
and
Rcpp::sourceCpp("~/Desktop/test.cpp")
library(tictoc)
library(pryr)
x <- as.double(1:1000)
y <- as.double(1:1000)
tic()
for(i in 1:2000) add_cpp_noreturn(x, y)
toc()
#> 0.048 sec elapsed
tic()
for(i in 1:2000) add_cpp_noreturn(x, y)
toc()
#> 0.056 sec elapsed
tic()
for(i in 1:2000) add_cpp_noreturn(as.double(1:1000), as.double(1:1000))
toc()
#> 0.074 sec elapsed
tic()
for(i in 1:2000) add_cpp_noreturn(as.double(1:1000), as.double(1:1000))
toc()
#> 0.031 sec elapsed
This lets me show off why I like this library: a broadcast sum of 10000x1 and 1x10000. The times are competitive with what you'd probably do in base R, and look at that memory usage!
Regarding m_owned: Not sure. There are some issues around move constructors and garbage collection which I think are taken care of by m_owned. Need to investigate that deeper.
The timings look extremely promising! Ideally it should be even faster ;) I think this special case of outer-sum broadcasting is something we haven't optimized to the max (yet).
Btw. would you be interested in having a chat about rray via video at some point? There are some other interesting ideas that lately came up regarding JIT compilation that you / the R community might be interested in.
Re: m_owned: Okay, I'll happily hand the investigation off to you if that's alright; I'm shooting in the dark at this point, to be honest. Happy to test anything, though.
I've got a branch of xtensorrr that has xsimd in it, but I haven't really seen the speed increase. Maybe it's just me; more investigation required. I'm using it like:
// [[Rcpp::depends(xtensorrr)]]
// [[Rcpp::plugins(cpp14)]]

#define XTENSOR_USE_XSIMD

#include <xtensor-r/rarray.hpp>
#include <Rcpp.h>

// [[Rcpp::export]]
SEXP rray_add_cpp_simd(xt::rarray<double> x, xt::rarray<double> y) {
    xt::rarray<double> res = x + y;
    return res;
}
And sure! My email is on my profile page here if you want to reach out and set something up https://github.com/DavisVaughan
Yep, there is definitely a problem with the owned variable; it needs to be true in many more cases. I'll have a fix by tomorrow.
Awesome thanks a lot!
In my tests this is resolved with current master -- the issue was the creation of new objects in R on each call, as we debugged together. Thanks again!
Consider the following cpp file: I was doing some benchmarks and was surprised to see that repeated calls to this function slow it down. Eventually this gets out of control enough that it crashes. As you can see, it also spreads to other functions, like the subtraction one. It happens both when using sourceCpp(), as I did here, and inside a built package. Any idea what is happening? Interestingly enough, it doesn't seem to happen with rtensors.