xtensor-stack / xtensor-r

R bindings for xtensor
BSD 3-Clause "New" or "Revised" License

list of xexpression? #60

Closed DavisVaughan closed 5 years ago

DavisVaughan commented 5 years ago

This might be a simplistic/dumb question, but is it possible to store xexpressions in some container? For example, if x and z are rarray<double> objects:

y = x + z;
yy = x - z;

y and yy are xfunction types from what I can tell. But their types are different as one is from + and one is from -. Is there some container I can create to store them in together? Something like:

std::list<some_type> y_list;
y_list.push_back(y);
y_list.push_back(yy);

Thanks, appreciate the help in advance.

wolfv commented 5 years ago

This is not at all a dumb question. It's rather a pretty tough one.

There are a couple of approaches:

- type-erase the expression behind std::any and cast it back when needed
- store the expressions in a std::variant over their (known) types
- wrap them in an xexpression_holder, which hides the concrete type behind a common base class

For the last point, we should probably figure out how to have the holder perform more interesting tasks. @martinRenou might have some ideas. E.g. I think it would be interesting to be able to get an xcontainer out of the saved xexpression.

For the two other options, xtl provides implementations of std::variant and std::any if you're not using C++17 yet.

Also, if you want to store xfunctions, note that the arguments to the xfunction are taken by reference! If a referenced argument goes out of scope, evaluating the xfunction will segfault. One way around this is to use std::move on the function arguments, but you can only do that once ... another is the shared_xexpression, which keeps track of how often the argument is used and then cleans up afterwards.

DavisVaughan commented 5 years ago

Thanks for the help!

I'd really rather not evaluate them yet: I might want to perform additional operations on them, and I wanted to delay evaluation as much as possible so it happens just once at the end, when I convert back to an rarray&lt;double&gt; (and through that, an R object). I figured that would give the most performant code, as opposed to assigning to rarray&lt;double&gt; containers after every operation.

I thought about doing something like boost::any / boost::variant (I didn't know they were in C++17!), but I don't know how to cast the values back out appropriately afterwards, and for variant I didn't even know how to specify the allowed types. Say you have the example above:

auto y = x + z;
auto yy = x - z;
// shoving things in the list
using xexpression_types = std::variant<what here?>;
std::list<xexpression_types> xexpression_list;
xexpression_list.push_back(y);
xexpression_list.push_back(yy);

// do some stuff

// getting them back out later
what_here y_again = std::get<what here?>(xexpression_list.front());

Maybe I'm just trying things I shouldn't be ¯\_(ツ)_/¯

Update: OK, the std::visit idea is starting to make sense thanks to this article, but I still don't know how to construct the std::variant&lt;what here?&gt; part: https://arne-mertz.de/2018/05/modern-c-features-stdvariant-and-stdvisit/

JohanMabille commented 5 years ago

From your example you could write something like:

auto y = x + z;
auto yy = x - z;
using y_type = decltype(y);
using yy_type = decltype(yy);
using expression_types = xtl::variant<y_type, yy_type>;

std::list&lt;expression_types&gt; xexpression_list;
xexpression_list.push_back(y);
xexpression_list.push_back(yy);

// do some stuff

// getting them back out later
auto y_again = xtl::get&lt;y_type&gt;(xexpression_list.front());

The problem with this approach is that you need to know the types of all your expressions before creating the variant. Another possibility is to make xexpression_holder provide the same API as regular xtensor expressions, except that the methods would be virtual. This is not straightforward to do because of the template methods; however, I'm experimenting with some solutions in the context of xframe that I will probably backport to xtensor once I'm happy with them.

DavisVaughan commented 5 years ago

using y_type = decltype(y);

This is 👍, I had no idea I could do that. Super valuable.

Still doesn't quite solve my problem though, because, as you said, I'd have to know all of the expression types ahead of time. What I'd really like is a way to store types in a list that I could build as I iterate through operations, but I'm fairly sure that's not a thing.

For my actual problem: what I'm trying to build is a way to expose the laziness of xtensor to R, so you could write (in R):

y <- as_rarray(matrix(c(1,2,3,4,5)))
m <- matrix(c(2,3,4,5,6))

# x would not hold the actual value, but would "know" that a
# + operation between y and m was used to create it. 
# It could also infer the shape so it 
# could print something useful.
x <- y + m

# again, z is not known, but this knows about x and 2 and -, 
# and x knows how it should be created...and so on
z <- x - 2

# compute() would actually calculate z, performing the 
# + and - operations in one pass at the C++ level with xtensor for 
# optimal performance
compute(z)

Now this is all great, and at the C++ level I'm trying to do something like:

op_list = {"minus", "plus"}
// these are really all rarray<double>'s by now
rarray_list = {2, y, m}

and then it would iterate in reverse order: take "plus" with y and m, perform that operation, get the xexpression back, and put it at the front of rarray_list (this is the part that doesn't work). Then it would do "minus" with 2 and the result of the "plus" operation to finally get z, which is again an xexpression.

Then at the end it converts to an rarray&lt;double&gt; just once (or it could slice out the first few rows if the user only wanted to view a little of the result; from what I understand, that can be done without computing everything).

The biggest issue is that I can't push those intermediate xexpression results anywhere, because they all have different types, since they come from different operations.

DavisVaughan commented 5 years ago

I'm happy to have this closed as you've essentially given a great answer at the beginning.

I haven't completely solved my own issue yet, and if you had time to give thoughts on the above comment that'd be helpful, but no rush and otherwise good to close.

DavisVaughan commented 5 years ago

I don't think there is much else to discuss here, so I'll close.