xtensor-stack / xtensor-r

R bindings for xtensor
BSD 3-Clause "New" or "Revised" License
86 stars 15 forks source link

Support R logical vectors #62

Closed wolfv closed 5 years ago

wolfv commented 5 years ago

R stores booleans as int32. This means that we would need to add a proxy type to cast to a storage that is an int32 but behaves otherwise like a bool.

wolfv commented 5 years ago

@DavisVaughan I am going to write down my thoughts on how to support R's Logical with xtensor-r here.

I think the easiest way will be to have a "fake" type called rlogical. We will need to document this quite clearly, but this type will only be used to select the appropriate value_type for the xarray/xtensor.

We would define this type like so:

struct rlogical {};

And then define a metafunction that will select a value_type based on another type:

template <class T>
struct get_value_type
{
    using type = T;
};

template <>
struct get_value_type<rlogical>
{
    using type = int;
};

This type would then be used inside rarray to select the correct T for the storage_type from the supplied type. So instead of using T we'd use the obtained value type is in this part of xcontainer_inner_types part of rarray.

https://github.com/QuantStack/xtensor-r/blob/aec86e650270ef2e5adbf8d544e3dc174c14fd37/include/xtensor-r/rarray.hpp#L31-L35

To select the R type we currently use the following bit of code in rarray (and rtensor). This would need to select the result for <bool> here -- it's not going to work with our new fake type. So a workaround (e.g. the meta function from above) needs to be implemented.

constexpr static int SXP = Rcpp::traits::r_sexptype_traits<T>::rtype;

to instead return for this special case

constexpr static int SXP = Rcpp::traits::r_sexptype_traits<bool>::rtype;

One could use a ternary here with SXP = std::is_same<T, rlogical>::value ? sexptype<bool> : sexptype<T>;

We could do the same trick by just using bool as the template type for the rarray but in my opinion it could be confusing for users to see that bool is the type they select, but actually the value_type is int32. Therefore I think it's better to be explicit that this is a special behavior from R and we can document it clearly and, for example, give the advice in the docs to use xt::cast<bool>(rarray<rlogical>(...)) if the user wants to obtain C++ bool values.

Let me know if you want to give it a shot! Probably I am overlooking a bunch of things here. And maybe @JohanMabille or @SylvainCorlay have some comments, too.

Also we all tend to be on gitter a lot: https://gitter.im/QuantStack/Lobby

JohanMabille commented 5 years ago

Another solution (that requires more code) is to use a "strong typedef": rlogical type can be converted to and instantiated from bool:

class rlogical
{
public:

    rlogical(bool b) : m_value(static_cast<int>(b) {}
    operator bool() const noexcept { return static_cast<bool>(m_value); }

private:

    int m_value;
};

It could even be convertible to int (so it's guarantee to be a noop), then when instantiating an rarray / rtensor, a reinterpret_cast on the buffer is required so it stores a xbuffer_adaptor<rlogical*> instead of xbuffer_adaptor<int>. Since the memory layout of these types is the same, everything is fine, and this ensures deltype(rarray<rlogical> != decltype(rarray<int>).

The only problem here is we can have compilation errors if we use boolean operations on rlogical types only, I'm not sure the compiler will convert all the operands before evaluating the operator. In that case, we would need to add all the logical operators for rlogical.

DavisVaughan commented 5 years ago

Great, I'll take a look this week! Thanks for the head start.

DavisVaughan commented 5 years ago

Hi @wolfv and @JohanMabille, I've got some good news. I've got a working implementation over in my xtensorrr library (I just keep a copy there with all of the headers for ease of install when using rstudio).

Here is a summary of the changes so far: https://github.com/DavisVaughan/xtensorrr/compare/support-logical#diff-10c95f056a2334fe2130a5709046caceR27

A few things:

For that last point, I got the idea from a past post I had seen where dirk extends the wrap() and as() templated functions to work with boost ublas. So I think this is okay? http://gallery.rcpp.org/articles/custom-templated-wrap-and-as-for-seamingless-interfaces/

When I do the full PR, feel free to critique anything so it fits your code style. For now, I'm looking for more of a "am I on the right track?"

Here are a few examples that show it working with rray.

library(rray)

ex <- rray(c(TRUE, FALSE))
ex
#> <vctrs_rray<logical>[,1][2]>
#>      [,1] 
#> [1,]  TRUE
#> [2,] FALSE

# internally converting to rarray<rlogical> and back
rray_broadcast(ex, c(2, 3, 2))
#> <vctrs_rray<logical>[,3,2][12]>
#> , , 1
#> 
#>      [,1]  [,2]  [,3] 
#> [1,]  TRUE  TRUE  TRUE
#> [2,] FALSE FALSE FALSE
#> 
#> , , 2
#> 
#>      [,1]  [,2]  [,3] 
#> [1,]  TRUE  TRUE  TRUE
#> [2,] FALSE FALSE FALSE

as.logical(ex)
#>       [,1]
#> [1,]  TRUE
#> [2,] FALSE

as.integer(ex)
#>      [,1]
#> [1,]    1
#> [2,]    0

# using your elementwise equality
# the return value of equal() gets stored directly
# into rarray<rlogical>
rray_equal(ex, FALSE)
#> <vctrs_rray<logical>[,1][2]>
#>       
#>        [,1] 
#>   [1,] FALSE
#>   [2,]  TRUE

rray_equal(ex, 0)
#> <vctrs_rray<logical>[,1][2]>
#>       
#>        [,1] 
#>   [1,] FALSE
#>   [2,]  TRUE

# example of broadcasting
rray_equal(ex, matrix(c(TRUE, FALSE, TRUE, TRUE), ncol = 2))
#> <vctrs_rray<logical>[,2][4]>
#>       
#>        [,1]  [,2] 
#>   [1,]  TRUE  TRUE
#>   [2,]  TRUE FALSE
wolfv commented 5 years ago

Hey @DavisVaughan

Nice! The basic ideas are absolutely correct – and it seems to work, that's cool! The changes to rtensor will be trivial, just repeat the same stuff as for rarray, I agree.

There is one question I have from your example: when you do rray(c(TRUE, FALSE)) it should print a 1D vector, right? Or does your package handle this differently?

Besides this, here are some comments:

I think regarding the value type meta function, how about creating a detail namespace for xtensor-r such as

namespace r_detail
{
    template <class T>
    struct get_underlying_value_type_r
    {
        return T;
    };

    template <>
    struct get_underlying_value_type_r<rlogical>
    {
        return int;
    };
}
DavisVaughan commented 5 years ago

Thanks!

Regarding 1D vector, my package does handle this differently (everything is at least a 2D matrix with vectors being 1 col matrices) but I'm tempted to revert this behavior. xtensor was doing the right thing!

The r_detail namespace goes in the xt namespace correct?

Otherwise sounds good and I'll make all the changes and PR

wolfv commented 5 years ago

I see :) I guess your package is young enough so you can easily make these changes.

yeah, r_detail inside the xt namespace :+1:

Cool, looking forward to the PR.