reducer dimension reduction behavior

DavisVaughan commented 5 years ago

While working through some reducer examples, I noticed that they always drop the dimension that you are reducing over. Would it be possible to provide an option to keep that dimension and leave it as size 1? I think this can have some neat implications and use cases. For example, rray_sum() has been implemented to always keep that dimension as size 1. rarray_matrix_sum_cpp() is just the standard version that drops the dimension.

Notice how I can easily do a reduction then a broadcasting operation when the dimensions are kept.

numpy.sum actually has keepdims as an argument. I played around with it, but I'm not sure yet whether it does exactly what I'm asking for. https://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html

// [[Rcpp::depends(xtensorrr)]]
// [[Rcpp::plugins(cpp14)]]

#include <xtensor/xarray.hpp>
#include <xtensor/xio.hpp>
#include <xtensor-r/rarray.hpp>

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
xt::rarray<double> rarray_matrix_sum_cpp(xt::rarray<double> x, std::vector<std::size_t> axes) {
  // xt::sum drops the reduced axes (0-based) from the result
  xt::rarray<double> res = xt::sum(x, axes);
  return res;
}

Rcpp::sourceCpp("~/Desktop/test.cpp")

library(rray)

x <- rray(as.double(1:6), c(2,3))
x
#> <vctrs_rray<double>[,3][6]>
#>      [,1] [,2] [,3]
#> [1,] 1    3    5   
#> [2,] 2    4    6

rarray_matrix_sum_cpp(x, 1)
#> [1]  9 12
rarray_matrix_sum_cpp(x, 0)
#> [1]  3  7 11

# this happens to work because I treat
# a 1D array like a 1-column matrix
x / rarray_matrix_sum_cpp(x, 1)
#> <vctrs_rray<double>[,3][6]>
#>      [,1]      [,2]      [,3]     
#> [1,] 0.1111111 0.3333333 0.5555556
#> [2,] 0.1666667 0.3333333 0.5000000

# this fails
x / rarray_matrix_sum_cpp(x, 0)
#> Error: Incompatible lengths: 2, 3

# Now in rray

# R is 1-indexed :P
# so axis 1 here is axes = 0 in xtensor
# just believe me
rray_sum(x, 1)
#> <vctrs_rray<double>[,3][3]>
#>      [,1] [,2] [,3]
#> [1,]  3    7   11

# What if I wanted to calculate
# column-wise proportions?
x / rray_sum(x, 1)
#> <vctrs_rray<double>[,3][6]>
#>      [,1]      [,2]      [,3]     
#> [1,] 0.3333333 0.4285714 0.4545455
#> [2,] 0.6666667 0.5714286 0.5454545

# row-wise?
x / rray_sum(x, 2)
#> <vctrs_rray<double>[,3][6]>
#>      [,1]      [,2]      [,3]     
#> [1,] 0.1111111 0.3333333 0.5555556
#> [2,] 0.1666667 0.3333333 0.5000000

Created on 2018-12-08 by the reprex package (v0.2.1.9000)
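
For readers who want the reduce-then-broadcast pattern from the reprex spelled out without rray or xtensor, here is a minimal plain C++ sketch. The `Matrix` struct and the `sum_keepdims` and `divide_broadcast` functions are hypothetical names made up for this illustration, axes are 0-based, and only the 2D case is handled. It is not how rray or xtensor implement reducers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal row-major matrix, for illustration only.
struct Matrix
{
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data;  // row-major, size rows * cols

    double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Sum over `axis` (assumed to be 0 or 1) and keep the reduced axis with
// size 1, so the result is 1 x cols (axis 0) or rows x 1 (axis 1).
Matrix sum_keepdims(const Matrix& x, std::size_t axis)
{
    Matrix out{axis == 0 ? 1 : x.rows, axis == 1 ? 1 : x.cols, {}};
    out.data.assign(out.rows * out.cols, 0.0);
    for (std::size_t r = 0; r < x.rows; ++r)
    {
        for (std::size_t c = 0; c < x.cols; ++c)
        {
            const std::size_t out_r = (axis == 0) ? 0 : r;
            const std::size_t out_c = (axis == 1) ? 0 : c;
            out.data[out_r * out.cols + out_c] += x.at(r, c);
        }
    }
    return out;
}

// Element-wise x / y, where any size-1 axis of y is stretched to match x.
Matrix divide_broadcast(const Matrix& x, const Matrix& y)
{
    Matrix out = x;
    for (std::size_t r = 0; r < x.rows; ++r)
    {
        for (std::size_t c = 0; c < x.cols; ++c)
        {
            const double denom = y.at(y.rows == 1 ? 0 : r, y.cols == 1 ? 0 : c);
            out.data[r * x.cols + c] = x.at(r, c) / denom;
        }
    }
    return out;
}
```

With the 2 x 3 input from the reprex stored row-major as `{1, 3, 5, 2, 4, 6}`, `divide_broadcast(x, sum_keepdims(x, 0))` gives the column-wise proportions and `divide_broadcast(x, sum_keepdims(x, 1))` the row-wise ones. Because the reduced axis is still present, the division never has to guess which axis the sums belong to.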

SylvainCorlay commented 5 years ago

👍 on adding a keepdims argument to the reducers.

DavisVaughan commented 5 years ago

Simply confirming that keepdims does do what I want. This would be great, I'd probably make it the default in my package. 👍

import numpy as np
x = [[1, 1, 2], [1, 5, 2]]
x

[[1, 1, 2], [1, 5, 2]]

x / np.sum(x, axis=0)

array([[0.5       , 0.16666667, 0.5       ],
       [0.5       , 0.83333333, 0.5       ]])

x / np.sum(x, axis=0, keepdims = True)

array([[0.5       , 0.16666667, 0.5       ],
       [0.5       , 0.83333333, 0.5       ]])

x / np.sum(x, axis=1)

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-43-4f2eb8eb8ed6> in <module>
----> 1 x / np.sum(x, axis=1)

ValueError: operands could not be broadcast together with shapes (2,3) (2,)

x / np.sum(x, axis=1, keepdims = True)

array([[0.25 , 0.25 , 0.5  ],
       [0.125, 0.625, 0.25 ]])
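
The ValueError above follows from NumPy's broadcasting rule: shapes are aligned from the trailing dimension, and each pair of sizes must be equal or contain a 1. A rough C++ sketch of that rule is below. `broadcast_shape` is a hypothetical helper written for this illustration, not a NumPy or xtensor function.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: NumPy-style broadcast of two shapes.
// Throws std::invalid_argument when the shapes are incompatible.
std::vector<std::size_t> broadcast_shape(const std::vector<std::size_t>& a,
                                         const std::vector<std::size_t>& b)
{
    const std::size_t n = std::max(a.size(), b.size());
    std::vector<std::size_t> out(n, 1);
    for (std::size_t i = 0; i < n; ++i)
    {
        // Walk from the trailing dimension; missing leading dimensions count as 1.
        const std::size_t da = i < a.size() ? a[a.size() - 1 - i] : 1;
        const std::size_t db = i < b.size() ? b[b.size() - 1 - i] : 1;
        if (da != db && da != 1 && db != 1)
        {
            throw std::invalid_argument("shapes cannot be broadcast together");
        }
        // A size-1 dimension stretches to the other operand's size.
        out[n - 1 - i] = (da == 1) ? db : da;
    }
    return out;
}
```

Under this rule `(2, 3)` against `(2,)` compares 3 with 2 in the trailing position and fails, which is the `axis=1` case without keepdims. `(2, 3)` against `(2, 1)` succeeds because the kept axis has size 1, and `(2, 3)` against `(3,)` succeeds too, which is why the `axis=0` result is the same with or without keepdims.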

wolfv commented 5 years ago

Hi Davis,

thanks for your deep investigations here :)

I have a working prototype of keepdims, will integrate it asap. What complicates keepdims (for us C++ people) is that it needs to be a compile-time option since we need to compute the number of dimensions statically.

I'll also investigate your other issues. Not sure how we should correctly handle the scalar problem.
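
To illustrate why keepdims has to be known at compile time when the number of dimensions is part of the type, here is a small sketch. It is not the prototype mentioned above and not xtensor's API: `reduced_rank` and `reduced_shape` are hypothetical names, and the axes are assumed to be distinct and in range.

```cpp
#include <array>
#include <cstddef>

// Hypothetical illustration only. For fixed-rank containers the rank of the
// result is part of the return type, so the keep-dims choice must be a
// template parameter rather than a runtime argument.
template <std::size_t Rank, std::size_t NumAxes, bool KeepDims>
struct reduced_rank
{
    static_assert(NumAxes <= Rank, "cannot reduce over more axes than exist");
    static constexpr std::size_t value = KeepDims ? Rank : Rank - NumAxes;
};

// Shape of the reduction result. `axes` are 0-based and assumed distinct.
template <bool KeepDims, std::size_t Rank, std::size_t NumAxes>
std::array<std::size_t, reduced_rank<Rank, NumAxes, KeepDims>::value>
reduced_shape(const std::array<std::size_t, Rank>& shape,
              const std::array<std::size_t, NumAxes>& axes)
{
    std::array<std::size_t, reduced_rank<Rank, NumAxes, KeepDims>::value> out{};
    std::size_t k = 0;
    for (std::size_t d = 0; d < Rank; ++d)
    {
        bool is_reduced = false;
        for (std::size_t a : axes)
        {
            if (a == d) { is_reduced = true; }
        }
        if (!is_reduced)
        {
            out[k++] = shape[d];  // untouched axis keeps its size
        }
        else if (KeepDims)
        {
            out[k++] = 1;         // reduced axis is kept with size 1
        }
        // otherwise the reduced axis is dropped from the result
    }
    return out;
}
```

Calling `reduced_shape<true>` and `reduced_shape<false>` on the same inputs returns `std::array` objects of different sizes, that is, different types. A runtime `bool` argument could not select between them, which is one way to read the compile-time constraint described above.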

DavisVaughan commented 5 years ago

Thanks for your fast work. Continually impressed with how quickly you guys are integrating things.

I can imagine that compile-time limitation is frustrating. I can only wish you the best of luck with my limited knowledge! Thank you!

wolfv commented 5 years ago

I've begun implementing my ideas for keep_dims here: https://github.com/wolfv/xtensor/tree/reducer_keep_dims

DavisVaughan commented 5 years ago

Closed by QuantStack/xtensor#1474

xtensor-stack / xtensor-r

reducer dimension reduction behavior #75