r-lib / rray

Simple Arrays
https://rray.r-lib.org
GNU General Public License v3.0

rray_nest #111

Open juangomezduaso opened 5 years ago

juangomezduaso commented 5 years ago

There are few functions I had thought of for an array toolkit that are not covered or exceeded in rray. Among them is the nesting of some dimensions into another, to yield an array with lower dimensionality. Doing rray_nest(my5Drray, c(3,2,4)) we would nest dimensions 2 and 4 behind dim 3, getting a 3D rray with: dim 1, then the nesting of 3,2,4, and then dim 5. Dimnames can be kept as much as possible with a name-pasting convention. This is related to the ideas of issue 19, and this nesting function could be helpful in the implementation of those as well. Here is a dummy implementation for you to see what I mean:

library(rray)
library(vctrs)
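rray_nest <- function(x, axes) {
  # Every requested axis must exist, and no axis may be listed twice.
  # (unique(axes) == axes recycles, so it can let duplicates through.)
  stopifnot(all(axes <= vec_dims(x)))
  stopifnot(anyDuplicated(axes) == 0)
  # Axes kept before and after the position of the nested dimension
  head <- setdiff(1:axes[[1]], axes)
  tail <- setdiff(axes[[1]]:vec_dims(x), axes)
  # Transpose so the nested axes are adjacent, then collapse them into one
  res <- rray_reshape(
    rray_transpose(x, c(head, rev(axes), tail)),
    c(vec_dim(x)[head], prod(vec_dim(x)[axes]), vec_dim(x)[tail])
  )
  dimnames(res) <- c(dimnames(x)[head], list(mixnames(dimnames(x)[axes])), dimnames(x)[tail])
  names(dimnames(res))[[length(head) + 1]] <- rray:::reduce(names(dimnames(x))[axes], function(x, y) paste(y, x, sep = "."))
  res
}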
# Aux function to paste names
mixnames <- function(dimnames){
  rray:::reduce(expand.grid(rev(dimnames),KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE),
                function(x,y) paste(y,x, sep=".")) 
}
mixnames(list(A=LETTERS[1:3], letters[4:5]))
#> [1] "A.d" "A.e" "B.d" "B.e" "C.d" "C.e"

myrr=rray(1:24,2:4,list(A=c("a1", "a2"), B=c("b1", "b2", "b3"),C=c( "c1", "c2", "c3", "c4") ))
rray_nest(myrr, 1:2)
#> <vctrs_rray<integer>[,4][24]>
#>        C
#> B.A     c1 c2 c3 c4
#>   a1.b1  1  7 13 19
#>   a1.b2  3  9 15 21
#>   a1.b3  5 11 17 23
#>   a2.b1  2  8 14 20
#>   a2.b2  4 10 16 22
#>   a2.b3  6 12 18 24
rray_nest(myrr, 2:1)
#> <vctrs_rray<integer>[,4][24]>
#>        C
#> A.B     c1 c2 c3 c4
#>   b1.a1  1  7 13 19
#>   b1.a2  2  8 14 20
#>   b2.a1  3  9 15 21
#>   b2.a2  4 10 16 22
#>   b3.a1  5 11 17 23
#>   b3.a2  6 12 18 24
rray_nest(myrr, c(3,1))
#> <vctrs_rray<integer>[,8][24]>
#>     A.C
#> B    c1.a1 c1.a2 c2.a1 c2.a2 c3.a1 c3.a2 c4.a1 c4.a2
#>   b1     1     2     7     8    13    14    19    20
#>   b2     3     4     9    10    15    16    21    22
#>   b3     5     6    11    12    17    18    23    24
identical(rray_nest(myrr, 3), myrr)
#> [1] TRUE

Created on 2019-04-24 by the reprex package (v0.2.1)

As you can see, there is nothing here that can't be achieved with transpose and reshape, but it gives users another tool tailored to this concrete, basic array transformation. In particular, reducing an array to 2D can be a form of visualization or tabulation of aggregations made with tapply or similar array-returning functions.
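For readers without rray installed, the transpose-then-reshape recipe can be tried on plain base R arrays. This is only a minimal sketch under assumptions: nest_dims() is a hypothetical name, aperm() and `dim<-` stand in for rray_transpose() and rray_reshape(), and dimnames are ignored.

```r
# Hypothetical base R helper (not part of rray): nest `axes` into one dimension.
nest_dims <- function(x, axes) {
  d <- dim(x)
  stopifnot(all(axes >= 1), all(axes <= length(d)), anyDuplicated(axes) == 0)
  # Axes kept before and after the position of the nested dimension
  head <- setdiff(seq_len(axes[[1]]), axes)
  tail <- setdiff(seq(axes[[1]], length(d)), axes)
  # Column-major storage makes the first axis vary fastest, so reverse
  # `axes` to make the first listed axis the outermost (slowest) one.
  res <- aperm(x, c(head, rev(axes), tail))
  dim(res) <- c(d[head], prod(d[axes]), d[tail])
  res
}

x <- array(1:24, dim = 2:4)
nest_dims(x, 1:2)[, 1]     # first column of the 6 x 4 result
#> [1] 1 3 5 2 4 6
nest_dims(x, c(3, 1))[1, ] # first row of the 3 x 8 result
#> [1]  1  2  7  8 13 14 19 20
```

The values agree with the first column and first row of the rray outputs shown in the reprex above.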

DavisVaughan commented 5 years ago

I was thinking about this kind of operation a few days ago. I probably won't call it rray_nest() as I'm not sure it has exactly the same semantics, but this example in particular of smushing the 3rd dimension into the 2nd dimension feels useful.

library(rray)
library(vctrs)
rray_nest <- function(x, axes){
  stopifnot(all(axes <= vec_dims(x)))
  # anyDuplicated() rejects repeated axes; unique(axes) == axes recycles and can let them through
  stopifnot(anyDuplicated(axes) == 0)
  head <- setdiff(1:axes[[1]], axes)
  tail <- setdiff(axes[[1]]:vec_dims(x), axes)
  res <- rray_reshape( rray_transpose(x,c(head,axes,tail)), c(vec_dim(x)[head],prod(vec_dim(x)[axes]),vec_dim(x)[tail])) 
  dimnames(res) <- c(dimnames(x)[head], list(mixnames(dimnames(x)[axes])), dimnames(x)[tail])
  # The nested dimension sits right after the head dimensions
  names(dimnames(res))[[ length(head)+1 ]] <- rray:::reduce(names(dimnames(x))[ axes ], function(x,y) paste(x,y,sep="."))
  res
}
# Aux function to paste names
mixnames <- function(dimnames){
  rray:::reduce(expand.grid(dimnames,KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE),
                function(x,y) paste(x,y, sep=".")) 
}
mixnames(list(A=LETTERS[1:3], letters[4:5]))
#> [1] "A.d" "B.d" "C.d" "A.e" "B.e" "C.e"

myrr=rray(1:24,2:4,list(
  A=c("a1", "a2"),
  B=c("b1", "b2", "b3"),
  C=c( "c1", "c2", "c3", "c4") ))

myrr
#> <rray<int>[,3,4][24]>
#> , , C = c1
#> 
#>     B
#> A    b1 b2 b3
#>   a1  1  3  5
#>   a2  2  4  6
#> 
#> , , C = c2
#> 
#>     B
#> A    b1 b2 b3
#>   a1  7  9 11
#>   a2  8 10 12
#> 
#> , , C = c3
#> 
#>     B
#> A    b1 b2 b3
#>   a1 13 15 17
#>   a2 14 16 18
#> 
#> , , C = c4
#> 
#>     B
#> A    b1 b2 b3
#>   a1 19 21 23
#>   a2 20 22 24

rray_nest(myrr, c(2, 3))
#> <rray<int>[,12][24]>
#>     B.C
#> A    b1.c1 b2.c1 b3.c1 b1.c2 b2.c2 b3.c2 b1.c3 b2.c3 b3.c3 b1.c4 b2.c4
#>   a1     1     3     5     7     9    11    13    15    17    19    21
#>   a2     2     4     6     8    10    12    14    16    18    20    22
#>     B.C
#> A    b3.c4
#>   a1    23
#>   a2    24

Created on 2019-04-24 by the reprex package (v0.2.1.9000)

DavisVaughan commented 5 years ago

hmm actually both of these examples are just versions of rray_reshape() that have special dimension name handling.

library(rray)
library(vctrs)

myrr=rray(1:24,2:4,list(
  A=c("a1", "a2"),
  B=c("b1", "b2", "b3"),
  C=c( "c1", "c2", "c3", "c4") ))

# first example
rray_reshape(myrr, c(6, 4))
#> <rray<int>[,4][24]>
#>       B
#> A      [,1] [,2] [,3] [,4]
#>   [1,]    1    7   13   19
#>   [2,]    2    8   14   20
#>   [3,]    3    9   15   21
#>   [4,]    4   10   16   22
#>   [5,]    5   11   17   23
#>   [6,]    6   12   18   24

# second example
rray_reshape(myrr, c(2, 12))
#> <rray<int>[,12][24]>
#>     B
#> A    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
#>   a1    1    3    5    7    9   11   13   15   17    19    21    23
#>   a2    2    4    6    8   10   12   14   16   18    20    22    24

Created on 2019-04-24 by the reprex package (v0.2.1.9000)

juangomezduaso commented 5 years ago

There could also be the corresponding unnest() function, whose only merit would be to be able to do a reshape based on just the names, or take care of them properly, when they follow the conventional structure created in rray_nest().
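A name-driven unnest could look roughly like the sketch below. It is base R only and rests on assumptions: unnest_rows() is a hypothetical name, it handles only the row dimension of a matrix, and it assumes labels follow the "outer.inner" paste convention with no "." inside the original names.

```r
# Hypothetical helper: split the nested row dimension of a matrix back into
# separate dimensions, using only the "outer.inner" row labels.
unnest_rows <- function(m) {
  parts <- do.call(rbind, strsplit(rownames(m), ".", fixed = TRUE))
  levels <- lapply(seq_len(ncol(parts)), function(j) unique(parts[, j]))
  sizes <- lengths(levels)
  k <- length(sizes)
  # The innermost level varies fastest, so reshape with the sizes reversed,
  # then transpose the recovered axes back into outer-first order.
  out <- array(m, dim = c(rev(sizes), ncol(m)))
  out <- aperm(out, c(rev(seq_len(k)), k + 1))
  dimnames(out) <- c(levels, list(colnames(m)))
  out
}

x <- array(1:24, dim = 2:4,
           dimnames = list(c("a1", "a2"), c("b1", "b2", "b3"), paste0("c", 1:4)))

# Build the nested 6 x 4 table by hand: axis 1 outermost, axis 2 innermost.
m <- aperm(x, c(2, 1, 3))
dim(m) <- c(6, 4)
dimnames(m) <- list(
  as.vector(outer(dimnames(x)[[2]], dimnames(x)[[1]],
                  function(b, a) paste(a, b, sep = "."))),
  dimnames(x)[[3]]
)

# Round trip: unnesting recovers the original values and names.
all(unnest_rows(m) == x)
#> [1] TRUE
```

Note that with the outer-first label convention this unnest needs a transpose after the reshape; it is a pure reshape only if the recovered axes are allowed to come out innermost first.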

These functions get their usefulness from the dimnames management and from their natural and meaningful intent (compared with the more abstract reshape).

nest() in general is a kind of mix of reshape and transpose. Thinking a little bit more on it, I believe it should admit a list in the axes argument, to allow more than one nesting at a time. This would do the same as a succession of individual nestings, but has two nice advantages IMO:

a) The axes are specified in terms of the current dimensions. So, for instance, to view my5Drray as a table, it is easier to ask:

my5Drray %>% rray_nest(list(c(1,5,3), c(4,2))) # first, fifth, then third nested as rows; fourth and second nested as columns

than:

my5Drray %>% rray_nest(c(1,5,3)) %>% rray_nest(c(3,2))

b) We could also use metadimnames and even give metadimnames to the "new" dimensions:

rray_nest(my5Drray, list(Time=c("Year","Quarter"), Var=c("Region", "Sex")))
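The list form of the axes argument can be sketched in base R as follows. This is a hedged illustration, not rray code: nest_many() is a hypothetical name, it accepts only numeric axes (no metadimnames), and it assumes every axis appears in exactly one group.

```r
# Hypothetical helper: nest several groups of axes in one call.
nest_many <- function(x, groups) {
  d <- dim(x)
  axes <- unlist(groups, use.names = FALSE)
  # Assumption: the groups together cover every axis exactly once
  stopifnot(setequal(axes, seq_along(d)), anyDuplicated(axes) == 0)
  # Within a group the first axis is outermost, so reverse each group
  perm <- unlist(lapply(groups, rev), use.names = FALSE)
  res <- aperm(x, perm)
  dim(res) <- vapply(groups, function(g) prod(d[g]), numeric(1), USE.NAMES = FALSE)
  res
}

my5D <- array(seq_len(prod(2:6)), dim = 2:6)
tab <- nest_many(my5D, list(c(1, 5, 3), c(4, 2)))
dim(tab)
#> [1] 48 15

# Element (a1, a2, a3, a4, a5) lands at
#   row = a3 + 4 * (a5 - 1) + 24 * (a1 - 1),  col = a2 + 3 * (a4 - 1)
tab[37, 15] == my5D[2, 3, 1, 5, 4]
#> [1] TRUE
```

All axes here are given in terms of the original five dimensions, which is the convenience argued for in point a).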

juangomezduaso commented 5 years ago

If we forget about the names, it is interesting to think of these 4 functions in "mathematical useless speculation mode" (please stop reading here if uninterested in losing your time):

The sets A={Reshape, Transpose} and B={Nest, Unnest} have a kind of symmetric duality. They have the same power and either one can implement the other: Nest is just a transpose followed by a reshape. Unnest is just a particular case of reshape. Conversely: any Reshape can be done by a nesting of contiguous dimensions (melting the whole set of dimensions if necessary) followed by an appropriate separation with Unnest. And any transpose can be achieved by some nestings followed by the corresponding unnestings.

In fact, the 3 primitive operations in this "dimensions algebra" (?) could be:

T := Transpose
G := Grouping of contiguous dimensions
U := Unnest (ungrouping to contiguous dimensions)

So in each of the two function sets we have a basic operation (T in A, U in B) and a "powerful" but not pure one that provides the two lacking ingredients: Reshape (mixing G and U) in set A and Nest (mixing T and G) in set B.
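The claim that a transpose can be obtained from a nesting followed by an unnesting can be checked on a small matrix. This is a base R sketch only: Nest is emulated with aperm() plus flattening (since Nest is itself T followed by G), and Unnest with a plain change of dimensions.

```r
m <- matrix(1:6, nrow = 2)

# Nest both axes into one, with axis 1 outermost (slowest varying).
nested <- as.vector(aperm(m, c(2, 1)))
nested
#> [1] 1 3 5 2 4 6

# Unnest: pure ungrouping into two contiguous dimensions of sizes 3 and 2.
unnested <- array(nested, dim = c(3, 2))
identical(unnested, t(m))
#> [1] TRUE
```

Seen from set B, the transposition happens inside Nest, and Unnest only restores the dimensionality.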

juangomezduaso commented 5 years ago

Oh, the above function was completely wrong! The examples didn't fail and I shamefully assumed all was well! I hope you understood the intention without looking at it too much! It is now correct (or at least nearer to being so).

juangomezduaso commented 5 years ago

`rray_reshape(myrr, c(6, 4))`

Yes, it gets the same result (disregarding names) in this example, but you will probably agree that, for a fair comparison when we scale to real-life examples, it should rather be:

rray_reshape(myrr, c(prod(dim(myrr)[1:2]), 4))

if you don't remember the exact sizes of the nested dims, or at least:

rray_reshape(myrr, c(2*3, 4))

if you do remember those (potentially high) numbers. But, for users like me, you'd better not rely on our ability to do multiplications ;)
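The comparison can be reproduced on a plain base R array, with array() and aperm() standing in for rray_reshape() and rray_transpose() (an assumption made so the sketch runs without rray). It also shows a case where a reshape alone does not give the nesting, because the nested axes are not adjacent.

```r
x <- array(1:24, dim = 2:4)

# Sizes typed by hand versus computed from dim(x): same result.
by_hand <- array(x, dim = c(6, 4))
computed <- array(x, dim = c(prod(dim(x)[1:2]), dim(x)[3]))
identical(by_hand, computed)
#> [1] TRUE

# Nesting axis 1 with axis 3 skips axis 2, so the data must be
# transposed before reshaping; a reshape alone gives something else.
reshaped_only <- array(x, dim = c(3, 8))
transposed_first <- array(aperm(x, c(2, 1, 3)), dim = c(3, 8))
identical(reshaped_only, transposed_first)
#> [1] FALSE
```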