vatlab / sos-r

SoS extension for R
BSD 3-Clause "New" or "Revised" License

Support for Multidimensional array conversion #1

Closed HenryLeongStat closed 6 years ago

HenryLeongStat commented 6 years ago

Currently, a 3-dimensional array in R such as arr = array(1, dim=c(2,2,2)) is converted to a one-dimensional list in SoS: [1, 1, 1, 1, 1, 1, 1, 1]

HenryLeongStat commented 6 years ago

This module (rpy2) might be helpful.

HenryLeongStat commented 6 years ago

rpy2 seems too troublesome to use: somehow I cannot use it in PyCharm, although it works fine in Spyder. I will solve this problem without using any extra package or module.

BoPeng commented 6 years ago

The problem is that the code only checks the length, not the dim, of the R array, and uses paste(... collapse), which collapses an n-d array into a 1-d array.

What we should do is

  1. Check dim.
  2. If it is 1-d, use a 1-d array (NOT the list we are using now).
  3. If it is multi-dimensional, construct a string like the following
    np.array([1, 2, 3, 4, 5, 6]).reshape([2,3])

    and pass it to Python.

Note that

  1. Here the [1,2,3] part is the 1-d array we already have
  2. shape should be the dim
  3. Steps 2 and 3 can be merged (reshaping to 1-d is acceptable).
  4. We should check if R and Python numpy use the same memory layout (by row or by column), otherwise the array returned will be wrong.

BoPeng commented 6 years ago

So the solution could be as simple as

 paste("numpy.array([", paste(obj, collapse=','), "]).reshape(", ..py.repr(dim(obj)), ")" )
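
To see what such a generated string would look like on the Python side, here is a minimal sketch. The helper name build_numpy_expr is hypothetical, and formatting the dim as a Python list is an assumption about what ..py.repr(dim(obj)) produces. Note that the default reshape order is row-major, so the memory layout question from the note above is still open at this point.

```python
import numpy as np

def build_numpy_expr(values, dim):
    """Hypothetical Python mirror of the R paste(...) call above.

    Builds the expression string that would be sent to Python.
    The dim is formatted as a Python list (an assumption).
    """
    flat = ",".join(str(v) for v in values)
    return f"numpy.array([{flat}]).reshape({list(dim)})"

expr = build_numpy_expr(range(1, 7), (2, 3))
print(expr)

# Evaluate the string the way the receiving side might.
arr = eval(expr, {"numpy": np})
print(arr.shape)
```

The elements land row by row here, which is why the layout check matters for anything beyond 1-d.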

HenryLeongStat commented 6 years ago

Yes, but another problem is that, for example: in R: The output of

arr = array(c(1:16), dim=c(2,2,2,2))

is

, , 1, 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2, 1

     [,1] [,2]
[1,]    5    7
[2,]    6    8

, , 1, 2

     [,1] [,2]
[1,]    9   11
[2,]   10   12

, , 2, 2

     [,1] [,2]
[1,]   13   15
[2,]   14   16

However, in Python, the output of

np.array([ 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16 ]).reshape([2,2,2,2])

is

array([[[[ 1,  2],
         [ 3,  4]],

        [[ 5,  6],
         [ 7,  8]]],

       [[[ 9, 10],
         [11, 12]],

        [[13, 14],
         [15, 16]]]])

The problem is whether each 2-D matrix is filled by row or by column (byrow = T or F in R).

I tried

np.array([ 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16 ]).reshape([2,2,2,2],order='F')

but the result would be:

array([[[[ 1,  9],
         [ 5, 13]],

        [[ 3, 11],
         [ 7, 15]]],

       [[[ 2, 10],
         [ 6, 14]],

        [[ 4, 12],
         [ 8, 16]]]])
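
One possible reading of this output is that order='F' already gives the R layout and only the printing differs: R prints 2-D slices over the first two indices, while numpy prints blocks over the last two. A minimal check, assuming R stores arrays column by column (first index varies fastest) and remembering that numpy indices are 0-based:

```python
import numpy as np

values = np.arange(1, 17)
a = values.reshape([2, 2, 2, 2], order="F")

# R's ", , 1, 1" slice corresponds to a[:, :, 0, 0] in numpy.
slice_11 = a[:, :, 0, 0]
# R's ", , 2, 1" slice corresponds to a[:, :, 1, 0].
slice_21 = a[:, :, 1, 0]
# R's ", , 1, 2" slice corresponds to a[:, :, 0, 1].
slice_12 = a[:, :, 0, 1]

print(slice_11)
print(slice_21)
print(slice_12)
```

If these slices match the R printout element by element, the array is the same object index-wise even though its repr looks different.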

BoPeng commented 6 years ago

I think you need to use some tricks introduced here.

BoPeng commented 6 years ago

Could you try the order parameter of numpy.reshape?

HenryLeongStat commented 6 years ago

I tried: order='F' gives the result shown above. The results of order='C' and order='A' are the same as each other in this situation.

BoPeng commented 6 years ago

I do not get it. Which one is the R style order?

HenryLeongStat commented 6 years ago

R style is like the following:

> arr = array(c(1:120), dim=c(2,3,4,5))
> arr
, , 1, 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2, 1

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3, 1

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18

, , 4, 1

     [,1] [,2] [,3]
[1,]   19   21   23
[2,]   20   22   24

, , 1, 2

     [,1] [,2] [,3]
[1,]   25   27   29
[2,]   26   28   30

, , 2, 2

     [,1] [,2] [,3]
[1,]   31   33   35
[2,]   32   34   36

, , 3, 2

     [,1] [,2] [,3]
[1,]   37   39   41
[2,]   38   40   42

, , 4, 2

     [,1] [,2] [,3]
[1,]   43   45   47
[2,]   44   46   48

, , 1, 3

     [,1] [,2] [,3]
[1,]   49   51   53
[2,]   50   52   54

, , 2, 3

     [,1] [,2] [,3]
[1,]   55   57   59
[2,]   56   58   60

, , 3, 3

     [,1] [,2] [,3]
[1,]   61   63   65
[2,]   62   64   66

, , 4, 3

     [,1] [,2] [,3]
[1,]   67   69   71
[2,]   68   70   72

, , 1, 4

     [,1] [,2] [,3]
[1,]   73   75   77
[2,]   74   76   78

, , 2, 4

     [,1] [,2] [,3]
[1,]   79   81   83
[2,]   80   82   84

, , 3, 4

     [,1] [,2] [,3]
[1,]   85   87   89
[2,]   86   88   90

, , 4, 4

     [,1] [,2] [,3]
[1,]   91   93   95
[2,]   92   94   96

, , 1, 5

     [,1] [,2] [,3]
[1,]   97   99  101
[2,]   98  100  102

, , 2, 5

     [,1] [,2] [,3]
[1,]  103  105  107
[2,]  104  106  108

, , 3, 5

     [,1] [,2] [,3]
[1,]  109  111  113
[2,]  110  112  114

, , 4, 5

     [,1] [,2] [,3]
[1,]  115  117  119
[2,]  116  118  120

The first two values in c(2,3,4,5) are the number of rows and the number of columns. Each matrix is filled by column (byrow = F), and the matrices are stacked along the remaining values in c(2,3,4,5) (here, 4x5).

However, in numpy, for np.array([ 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120 ]).reshape([2,3,4,5])

the output is

array([[[[  1,   2,   3,   4,   5],
         [  6,   7,   8,   9,  10],
         [ 11,  12,  13,  14,  15],
         [ 16,  17,  18,  19,  20]],

        [[ 21,  22,  23,  24,  25],
         [ 26,  27,  28,  29,  30],
         [ 31,  32,  33,  34,  35],
         [ 36,  37,  38,  39,  40]],

        [[ 41,  42,  43,  44,  45],
         [ 46,  47,  48,  49,  50],
         [ 51,  52,  53,  54,  55],
         [ 56,  57,  58,  59,  60]]],

       [[[ 61,  62,  63,  64,  65],
         [ 66,  67,  68,  69,  70],
         [ 71,  72,  73,  74,  75],
         [ 76,  77,  78,  79,  80]],

        [[ 81,  82,  83,  84,  85],
         [ 86,  87,  88,  89,  90],
         [ 91,  92,  93,  94,  95],
         [ 96,  97,  98,  99, 100]],

        [[101, 102, 103, 104, 105],
         [106, 107, 108, 109, 110],
         [111, 112, 113, 114, 115],
         [116, 117, 118, 119, 120]]]])

I don't know how reshape() arranges the elements. Still trying to figure it out.

But it seems np.swapaxes() would help.
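
A sketch of how axis swapping can reproduce the R layout, offered as one option rather than as the fix that was eventually committed: reshape with the dims reversed in the default row-major order, then reverse all axes with np.transpose (a generalization of np.swapaxes to every axis at once). This assumes R's column-major storage.

```python
import numpy as np

dim = (2, 3, 4, 5)
values = np.arange(1, 121)

# Reshape with reversed dims in C order, then reverse the axes.
via_transpose = values.reshape(dim[::-1]).transpose()

# The same result using the order parameter directly.
via_order_f = values.reshape(dim, order="F")

print(via_transpose.shape)
print(np.array_equal(via_transpose, via_order_f))

# R's ", , 1, 1" and ", , 2, 1" slices, with 0-based numpy indices.
print(via_transpose[:, :, 0, 0])
print(via_transpose[:, :, 1, 0])
```

Both routes give an array whose [i, j, k, l] element equals R's arr[i+1, j+1, k+1, l+1].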

HenryLeongStat commented 6 years ago

Now it is the turn of converting a numpy.array() with more than 3 dimensions to R.
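
For the numpy-to-R direction, a minimal sketch under the same column-major assumption: flatten the array in Fortran order so the values arrive in the sequence R's array() expects, and pass the shape as dim. The helper name numpy_to_r_expr and the exact string format are hypothetical, not the code used in sos-r.

```python
import numpy as np

def numpy_to_r_expr(a):
    """Hypothetical helper: build an R expression string for a numpy array."""
    # Fortran order makes the first index vary fastest, as in R.
    flat = a.flatten(order="F").tolist()
    values = ",".join(str(v) for v in flat)
    dims = ",".join(str(d) for d in a.shape)
    return f"array(c({values}), dim=c({dims}))"

a = np.arange(1, 9).reshape((2, 2, 2))  # ordinary row-major numpy array
r_expr = numpy_to_r_expr(a)
print(r_expr)
```

With this ordering, a[i, j, k] in numpy should correspond to arr[i+1, j+1, k+1] in R.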

HenryLeongStat commented 6 years ago

Should be done. Pretty interesting difference between Python numpy.ndarray and R array!

BoPeng commented 6 years ago

Yes, the code looks scary. Thanks!

Actually, instead of a table showing all the type maps, could you more or less copy the tests to the documentation and show how the variables are transferred? Something like

(in SoS)
create variables in SoS
(in R)
%get ... --from R
%preview -n vars
(in SoS)
%get ...
%preview -n vars

I think this will give users a much better idea how the variables are transferred.

BoPeng commented 6 years ago

Also, the sos-r test fails and needs to be fixed.

HenryLeongStat commented 6 years ago

Sure! Will update the documentation and fix the test! 😄

HenryLeongStat commented 6 years ago

The test is fixed. A one-dimensional numpy.array in SoS is now converted to a one-dimensional array in R.

BoPeng commented 6 years ago

Thanks. I will release sos-r once it passes Travis CI.

BoPeng commented 6 years ago

Done.