scicloj / clojisr

Clojure speaks statistics - a bridge between Clojure to R
https://scicloj.github.io/clojisr/
Eclipse Public License 2.0

special handling of "tensor" ? #102

Open behrica opened 1 month ago

behrica commented 1 month ago

It would be nice if this worked:

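(require '[clojisr.v1.r :as r]
         '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.tensor :as dst])

;; makes R's base package available as the r.base namespace
(r/require-r '[base])

(def tensor
  (-> (ds/->dataset {:x (range 5)
                     :y (range 7 12)})
      (dst/dataset->tensor)))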

and then

(r.base/t (r/clj->r tensor))

(It does something, but not the right thing)

[[1]]
[1] 0 7

[[2]]
[1] 1 8

[[3]]
[1] 2 9

[[4]]
[1]  3 10

[[5]]
[1]  4 11

clj꞉clojisr.v1.r-test꞉> 
     [,1]      [,2]      [,3]      [,4]      [,5]     
[1,] numeric,2 numeric,2 numeric,2 numeric,2 numeric,2

or even better, that this does the right thing:

(r.base/t tensor)

I think we have similar special handling for tech.v3.dataset datasets; maybe we should do the same for tech.v3.tensor.

behrica commented 1 month ago

These don't work either:

(r.base/matrix (r/clj->r tensor))
(r.base/matrix tensor)

behrica commented 1 month ago

For reference, it does work like this:

(require '[tech.v3.tensor :as tens]
         '[tech.v3.datatype :as dtt])

(-> tensor
    tens/tensor->buffer
    (r.base/matrix :nrow (first (dtt/shape tensor))
                   :ncol (second (dtt/shape tensor)))
    r.base/t)
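One thing worth double-checking with this workaround is element order. R fills a matrix column by column, while the flat buffer behind a tensor is, as far as I know, row-major by default. The dependency-free sketch below mimics both layouts in plain Clojure; `row-major-flat` and `fill-column-major` are hypothetical helpers written for this illustration, not part of clojisr or dtype-next.

```clojure
;; Hypothetical helpers, plain Clojure only (no clojisr or dtype-next needed).

(defn row-major-flat
  "Flattens a seq of rows the way a row-major buffer would store them."
  [rows]
  (vec (apply concat rows)))

(defn fill-column-major
  "Mimics R's matrix(data, nrow, ncol): fills column by column and
  returns the result as a vector of rows."
  [data nrow ncol]
  (vec (for [i (range nrow)]
         (vec (for [j (range ncol)]
                (nth data (+ i (* j nrow))))))))

;; The 5x2 tensor from the example above, written as rows of [x y].
(def rows [[0 7] [1 8] [2 9] [3 10] [4 11]])

(def flat (row-major-flat rows))
;; => [0 7 1 8 2 9 3 10 4 11]

;; Filling 5x2 column by column mixes up the original rows:
(fill-column-major flat 5 2)
;; => [[0 9] [7 3] [1 10] [8 4] [2 11]]

;; Filling 2x5 column by column gives the transpose of the original:
(fill-column-major flat 2 5)
;; => [[0 1 2 3 4] [7 8 9 10 11]]
```

If the buffer really is row-major, passing `byrow = TRUE` to R's `matrix` is one option to try before transposing.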

behrica commented 1 month ago

Some remarks:

genmeblog commented 1 month ago

Transfer from Clojure to R should go through the Java Rserve library structures, which I believe is the optimal route. Here is how it's done for TMD (tech.ml.dataset): https://github.com/scicloj/clojisr/blob/master/src/clojisr/v1/impl/clj_to_java.clj#L32-L48

Possibly it can be done similarly for tensors.

genmeblog commented 1 month ago

Tensors can be represented in R as multidimensional arrays, not only as matrices. Here is something done in the past (a transfer of flat data into a 5D array): https://scicloj.github.io/clojisr/clojisr.v1.tutorials.dataset.html#matrices-arrays-multidimensional-arrays

genmeblog commented 1 month ago

Multidimensional arrays / tables in R are represented as a flattened dataset on the Clojure side, like this 3D table: https://scicloj.github.io/clojisr/clojisr.v1.tutorials.dataset.html#table

behrica commented 1 month ago

OK. I learned that dtype tensors can indeed be represented in R as a matrix or an array. Calling `class` on a 3D array in R gives:

> class(array(1:(3 * 4 * 5),dim=(c(3,4,5))))
[1] "array"
> 

while on "matrix" it gives:

> class(matrix(c(1,2,3,4)))
[1] "matrix" "array" 

Using "array" on 2D data also gives a matrix:

> class(array(1:(3 * 4),dim=(c(3,4))))
[1] "matrix" "array"

genmeblog commented 1 month ago

Take a look at this line and the ones below it, which convert a multidimensional structure into a flattened dataset. We could add another path that creates tensors out of arrays. https://github.com/scicloj/clojisr/blob/master/src/clojisr/v1/impl/java_to_clj.clj#L94
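To make the idea of such a path concrete, here is a minimal sketch of the reshaping step only, assuming the data arrives from R as a flat column-major sequence plus a `dim` vector. It is not the clojisr implementation and returns nested Clojure vectors rather than a real tensor; `column-major->nested` is a hypothetical name.

```clojure
;; Hypothetical helper, plain Clojure only.
(defn column-major->nested
  "Turns flat column-major data with the given dims into nested vectors,
  so that (get-in result [i j k]) is the element at 0-based index [i j k]."
  [data dims]
  (let [;; column-major strides: 1, d1, d1*d2, ...
        strides (reductions * 1 dims)
        offset  (fn [idx] (reduce + (map * idx strides)))
        build   (fn build [idx remaining]
                  (if (empty? remaining)
                    (nth data (offset idx))
                    (mapv #(build (conj idx %) (rest remaining))
                          (range (first remaining)))))]
    (build [] dims)))

;; Same data as array(0:59, dim = c(3, 4, 5)) in R.
(def nested (column-major->nested (vec (range 60)) [3 4 5]))

(get-in nested [0 1 0]) ;; => 3
(get-in nested [0 0 1]) ;; => 12
(get-in nested [2 3 4]) ;; => 59
```

A real implementation would presumably build the tensor directly from the flat buffer instead of going through nested vectors.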

behrica commented 1 month ago

Yes, will do. To me this is especially unexpected and could be improved by returning a proper tensor:

(-> (r.base/array (range (* 3 4 5)) :dim [3 4 5])
    (r/r->clj))
;; => _unnamed [15 5]:
;;    
;;    | :$col-0 |  1 |  2 |  3 |  4 |
;;    |--------:|---:|---:|---:|---:|
;;    |       1 |  0 |  3 |  6 |  9 |
;;    |       1 |  1 |  4 |  7 | 10 |
;;    |       1 |  2 |  5 |  8 | 11 |
;;    |       2 | 12 | 15 | 18 | 21 |
;;    |       2 | 13 | 16 | 19 | 22 |
;;    |       2 | 14 | 17 | 20 | 23 |
;;    |       3 | 24 | 27 | 30 | 33 |
;;    |       3 | 25 | 28 | 31 | 34 |
;;    |       3 | 26 | 29 | 32 | 35 |
;;    |       4 | 36 | 39 | 42 | 45 |
;;    |       4 | 37 | 40 | 43 | 46 |
;;    |       4 | 38 | 41 | 44 | 47 |
;;    |       5 | 48 | 51 | 54 | 57 |
;;    |       5 | 49 | 52 | 55 | 58 |
;;    |       5 | 50 | 53 | 56 | 59 |

behrica commented 1 month ago

It represents an R 3D array as a 2D data frame (with an extra column per additional dimension).
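The layout of that table can be reproduced without R. The sketch below assumes the values are stored in R's column-major order and rebuilds the rows of the dataset printed above; `flatten-3d` is a hypothetical helper for illustration, not the code clojisr uses.

```clojure
;; Hypothetical helper, plain Clojure only.
(defn flatten-3d
  "Given flat column-major data and dims [d1 d2 d3], returns one row per
  (third-dim index, first-dim index): [slice-number v1 ... v_d2]."
  [data [d1 d2 d3]]
  (vec (for [k (range d3)
             i (range d1)]
         (into [(inc k)] ; 1-based slice number, like :$col-0
               (for [j (range d2)]
                 (nth data (+ i (* d1 j) (* d1 d2 k))))))))

(def flat-rows (flatten-3d (vec (range 60)) [3 4 5]))

(count flat-rows) ;; => 15
(first flat-rows) ;; => [1 0 3 6 9]
(nth flat-rows 3) ;; => [2 12 15 18 21]
(peek flat-rows)  ;; => [5 50 53 56 59]
```

Each group of three rows is one 3x4 slice of the array, and the first column says which slice along the third dimension the row belongs to.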

genmeblog commented 1 month ago

Yes, that was the idea: to turn any nd-array into a 2D dataset. I know this is not a perfect solution. At that time tensors weren't available (or I was not aware of them).

behrica commented 1 month ago

I see. I started a discussion on Zulip; let's continue there.