scicloj / tablecloth

Dataset manipulation library built on the top of tech.ml.dataset
https://scicloj.github.io/tablecloth
MIT License

type inference when adding a constant column #10

Closed daslu closed 3 years ago

daslu commented 3 years ago

Currently (checked at commit https://github.com/scicloj/tablecloth/commit/a067803de29daf58240f720014cd02b6fa4bca46), a column added from a constant value gets the :object type:

(-> {:x [1 2]}
    (tablecloth.api/dataset)
    (tablecloth.api/add-or-replace-column
     :y 1)
    :y)

#tech.v3.dataset.column<object>[2]
:y
[1, 1, ]

It is possible to infer a more specific type from the single value in the column. One way to do it explicitly is this:

(-> {:x [1 2]}
    (tablecloth.api/dataset)
    (tablecloth.api/add-or-replace-column
     :y (tech.v3.datatype/as-reader [1]))
    :y)

#tech.v3.dataset.column<int64>[2]
:y
[1, 1, ]

It would probably be good to apply this inference implicitly, by default.
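As a small sketch of what such inference could rely on: tech.v3.datatype can report the datatype of a single scalar value via elemwise-datatype. The `dtype` alias and the `inferred` var below are only illustrative names, not part of tablecloth.

```clojure
(require '[tech.v3.datatype :as dtype])

;; Ask tech.v3.datatype for the datatype of a few scalar constants.
;; `inferred` is an illustrative name used only in this sketch.
(def inferred
  (mapv dtype/elemwise-datatype [1 1.0 "aaa"]))

inferred
;; => [:int64 :float64 :string]
```

A column built from a constant could carry this datatype instead of falling back to :object.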

genmeblog commented 3 years ago

It happens here: https://github.com/scicloj/tablecloth/blob/master/src/tablecloth/api/columns.clj#L116

The best approach is to create a constant reader for this case, instead of a Clojure seq or a reader built from a seq.

daslu commented 3 years ago

Thanks, you're right.

Here is one way to create that constant reader.


(-> {:x [1 2]}
    (tablecloth.api/dataset)
    (tablecloth.api/add-or-replace-column
     :y (tech.v3.datatype/make-reader
         (tech.v3.datatype/elemwise-datatype 1) ; figure out the type of our constant
         2     ; the number of rows of the dataset we are adding to
         1))   ; the constant
    :y)

#tech.v3.dataset.column<int64>[2]
:y
[1, 1, ]

genmeblog commented 3 years ago

Great. There is also const-reader, which I'll probably use.
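For reference, a minimal sketch of how const-reader might be used for this case, assuming the `tc` and `dtype` aliases below. The helper `constant-column-reader` is hypothetical, written only for illustration; it is not necessarily how tablecloth implements the fix.

```clojure
(require '[tablecloth.api :as tc]
         '[tech.v3.datatype :as dtype])

;; Hypothetical helper: a reader that yields `value` once per row of `ds`.
;; const-reader takes the constant and the number of elements.
(defn constant-column-reader
  [ds value]
  (dtype/const-reader value (tc/row-count ds)))

(def ds (tc/dataset {:x [1 2]}))

;; Pass the reader explicitly, as in the make-reader example above.
(def with-y
  (tc/add-or-replace-column ds :y (constant-column-reader ds 1)))
```

Compared with make-reader, this avoids spelling out the datatype by hand, since const-reader derives it from the constant itself.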

genmeblog commented 3 years ago

(-> {:x [1 2]}
    (ds/->dataset)
    (add-or-replace-column :y 1)
    :y)
;; => #tech.v3.dataset.column<int64>[2]
;;    :y
;;    [1, 1, ]

(-> {:x [1 2]}
    (ds/->dataset)
    (add-or-replace-column :y 1.0)
    :y)
;; => #tech.v3.dataset.column<float64>[2]
;;    :y
;;    [1.000, 1.000, ]

(-> {:x [1 2]}
    (ds/->dataset)
    (add-or-replace-column :y "aaa")
    :y)
;; => #tech.v3.dataset.column<string>[2]
;;    :y
;;    [aaa, aaa, ]