scicloj / tablecloth

Dataset manipulation library built on top of tech.ml.dataset
https://scicloj.github.io/tablecloth
MIT License

Issues with `separate-column` #62

Closed: behrica closed this issue 2 years ago

behrica commented 2 years ago

While this works:

(require '[tablecloth.api :as tc])

(tablecloth.api/separate-column (tc/dataset {:a ["1 2 3"]}) :a [:x :y :z] " ")
;; => _unnamed [1 3]:
;;    | :x | :y | :z |
;;    |----|----|----|
;;    |  1 |  2 |  3 |
;;

the next 2 variations don't work:

;; use a regex as the separator
(tablecloth.api/separate-column (tc/dataset {:a ["1 2 3"]}) :a [:x :y :z] #"\s{1}")
;; => _unnamed [1 3]:
;;    | :x | :y | :z |
;;    |----|----|----|
;;    |    |    |    |
;;
;; omit the target-columns argument (3-arity call)
(tablecloth.api/separate-column (tc/dataset {:a ["1 2 3"]}) :a " ")
Unhandled java.lang.ClassCastException
   class clojure.lang.LazySeq cannot be cast to class java.util.Map (clojure.lang.LazySeq is in unnamed module of loader 'app'; java.util.Map is in module java.base of loader 'bootstrap')

mapseq_colmap.clj:   37  tech.v3.dataset.io.mapseq-colmap/mapseq->dataset/fn
mapseq_colmap.clj:   32  tech.v3.dataset.io.mapseq-colmap/mapseq->dataset
mapseq_colmap.clj:   15  tech.v3.dataset.io.mapseq-colmap/mapseq->dataset
io.clj:  232  tech.v3.dataset.io/->dataset
io.clj:  100  tech.v3.dataset.io/->dataset
io.clj:  237  tech.v3.dataset.io/->dataset

genmeblog commented 2 years ago

Thanks! Maybe it's not clear in the tutorial, but a regex separator can only be used to capture groups, i.e. this should work: #"(\d)\s{1}(\d)\s{1}(\d)". If you want to use a regex as a split separator, you can pass a function instead: #(str/split % #"\s{1}").
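To make the difference concrete, here is a plain-Clojure sketch of the two behaviours, with no tablecloth involved. Using `re-matches` to extract the groups is an assumption made for illustration. tablecloth's own matching code may use a different call.

```clojure
(require '[clojure.string :as str])

;; Capture-group style: each group in the pattern becomes one value.
;; re-matches returns [full-match group-1 group-2 ...], so drop the full match.
(defn groups->values [s]
  (rest (re-matches #"(\d)\s{1}(\d)\s{1}(\d)" s)))

;; A pattern without groups has nothing to extract. re-matches only
;; succeeds when the WHOLE string matches, so this returns nil here.
(defn no-groups->values [s]
  (re-matches #"\s{1}" s))

;; Split style: the regex acts as a delimiter inside a function,
;; which is the form suggested above for the separator argument.
(def split-on-space #(str/split % #"\s{1}"))

(groups->values "1 2 3")    ;; => ("1" "2" "3")
(no-groups->values "1 2 3") ;; => nil
(split-on-space "1 2 3")    ;; => ["1" "2" "3"]
```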

I'll check the other case later.

genmeblog commented 2 years ago

OK, got the second case. It's in the docs:

target columns - can be nil or :infer if separator returns map

That means you have to provide the column names somehow: either via the explicit argument or from a map returned by the separator.

The solution here is to introduce default names.

behrica commented 2 years ago

Yes, indeed. I also discovered that the separator can be a function. Maybe the docs could be improved.

behrica commented 2 years ago

> Ok, got the second case. It's in doc:
>
> target columns - can be nil or :infer if separator returns map

What is my version 3) supposed to do, i.e. the 3-arity call? I didn't find it in the docs either.

And indeed, default column names (column-0, ...) would be useful.
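A sketch of how such default names could be generated, in plain Clojure. `default-names` and `values->default-map` are hypothetical helpers and not tablecloth API. The `:column-N` keywords follow the naming scheme suggested in this comment, not necessarily whatever the library ends up using.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper (not tablecloth API): :column-0, :column-1, ...
(defn default-names [n]
  (mapv #(keyword (str "column-" %)) (range n)))

;; Hypothetical helper: key each split value by its default column name.
(defn values->default-map [values]
  (zipmap (default-names (count values)) values))

(values->default-map (str/split "1 2 3" #" "))
;; => {:column-0 "1", :column-1 "2", :column-2 "3"}
```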

genmeblog commented 2 years ago

The third version should look something like this: #(zipmap [:x :y :z] (str/split % #"\s{1}"))
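Outside of tablecloth, that separator is just a function from a string to a map. The map keys name the target columns, which is why the doc line quoted earlier allows nil or :infer for the target columns in that case. A minimal sketch (the name `split->xyz` is illustrative):

```clojure
(require '[clojure.string :as str])

;; Separator returning a map: the keys (:x :y :z) name the target columns,
;; so no explicit list of target columns has to be passed.
(def split->xyz #(zipmap [:x :y :z] (str/split % #"\s{1}")))

(split->xyz "1 2 3") ;; => {:x "1", :y "2", :z "3"}
```

Going by that doc line, a call such as `(tc/separate-column ds :a :infer split->xyz)` should then work, though that exact call has not been tested here.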

genmeblog commented 2 years ago

I've just found out that the second case (the regex separator) can be done with the string "\\s{1}" (I call re-pattern under the hood).
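The core-Clojure side of that trick, as a sketch. The double backslash is needed because `"\\s{1}"` is the string literal for the characters `\s{1}`, whereas `"\s"` is not a valid escape in a Clojure string. The claim that tablecloth then uses the compiled pattern as a split delimiter comes from the comment above. The snippet does not reproduce the library's internal code.

```clojure
(require '[clojure.string :as str])

;; re-pattern compiles the string into a java.util.regex.Pattern,
;; equivalent to writing the literal #"\s{1}".
(def sep (re-pattern "\\s{1}"))

;; The compiled pattern works as an ordinary split delimiter.
(str/split "1 2 3" sep) ;; => ["1" "2" "3"]
```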

behrica commented 2 years ago

I mean, what is the purpose of the 3-arity `separate-column [ds column separator]`? Do you have an example of it?

genmeblog commented 2 years ago

(tc/separate-column DS :V3 (fn [^double v]
                              {:int-part (int (quot v 1.0))
                               :fract-part (mod v 1.0)}))
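Without the dataset (`DS` and `:V3` come from the thread and are not defined here), the separator above is just a function from one number to a map. Presumably the keys `:int-part` and `:fract-part` become the new column names, since the separator returns a map. A plain-Clojure sketch of the same function:

```clojure
;; Same body as the separator above, given a name so it can be called directly.
(defn split-number [^double v]
  {:int-part   (int (quot v 1.0)) ; whole part, e.g. 2.75 -> 2
   :fract-part (mod v 1.0)})      ; fractional part, e.g. 2.75 -> 0.75

(split-number 2.75) ;; => {:int-part 2, :fract-part 0.75}
```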
genmeblog commented 2 years ago

Sometimes you may want to dynamically decide which column to fill.
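A minimal plain-Clojure sketch of that idea. `route-by-sign` is a hypothetical separator. How tablecloth fills the cells that a given row's map leaves out (presumably as missing values) is an assumption that has not been verified here.

```clojure
;; Hypothetical separator: the value itself decides which key,
;; i.e. which target column, it goes to.
(defn route-by-sign [v]
  (if (neg? v)
    {:negative v}
    {:non-negative v}))

;; Each value fills only one of the two "columns".
(map route-by-sign [3 -1 0])
;; => ({:non-negative 3} {:negative -1} {:non-negative 0})

;; The union of keys is the set of columns a map-returning separator produces.
(distinct (mapcat keys (map route-by-sign [3 -1 0])))
;; => (:non-negative :negative)
```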