scicloj / tablecloth

Dataset manipulation library built on the top of tech.ml.dataset
https://scicloj.github.io/tablecloth
MIT License

let-dataset .. and more ? #111

Open awb99 opened 10 months ago

awb99 commented 10 months ago

Just found out about let-dataset ... great feature. Perhaps you'd want to expand on let-dataset by adding add-columns-let: the same macro, but it takes a dataset to operate on as its first parameter; the second parameter is unchanged (the vector of let bindings). First, it would bind every column name to a symbol (a keyword-to-symbol mapping) so that the columns can be used in the bindings. Second, it would add all binding names as columns (similar to how it is done in let-dataset).

; make ds1 .. with x y z columns
(def ds1 (tc/let-dataset [x (range 1 6) y 1 z (dfn/+ x y)]))

; add a column to ds1
; note x y are the :x :y columns in the dataset
(tc/add-columns-let ds1 [a (dfn/+ x y)])

awb99 commented 10 months ago

This is my current approach:

https://github.com/clojure-quant/techml.vector-math/blob/main/test/syntax.clj

(s/calc d [x (+ a b) y (+ x c) z [y 1]])

This is the macro that adds bindings to all columns in the dataset: https://github.com/clojure-quant/techml.vector-math/blob/main/src/cquant/vmath/syntax/column.clj

awb99 commented 10 months ago

My goal is to be able to enter vector math in a format whose syntax is identical to scalar-only math. So (* a b) in vector mode means (let [a (:a ds1) b (:b ds1)] (dfn/* a b)), and in scalar mode just (let [a 1 b 2] (* a b)). I feel such math is better placed in tablecloth.
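For comparison, here is roughly what the two modes look like when written by hand with functions that already exist. This is only a sketch: the dataset contents and the :product column name are made up for illustration.

```clojure
(require '[tablecloth.api :as tc]
         '[tech.v3.datatype.functional :as dfn])

;; Hypothetical example data; only the column names :a and :b
;; come from the comment above.
(def ds1 (tc/dataset {:a [1 2 3] :b [10 20 30]}))

;; Scalar mode: plain clojure.core arithmetic on numbers.
(def scalar-result
  (let [a 1
        b 2]
    (* a b)))

;; Vector mode: the same shape of expression, but the symbols are bound
;; to columns and the elementwise dfn/* replaces clojure.core/*.
(def vector-result
  (let [a (:a ds1)
        b (:b ds1)]
    (dfn/* a b)))

;; Attach the result as a new column using the existing tc/add-column.
;; The name :product is an arbitrary choice for this sketch.
(def ds2 (tc/add-column ds1 :product vector-result))
```

The only differences between the two let forms are the bindings and the namespace of the operator, which is the boilerplate a macro could hide.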

genmeblog commented 10 months ago

Generally there is a subproject by @ezmiller which lifts all columnar / vector operations to its own namespace.

I think there is room for the macro you've proposed. The only change I'd make is an explicit column accessor, since a column name can be anything, not only a keyword.

(tc/let-add-columns ds [a :a b :b z (dfn/+ a b)])
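A minimal sketch of how such a macro could expand, written outside of tablecloth. Nothing here exists in the library: the macro name follows the line above, and the rule used to tell accessors from new columns (a keyword or string literal on the right-hand side means "look up this column") is an assumption made only for this illustration.

```clojure
(require '[tablecloth.api :as tc]
         '[tech.v3.datatype.functional :as dfn])

;; Hypothetical macro, not part of tablecloth.
;; Assumption: a binding whose right-hand side is a keyword or string
;; literal is a column accessor; every other binding defines a new
;; column named after its symbol (as a keyword).
(defmacro let-add-columns
  [ds bindings]
  (let [ds-sym    (gensym "ds__")
        pairs     (partition 2 bindings)
        accessor? (fn [rhs] (or (keyword? rhs) (string? rhs)))
        new-syms  (for [[sym rhs] pairs
                        :when (not (accessor? rhs))]
                    sym)]
    `(let [~ds-sym ~ds
           ;; Accessor bindings become column lookups; all other
           ;; bindings are spliced in unchanged.
           ~@(mapcat (fn [[sym rhs]]
                       [sym (if (accessor? rhs)
                              `(get ~ds-sym ~rhs)
                              rhs)])
                     pairs)]
       ;; Add the computed bindings as columns, in binding order.
       (-> ~ds-sym
           ~@(for [sym new-syms]
               `(tc/add-column ~(keyword (name sym)) ~sym))))))

;; Usage with made-up data.
(def ds (tc/dataset {:a [1 2 3] :b [10 20 30]}))
(def ds+z (let-add-columns ds [a :a b :b z (dfn/+ a b)]))
```

One consequence of this accessor rule is that a keyword or string literal can no longer be used as a constant column value inside the bindings; a real implementation might want a more explicit marker for accessors.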

genmeblog commented 10 months ago

or maybe add one more arity to let-dataset?