techascent / tech.ml.dataset

A Clojure high performance data processing system
Eclipse Public License 1.0

`[group-by]` - returned value cannot be destructured as a sequence of key/value pairs #357

Closed harold closed 1 year ago

harold commented 1 year ago

While converting some code that previously dealt with sequences of maps to use a dataset, I hit the following (heavily simplified here):

> (->> (group-by :a [{:a 1 :b 1} {:a 2 :b 2}])
       ffirst)
1
> (->> (ds/group-by-column (ds/->dataset [{:a 1 :b 1} {:a 2 :b 2}]) :a)
       ffirst)
Execution error (IllegalArgumentException) at [REDACTED]/eval41575 (form-init7155704788576968663.clj:4025).
Don't know how to create ISeq from: java.util.LinkedHashMap$Entry

My expectation was that the item returned by group-by would be a map from values to datasets filtered on that value. Some downstream code was destructuring them as such and exploded.

Would it be possible to make the value returned by group-by-column behave more like a clojure map in this way?

harold commented 1 year ago

My workaround for the moment:

> (->> (ds/group-by-column (ds/->dataset [{:a 1 :b 1} {:a 2 :b 2}]) :a)
       (into {})
       ffirst)
1

cnuernber commented 1 year ago

also (def fkey (comp key first)) as a work-around
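
For readers following along, here is a minimal sketch of that workaround. It assumes the grouped result behaves like a plain `java.util.LinkedHashMap`, which is what the error message above suggests; the `grouped` var and its vector values are hypothetical stand-ins for the real datasets.

```clojure
;; Stand-in for the grouped result (assumption: a plain java.util.LinkedHashMap;
;; the real values would be datasets, vectors are used here for illustration).
(def grouped
  (doto (java.util.LinkedHashMap.)
    (.put 1 [{:a 1 :b 1}])
    (.put 2 [{:a 2 :b 2}])))

;; `first` can seq over a java.util.Map, yielding a java.util.Map$Entry.
;; `key` only needs the Map$Entry interface, so it works where a second
;; `first` (as in `ffirst`) would throw.
(def fkey (comp key first))

(fkey grouped) ;; => 1
```

An analogous `(comp val first)` should work the same way for reading the first group's value.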

harold commented 1 year ago

I get it - the java.util.LinkedHashMap$Entry isn't a clojure.lang.AMapEntry.

Your workaround is a lot better because it uses less memory.
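
A small sketch of the difference between the two workarounds, again assuming a plain `java.util.LinkedHashMap` as a stand-in for the grouped result (the keyword values are placeholders, not real datasets). It shows why `ffirst` fails on the Java entry and what `(into {})` changes.

```clojure
;; Stand-in for the grouped result (assumption); values are placeholders.
(def grouped (java.util.LinkedHashMap. {1 :ds-for-1, 2 :ds-for-2}))

;; The first entry is a Java map entry, not a Clojure one.
(def java-entry (first grouped))
(instance? java.util.Map$Entry java-entry)    ;; => true
(instance? clojure.lang.AMapEntry java-entry) ;; => false

;; Calling `first` on it is what `ffirst` does, and it cannot be turned into a seq.
(def first-error
  (try (first java-entry)
       (catch IllegalArgumentException e (.getMessage e))))

;; `(into {})` copies every entry into a new persistent map, whose entries
;; are Clojure map entries and therefore behave like two-element vectors.
(def clj-entry (first (into {} grouped)))
(instance? clojure.lang.AMapEntry clj-entry)  ;; => true
(first clj-entry)                             ;; => 1
```

The copy made by `(into {})` is proportional to the number of groups, whereas `(comp key first)` only reads the first entry of the existing map.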

This is fine for now. I'll keep an eye out for other cases where it might be helpful to treat the map entries as two-element vectors. If those arise, we can rethink whether this is a good idea and how to implement it.