zero-one-group / geni

A Clojure dataframe library that runs on Spark
Apache License 2.0

Fix bug where g/->dataset doesn't work for more than 8 columns #340

Closed: WaqasAliAbbasi closed this issue 2 years ago

WaqasAliAbbasi commented 2 years ago

Info

Geni version: 0.0.39

Problem / Steps to reproduce

Many apologies: I introduced a bug in https://github.com/zero-one-group/geni/pull/336 where g/->dataset loses the order of the values when there are more than 8 columns, so values end up under the wrong keys:

(-> (g/records->dataset
     @tr/spark
     [{:a 1  :b 2  :c 3  :d 4  :e 5  :f 6  :g 7  :h 8  :i 9}
      {:a 10 :b 11 :c 12 :d 13 :e 14 :f 15 :g 16 :h 17 :i 18}])
    g/collect)

leads to

;({:e 9, :g 3, :c 1, :h 5, :b 2, :d 6, :f 4, :i 7, :a 8}
; {:e 18, :g 12, :c 10, :h 14, :b 11, :d 15, :f 13, :i 16, :a 17})

Expected Result

g/collect should return the same records that were passed to g/->dataset.

Proposed Solution

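Don't use zipmap in https://github.com/zero-one-group/geni/blob/b7323bdb399611323b66c924dcb1098f36012a2a/src/clojure/zero_one/geni/core/dataset_creation.clj#L238. Once a map has more than 8 entries, Clojure stores it as a hash map, which does not preserve insertion order.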
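The 8-column threshold lines up with how Clojure represents small maps. The sketch below runs in a plain Clojure REPL without Spark. It shows zipmap switching from an array map to a hash map at 9 entries, along with one order-preserving way to pull values out of a record. The helper `row->values` is hypothetical and is not Geni's actual fix. It also assumes, without checking, that the rows in dataset_creation.clj are built by pairing a fixed column order with values.

```clojure
;; Column names in the order the user supplied them.
(def cols [:a :b :c :d :e :f :g :h :i])

;; With 8 or fewer entries, zipmap returns a PersistentArrayMap,
;; which iterates in insertion order.
(def small (zipmap (take 8 cols) (range 8)))

;; With 9 entries, zipmap returns a PersistentHashMap, whose iteration
;; order follows key hashes rather than insertion order.
(def large (zipmap cols (range 9)))

(defn row->values
  "Hypothetical helper (not part of Geni): read a record's values in
  the order given by `cols`, so each value lines up with its column."
  [cols record]
  (mapv #(get record %) cols))

(type small) ;; => clojure.lang.PersistentArrayMap
(type large) ;; => clojure.lang.PersistentHashMap
(row->values cols {:a 1 :b 2 :c 3 :d 4 :e 5 :f 6 :g 7 :h 8 :i 9})
;; => [1 2 3 4 5 6 7 8 9]
```

Looking values up by key from a single column vector avoids relying on a map's iteration order. Because of that, the same code path works whether a record has 3 columns or 300.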

Background

anthony-khong commented 2 years ago

I'll make a new release by this week!

WaqasAliAbbasi commented 2 years ago

Thanks 🥳

WaqasAliAbbasi commented 2 years ago

Hi Anthony, any ETA on this?

anthony-khong commented 2 years ago

Hi @WaqasAliAbbasi, 0.0.40 should be up on Clojars: https://clojars.org/zero.one/geni. Would that be sufficient for you?

The actual GitHub release and merging the changes upstream will come later!

WaqasAliAbbasi commented 2 years ago

Yep that's perfect, thank you so much! 😃