sorenmacbeth / flambo

A Clojure DSL for Apache Spark
Eclipse Public License 1.0
607 stars 84 forks source link

Transducers #17

Open pasviegas opened 9 years ago

pasviegas commented 9 years ago

Would be really awesome if the project was transducer compatible.

I know that it would be some extra work, but I think it might be possible.

Any thoughts?

arnaudsj commented 9 years ago

I've had the same thoughts :)

sorenmacbeth commented 9 years ago

Yes, I've absolutely been thinking the same thing. I'm still reading up on transducers, but I agree it makes complete sense to make flambo compatible with them. In the meantime, gist sketches or PRs would be great while we zero in on an implementation :)

pasviegas commented 9 years ago

So, I think something in this direction:

; Creating a transducer
(def process-bags
  (comp (flatmap unbundle-pallet)
        (filter non-food?)
        (map label-heavy)))

; Coll
(into [] process-bags [])
; Batch 
(extend-type JavaRDD
  clojure.core.protocols/CollReduce
  (coll-reduce
    ([rdd f]
     (flambo/reduce rdd f))
    ([rdd f start]
     (flambo/fold rdd start f))))

(into [] process-bags (f/text-file sc "data.txt"))

; Channel
(chan 1 process-bags)
;Stream 
(f/createStream sc process-bags) ;??