ondra-m / ruby-spark

Ruby wrapper for Apache Spark
MIT License
227 stars 29 forks

Closure support? [question] #16

Closed yduf closed 8 years ago

yduf commented 9 years ago

Hi,

Do you think automatic serialization of closures could be supported?

ondra-m commented 9 years ago

What do you mean by "automatic serialization"?

yduf commented 9 years ago

In the current situation (at least, that's my understanding), in Scala you can write:

val a = 2
rdd.map { x => x + a } // the closure capturing a = 2 is automatically transferred to the workers

To obtain the same thing with ruby-spark, it's necessary to write it explicitly:

a = 2
Spark.sc.broadcast(a)
rdd = rdd.bind(a: a)
rdd = rdd.map lambda { |x| x + a } # use the cluster-distributed value

Otherwise the worker raises an error:

org.apache.spark.api.ruby.RubyException: undefined local variable or method `a'

ondra-m commented 9 years ago

It's possible. On Ruby 2.2 you can get all local variables with:

a = 1
b = Array.new(100_000_000){ rand }

func = lambda { a + 1 }
func.binding.local_variables
# => [..., :a, :b, ...]

Unfortunately, variable b would also be serialized.
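One way around over-capture would be to serialize only the locals the closure actually references. The sketch below assumes the referenced names are already known (in practice you would need the lambda's source, e.g. via a gem like method_source, to discover them); `referenced` is a hand-supplied list, not something ruby-spark provides.

```ruby
# Hedged sketch: capture only the locals a closure is known to reference,
# so a large unused variable like `b` is never serialized.
a = 1
b = Array.new(100_000) { rand } # large, but unused by the closure

func = lambda { a + 1 }

referenced = [:a] # assumed known; discovering this needs the lambda's source

env = referenced.each_with_object({}) do |name, h|
  h[name] = func.binding.local_variable_get(name)
end

payload  = Marshal.dump(env)    # only `a` travels to the worker
restored = Marshal.load(payload)
restored[:a] + 1 # => 2
```

The payload stays a few bytes, whereas dumping every local from `func.binding.local_variables` would drag the whole of `b` along with it.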