Closed: yduf closed this issue 8 years ago
What do you mean by "automatic serialization"?
In the current situation (at least, that is my understanding), in Scala you can write:

val a = 2
rdd.map { x => x + a } // the closure captures a = 2, which is automatically transferred to the workers
To obtain the same thing with ruby-spark, it is necessary to write it explicitly:

a = 2
Spark.sc.broadcast(a)
rdd = rdd.bind(a: a)
rdd = rdd.map lambda { |x| x + a } # use the cluster-distributed value
Otherwise the worker raises an error:

org.apache.spark.api.ruby.RubyException: undefined local variable or method `a'
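For concreteness, here is a minimal sketch of the failing case. The Spark.start / Spark.sc / parallelize / map calls follow the ruby-spark README; the RDD contents are made up for illustration:

require 'ruby-spark'

Spark.start
sc = Spark.sc

a = 2
rdd = sc.parallelize(1..10)

# `a` exists only in the driver process; nothing ships it to the workers,
# so evaluating the lambda remotely fails with:
#   org.apache.spark.api.ruby.RubyException: undefined local variable or method `a'
rdd.map(lambda { |x| x + a }).collect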
It's possible. On Ruby 2.2 you can get all local variables with:
a = 1
b = Array.new(100_000_000) { rand } # huge array, never used by the lambda
func = lambda { a + 1 }
func.binding.local_variables # lists every local in the lambda's defining scope
# => [..., :a, :b, ...]
Unfortunately, the variable b would also be serialized, even though the lambda never uses it, so the whole 100-million-element array would be shipped to the workers.
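A rough sketch of what automatic capture via the binding could look like; capture_locals is a hypothetical helper, not part of ruby-spark, and it illustrates exactly the problem above: it grabs every local in scope, not just the ones the lambda uses.

# Hypothetical helper: naively captures every local visible to the proc's
# binding. Uses Binding#local_variables (Ruby 2.2+) and
# Binding#local_variable_get (Ruby 2.1+).
def capture_locals(func)
  bnd = func.binding
  bnd.local_variables.each_with_object({}) do |name, vars|
    vars[name] = bnd.local_variable_get(name)
  end
end

a = 1
b = Array.new(100_000_000) { rand }
func = lambda { a + 1 }

# Binds everything the lambda *could* see, not what it actually *uses*:
# rdd.map(func).bind(capture_locals(func))
# => would serialize :b (100 million floats) alongside :a

Ruby does not expose which locals a proc's body actually references, so a real implementation would need something more, e.g. parsing the proc's source (Ripper) or keeping the explicit bind hints.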
Hi,
Do you think that automatic serialization of closures could be supported?