Easier serialization for Spark

rjagerman / glint

Glint: High performance scala parameter server

MIT License

168 stars 62 forks source link

It is not immediately obvious how to use the implicit execution context and timeout together with Spark. In particular, the following piece of code will not run:

implicit val ec = ExecutionContext.Implicits.global
implicit val timeout = new Timeout(30 seconds)
rdd.foreach {
    case value => vector.push(Array(0L), Array(value))
}

Spark attempts to serialize both the execution context and timeout which causes errors because these objects are never meant to be serialized. Instead, one would have to write something like this:

rdd.foreach {
    case value =>
      implicit val ec = ExecutionContext.Implicits.global
      implicit val timeout = new Timeout(30 seconds)
      vector.push(Array(0L), Array(value))
}

To make this easier, it might be a good idea to either remove the implicits and make these configurable defaults or to add new methods that don't have the implicits as a requirement.

rjagerman / glint

Easier serialization for Spark #38