Add more structure to the Spark backend

johnynek commented 6 years ago

This follows the MemoryBackend pattern of introducing an Op type that the planner targets. In Spark, this Op is basically a function that takes a SparkContext and an ExecutionContext and produces a Future of an RDD.

This has the nice property that we don't need the SparkContext while planning, only when running.
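A minimal sketch of the idea, assuming nothing about the actual classes in this PR: `Op`, `source`, `zip`, and `FakeContext` are hypothetical names. To keep it runnable without a Spark dependency, the context type is a type parameter `Ctx` (standing in for SparkContext) and the result type `A` stands in for an RDD. Planning only composes functions; the context is supplied when `run` is called.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical sketch: an Op is a deferred computation that receives its
// runtime context (in the Spark backend, a SparkContext) and an
// ExecutionContext only when it is run, never while it is being planned.
// `Ctx` stands in for SparkContext and `A` for an RDD so this compiles
// without Spark on the classpath.
final case class Op[Ctx, A](run: (Ctx, ExecutionContext) => Future[A]) {
  // Composing Ops during planning just builds a bigger function;
  // no context is touched until `run` is called.
  def map[B](f: A => B): Op[Ctx, B] =
    Op((ctx, ec) => run(ctx, ec).map(f)(ec))

  // Starts both Ops before combining them, so they can run concurrently.
  def zip[B](that: Op[Ctx, B]): Op[Ctx, (A, B)] =
    Op { (ctx, ec) =>
      val fa = run(ctx, ec)
      val fb = that.run(ctx, ec)
      fa.flatMap(a => fb.map(b => (a, b))(ec))(ec)
    }
}

object Op {
  // Hypothetical "source" Op: reads from the context only at run time.
  def source[Ctx, A](read: Ctx => A): Op[Ctx, A] =
    Op((ctx, ec) => Future(read(ctx))(ec))
}

// Hypothetical stand-in for SparkContext, used only in this sketch.
final case class FakeContext(data: Map[String, Seq[Int]])

object OpDemo {
  def main(args: Array[String]): Unit = {
    // Planning: no context exists yet.
    val planned = Op.source[FakeContext, Seq[Int]](_.data("xs")).map(_.sum)
    // Running: the context is supplied only now.
    val ctx = FakeContext(Map("xs" -> Seq(1, 2, 3)))
    println(Await.result(planned.run(ctx, ExecutionContext.global), 5.seconds))
  }
}
```

The design choice this illustrates is that a planned `Op` is just a value, so it can be built, composed, and inspected without a live SparkContext.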

Second, I filled in the other missing pieces: the SparkWriter, which manages writes as we evaluate Executions, and the mapping of sources and sinks.
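A rough sketch of a writer that runs the writes collected while evaluating an Execution. Everything here is hypothetical: `ToWrite`, `execute`, `read`, and `InMemoryWriter` are illustrative names, sinks are an in-memory map rather than real Spark outputs, and `start`/`finish` are assumed to be lifecycle hooks bracketing a run (the thread itself does not say what they are for).

```scala
import scala.collection.mutable
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical: a pending write pairs a sink name with a deferred computation.
final case class ToWrite[A](sinkName: String, compute: () => Seq[A])

// Hypothetical writer interface. start/finish are ASSUMED to be lifecycle
// hooks called once before and once after a run; this is not the PR's API.
trait Writer {
  def start(): Unit
  def finish(): Unit
  def execute(writes: List[ToWrite[_]])(implicit ec: ExecutionContext): Future[Unit]
}

// In-memory stand-in for a Spark writer: it "writes" by storing results in a
// map keyed by sink name, so the sketch runs without Spark.
final class InMemoryWriter extends Writer {
  private val sinks = mutable.Map.empty[String, Seq[Any]]
  @volatile private var running = false

  def start(): Unit = { running = true }
  def finish(): Unit = { running = false }

  // Runs all pending writes concurrently; completes when every write is stored.
  def execute(writes: List[ToWrite[_]])(implicit ec: ExecutionContext): Future[Unit] = {
    require(running, "execute called outside start/finish")
    val futures = writes.map { w =>
      Future(w.compute()).map { rows =>
        sinks.synchronized { sinks(w.sinkName) = rows }
      }
    }
    Future.sequence(futures).map(_ => ())
  }

  // Reads back what was written to a sink, if anything.
  def read(sinkName: String): Option[Seq[Any]] =
    sinks.synchronized(sinks.get(sinkName))
}
```

Usage: call `start()`, pass the writes produced by evaluating an Execution to `execute`, wait on the returned Future, then call `finish()`.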

I think we can finish the rest in 2-3 follow-up PRs:

- implement the writer
- finish the planner

Note that the writer and planner can be implemented in parallel, so I can just fork myself and finish faster.

johnynek commented 6 years ago

@fwbrasil @ianoc can you take a look?

ianoc commented 6 years ago

Not sure what start and finish are for in the writer trait, but LGTM.

twitter / scalding #1844