IScala Spark Backend

Introduction

The Spark large-scale data processing framework provides a REPL that allows interactive exploration, analysis and manipulation of large amounts of data. This PR adds a Spark backend to IScala. The combination of IScala's/IPython's ease of use and visualization capabilities with Spark's processing power should be incredibly useful. This work is partially inspired by and based upon ISpark.
Design & Implementation
One of the major hurdles in creating the Spark backend is that the Scala and Spark IMain classes do not share a common interface. To make this work, a thin common interface (IMainBackend) was introduced for both IMain classes, and the Interpreter now uses this common interface.
The Interpreter class has been modified to act as the common gateway for IScala interpreter operations, instead of accessing an IMain instance directly.
An interpreter factory option (--interp) has been added to the options. This interpreter factory is used by the IScala class to construct the Interpreter.
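A rough sketch of how these pieces might fit together. Only the names IMainBackend, Interpreter, and --interp come from this PR; the method set, the backend class names, and the factory shape below are assumptions for illustration:

```scala
import scala.tools.nsc.interpreter.{IMain, Results}

// Thin common interface over the two IMain classes.
// Only the name IMainBackend comes from the PR; the method set is a guess.
trait IMainBackend {
  def interpret(code: String): Results.Result
  def reset(): Unit
}

// The plain Scala backend simply delegates to scala.tools.nsc's IMain.
class ScalaBackend(imain: IMain) extends IMainBackend {
  def interpret(code: String): Results.Result = imain.interpret(code)
  def reset(): Unit = imain.reset()
}

// Hypothetical Spark backend wrapping Spark's own IMain
// (org.apache.spark.repl.SparkIMain in Spark 1.x, Scala 2.10 only);
// constructor arguments are omitted here for brevity.
class SparkBackend(imain: org.apache.spark.repl.SparkIMain) extends IMainBackend {
  def interpret(code: String): Results.Result = imain.interpret(code)
  def reset(): Unit = imain.reset()  // see the TODOs: reset is currently broken
}

// Hypothetical factory keyed on the --interp option value.
object InterpreterFactory {
  def apply(kind: String): IMainBackend = kind match {
    case "scala" => new ScalaBackend(new IMain())
    case "spark" => new SparkBackend(new org.apache.spark.repl.SparkIMain())
    case other   => sys.error(s"Unknown --interp value: $other")
  }
}
```

The Interpreter then only talks to IMainBackend, so it stays agnostic of which REPL implementation sits underneath.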
Running
To start IPython with the IScala Spark backend manually, issue:
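The exact command is not reproduced here; one way the invocation might look, with hypothetical paths and kernel flags that vary by IPython version (the command is echoed as a dry run rather than launched):

```shell
# Hypothetical values -- substitute your own (see the note below).
SPARK_HOME=/opt/spark            # location of the Spark distribution
MASTER=spark://master-host:7077  # URL of the Spark master node
ISCALA=/opt/iscala/IScala.jar    # location of the IScala jar

# Sketch of a launch command; the kernel class and flags are assumptions.
CMD="ipython console --KernelManager.kernel_cmd='[\"java\", \
  \"-cp\", \"$ISCALA:$SPARK_HOME/lib/*\", \
  \"org.refptr.iscala.IScala\", \"--interp\", \"spark\", \
  \"--master\", \"$MASTER\", \"--connection-file\", \"{connection_file}\"]'"
echo "$CMD"
```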
Replace $SPARK_HOME with the location of the Spark distribution, $MASTER with the URL of the Spark master node, and $ISCALA with the location of the IScala jar.
NB: Spark is currently Scala 2.10 only!
TODOs
A (temporary) SBT logger that doesn't use JLine was created, because the JLine version used by Spark collides with the JLine version used by SBT.
Reset does not work with the Spark backend, due to a bind problem in Spark.