mattpap / IScala

Scala backend for IPython
MIT License

Spark backend for IScala #21

Closed: hvanhovell closed this pull request 9 years ago

hvanhovell commented 9 years ago

IScala Spark Backend

Introduction

The Spark large-scale data processing framework provides a REPL for interactive exploration, analysis and manipulation of large amounts of data. This PR adds a Spark backend to IScala. Combining IScala's and IPython's ease of use and visualization capabilities with Spark's processing power should be very useful. This work is partially inspired by, and based on, ISpark.

Design & Implementation

To start IPython with the IScala Spark backend manually, run:

ipython notebook --KernelManager.kernel_cmd='["$SPARK_HOME/dist/bin/spark-submit", "--master", "$MASTER", "--driver-memory", "2G", "--class", "org.refptr.iscala.IScala", "$ISCALA/IScala.jar", "--connection-file", "{connection_file}", "--parent", "--interp", "org.refptr.iscala.SparkInterpreterFactory"]'

Replace $SPARK_HOME with the location of the Spark distribution, $MASTER with the URL of the Spark master node, and $ISCALA with the directory containing IScala.jar.
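Because the whole kernel_cmd value is wrapped in single quotes, the shell does not expand $SPARK_HOME, $MASTER or $ISCALA inside it, which is why they have to be replaced by hand. A minimal bash sketch that does the substitution from environment variables is shown below. The function name `iscala_kernel_cmd` is hypothetical (not part of IScala), and the sketch assumes the three variables are set and that the paths contain no double quotes or backslashes, since the output is built as a plain JSON string. `{connection_file}` is deliberately left literal so that IPython can fill it in when it launches the kernel.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of IScala): prints the kernel_cmd JSON list
# from the command above, with SPARK_HOME, MASTER and ISCALA substituted
# from the current shell's variables.
iscala_kernel_cmd() {
  # Abort with a message if any required variable is unset or empty.
  : "${SPARK_HOME:?set SPARK_HOME to the Spark distribution directory}"
  : "${MASTER:?set MASTER to the Spark master URL}"
  : "${ISCALA:?set ISCALA to the directory containing IScala.jar}"
  # {connection_file} stays literal; IPython replaces it when starting the kernel.
  # Assumes the values contain no characters that would need JSON escaping.
  printf '["%s", "--master", "%s", "--driver-memory", "2G", "--class", "org.refptr.iscala.IScala", "%s", "--connection-file", "{connection_file}", "--parent", "--interp", "org.refptr.iscala.SparkInterpreterFactory"]' \
    "$SPARK_HOME/dist/bin/spark-submit" "$MASTER" "$ISCALA/IScala.jar"
}
```

After exporting the three variables (for example `export SPARK_HOME=/opt/spark MASTER=spark://master-host:7077 ISCALA=/opt/iscala`, all placeholder values), the notebook could then be started with `ipython notebook --KernelManager.kernel_cmd="$(iscala_kernel_cmd)"`.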

NB: Spark currently supports Scala 2.10 only!

TODOs