mozilla / overscripted

Repository for the Mozilla Overscripted Data Mining Challenge
Mozilla Public License 2.0

Changes in hello_world.ipynb (Spark Context) #44

Open asquare14 opened 5 years ago

asquare14 commented 5 years ago

While going through hello_world.ipynb, I hit this error: ValueError: Cannot run multiple SparkContexts at once. It is a pretty common error that occurs because the system has already initialized a SparkContext automatically.

I had to call sc.stop() to stop the earlier context and then create a new one. @birdsarah, should I add the sc.stop() cell shown below just after this code snippet?

import findspark
findspark.init('/opt/spark')  # Adjust for the location where you installed spark
from pyspark import SparkContext
from pyspark.sql import SparkSession

sc = SparkContext(appName="Overscripted")
spark = SparkSession(sc)

# If a context is already running, run this cell and then rerun the cell above
sc.stop()
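An alternative that might avoid the error without stopping anything is to let Spark reuse whatever context is already running. This is only a minimal sketch: it assumes findspark.init() has already been run as in the snippet above, and get_spark is a hypothetical helper name, not something that exists in the notebook.

```python
def get_spark(app_name="Overscripted"):
    """Return (sc, spark), reusing an already-running SparkContext if there is one.

    get_spark is a hypothetical helper for illustration; it is not part of
    hello_world.ipynb.
    """
    # Imported inside the function so that defining the helper does not
    # itself require Spark to be importable yet.
    from pyspark.sql import SparkSession

    # builder.getOrCreate() returns the existing session/context when one is
    # active, instead of raising
    # "ValueError: Cannot run multiple SparkContexts at once".
    spark = SparkSession.builder.appName(app_name).getOrCreate()
    sc = spark.sparkContext
    return sc, spark
```

Calling sc, spark = get_spark() in place of the SparkContext(...) and SparkSession(sc) lines should make the cell safe to rerun. Note that if a context already exists, it keeps its original application name, so the appName passed here only takes effect when a new context is created.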

birdsarah commented 5 years ago

Feel free to open a PR and I'll get someone who uses spark more than me to take a look.

I do almost all my data analysis on this dataset using dask - dask.pydata.org

I'll share a notebook example tomorrow.