rjurney / Agile_Data_Code_2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
http://bit.ly/agile_data_science
MIT License

Fix pyspark streaming py #89

Closed pjhinton closed 5 years ago

pjhinton commented 5 years ago

Fix the script by adding a call to stop the default SparkContext.

This commit fixes an error, observed in the book version of the code, that occurs when a new SparkContext is created in the pyspark shell:

>>> sc = SparkContext(
... appName = "Agile Data Science: PySpark Streaming 'Hello, world!'", conf=conf
... )
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/vagrant/spark/python/pyspark/context.py", line 115, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/home/vagrant/spark/python/pyspark/context.py", line 299, in _ensure_initialized
callsite.function, callsite.file, callsite.linenum))
ValueError: Cannot run multiple SparkContexts at once; existing SparkContext(app=PySparkShell, master=local[*]) created by <module> at /home/vagrant/spark/python/pyspark/shell.py:45

This can be fixed by stopping the default SparkContext that the pyspark shell creates at startup. This is probably a better fix than just commenting out the instantiation of the SparkContext, because it ensures the script's own SparkConf is actually used.
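For reference, a minimal sketch of the fix as it would look in the script (the SparkConf contents here are placeholders, not the book's actual configuration):

from pyspark import SparkConf, SparkContext

conf = SparkConf()  # placeholder; the script sets up its own configuration here

# The pyspark shell creates a default SparkContext named `sc` at startup.
# Stop it first, otherwise creating a second context raises
# "ValueError: Cannot run multiple SparkContexts at once".
if 'sc' in globals():
    sc.stop()

sc = SparkContext(
    appName="Agile Data Science: PySpark Streaming 'Hello, world!'",
    conf=conf
)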

rjurney commented 5 years ago

Thanks!