rjurney / Agile_Data_Code_2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
http://bit.ly/agile_data_science
MIT License

Can't run pyspark_mongodb.py #116

Closed: hszkf closed this issue 4 years ago

hszkf commented 4 years ago

I ran spark-submit pyspark_mongodb.py and got this error: NameError: name 'sc' is not defined

But when I opened a terminal and started the PySpark shell with pyspark, the code worked. The remaining problem came when I tried to save the data to MongoDB with schema_data.saveToMongoDB(...): that call raised an error.

I don't think the solution is pip install pyspark, because hadoop-spark is already installed. I'm afraid something else is going wrong that I can't pin down.

rjurney commented 4 years ago

You’re going to need to buy the book, I suspect. Use the command pyspark and paste in the code from the file.
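Some background on the NameError: the pyspark shell creates a SparkContext at startup and binds it to the name sc, while a script launched with spark-submit has to create its own. Below is a minimal sketch of a script preamble that works in both cases. The helper name get_or_create_sc and the app name are hypothetical and not taken from the repository; only SparkConf, setAppName, and SparkContext.getOrCreate are real PySpark calls.

```python
def get_or_create_sc(namespace, app_name="pyspark_mongodb"):
    """Return the shell's `sc` if it exists, otherwise build a SparkContext.

    `namespace` is a dict of names to look in, typically globals().
    Hypothetical helper for illustration; not part of the book's code.
    """
    if "sc" in namespace:
        # Running inside the pyspark shell, which already defined `sc`
        return namespace["sc"]

    # Running under spark-submit (or plain python): no `sc` yet.
    # Imported lazily so the function can be defined without Spark installed.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName(app_name)  # app name is an assumption
    return SparkContext.getOrCreate(conf)
```

At the top of a script this would be used as sc = get_or_create_sc(globals()), after which the rest of the file can refer to sc the same way it does in the shell.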

hszkf commented 4 years ago

Yes, the book is with me now. I just realized I need to set up all the mongo-hadoop paths before running the saveToMongoDB script. Thanks!
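One way to catch a missing path before the save call fails is to check that the required Python modules can be imported at all. The sketch below assumes the script depends on the pymongo_spark module from the mongo-hadoop project, which the thread does not state explicitly; the helper name find_missing_modules is hypothetical, and the check says nothing about the Java jars Spark also needs.

```python
import importlib.util


def find_missing_modules(names):
    """Return the top-level module names from `names` that cannot be found.

    Uses importlib.util.find_spec, which looks a module up on sys.path
    without importing it. Hypothetical helper for illustration only.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed dependencies; adjust to whatever the script actually imports.
missing = find_missing_modules(["pymongo", "pymongo_spark"])
if missing:
    print("Not on PYTHONPATH:", ", ".join(missing))
```

If a module shows up as missing, the fix is on the environment side (PYTHONPATH and the setup described in the book), not in the script itself.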

rjurney commented 4 years ago

Oh hey - one thing! Check out the Jupyter notebooks. You should be able to visit http://localhost:8888 and see a notebook per chapter. Use those, they were created after the book.

hszkf commented 4 years ago

Perfect. It was hard to get started, but I'll adapt soon; the Hadoop ecosystem is totally blowing my mind right now. Thanks for the great course/book and for being so helpful in replying to issues!