rjurney / Agile_Data_Code_2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
http://bit.ly/agile_data_science
MIT License

Error "No module named 'pyspark'" #53

Closed: tsyork closed this issue 7 years ago

tsyork commented 7 years ago

I am running through the examples on the EC2 instance and have successfully run all the example code up to the section entitled Pushing data to MongoDB from PySpark. This includes successfully running pyspark from the command line.

When I run the ch02/pyspark_mongodb.py script, I get the error from the issue title: No module named 'pyspark'.

I have also run the command "import pyspark" in python and python3 and get the same error. Trying to install pyspark using "pip install pyspark" results in a permissions error.

Anyone have an idea on how to get past this?

nmvega commented 7 years ago

I don't use EC2 (I have my own servers), so I can't speak to EC2-specific errors. That said, what was the error? Can you paste it? Was it a network access issue (for example, the EC2 instance can't reach the outside world), a filesystem access issue, or something else?

Just as a hunch, try: user$ pip install --user pyspark
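
Before installing anything, it can help to confirm which interpreter is running and whether it can see pyspark at all. The following is a minimal sketch using only the standard library; the helper name `module_status` is hypothetical and not part of the book's code.

```python
import importlib.util
import sys


def module_status(name):
    """Describe whether the current interpreter can locate module `name`."""
    # find_spec returns None when a top-level module is not on sys.path.
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not found by {sys.executable}"
    return f"{name}: found at {spec.origin}"


if __name__ == "__main__":
    # Run this with both `python` and `python3` to compare the results.
    print(module_status("pyspark"))
```

If the module is reported as not found, the interpreter printed in the message is the one that needs pyspark installed or added to its search path.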

tsyork commented 7 years ago

Thanks for your suggestion. I figured out the problem: I was running the commands in the plain python interpreter rather than in the pyspark shell. Unfortunately, the book is not very clear about how certain commands are meant to be run.
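
For anyone who still wants to import pyspark from a plain python session, one common workaround is to put Spark's bundled Python directories on `sys.path` first. The sketch below assumes a standard Spark distribution layout (`$SPARK_HOME/python` plus a `py4j-*-src.zip` under `python/lib`) and that the `SPARK_HOME` environment variable is set; the function name `spark_python_paths` is hypothetical and not from the book.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """List the paths a plain Python session needs in order to import pyspark.

    Assumes the usual Spark layout: <spark_home>/python holds the pyspark
    package and <spark_home>/python/lib holds a versioned py4j source zip.
    """
    python_dir = os.path.join(spark_home, "python")
    # The py4j version differs between Spark releases, so match it with a glob.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home is None:
        print("SPARK_HOME is not set; cannot locate Spark's Python bindings.")
    else:
        for path in spark_python_paths(spark_home):
            if path not in sys.path:
                sys.path.insert(0, path)
        try:
            import pyspark
            print("pyspark imported from", pyspark.__file__)
        except ImportError as exc:
            print("pyspark still not importable:", exc)
```

Running the examples inside the pyspark shell, as described above, avoids this step entirely, since the shell sets up these paths itself.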