rjurney / Agile_Data_Code_2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
http://bit.ly/agile_data_science
MIT License
456 stars 307 forks

pyspark example bug #23

Closed: seajosh closed this issue 7 years ago

seajosh commented 7 years ago

Ch02 "Collecting Data", p37

problem: Assuming the user is in the ch02 directory, as the text instructs, the file path in this line is incorrect:

csv_lines = sc.textFile("data/example.csv")

solution: csv_lines = sc.textFile("../data/example.csv")

rjurney commented 7 years ago

Where does it say to be in the ch02 directory? Or is this just assumed? Actually, all code has to be executed from $PROJECT_HOME. Any tips for how I could better express this?

seajosh commented 7 years ago

Below is what made me go to the ch02 dir

"To get an iPython shell with PySpark, make sure you’re in the ch02 directory and run PYSPARK_DRIVER_PYTHON=ipython pyspark."

rjurney commented 7 years ago

Shit. Thanks!

rjurney commented 7 years ago

Updated the book, thanks.
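
For anyone who hits the same path error, one way to make the example independent of the current directory is to build the path from the project root. The following is only a minimal sketch, not code from the book: the helper name project_path is hypothetical, it assumes the PROJECT_HOME environment variable mentioned above points at the repository root, and it falls back to the current directory when that variable is unset.

```python
import os
from pathlib import Path


def project_path(relative, project_home=None):
    """Return an absolute path for a file given relative to the project root.

    Hypothetical helper for illustration; not part of the book's code.
    """
    # PROJECT_HOME is the variable named in this thread. Falling back to the
    # current directory when it is unset is an assumption of this sketch.
    root = Path(project_home or os.environ.get("PROJECT_HOME", ".")).resolve()
    return str(root / relative)


# With a SparkContext `sc` available (as in the pyspark shell), the line from
# the book could then be written as:
#   csv_lines = sc.textFile(project_path("data/example.csv"))
```

With this, the same call should work whether the shell was started from the project root or from ch02, provided PROJECT_HOME is set and Spark is reading from the local filesystem.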