rjurney / Agile_Data_Code_2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
http://bit.ly/agile_data_science
MIT License

Vagrant : bootstrap.sh too old. #70

Closed apacifico closed 5 years ago

apacifico commented 6 years ago

I am using Vagrant on Windows to set up the environment described in the book (very good). I had some issues with the bootstrap.sh script used to set up the machine.

1) Miniconda installs the latest version of Python, and the only way to obtain a version of Python compatible with Spark was to force the version:

#
# Install Miniconda
#
echo "curl -sLko /tmp/Miniconda3-latest-Linux-x86_64.sh https://repo.continuum.io/miniconda/Miniconda-3.5.2-Linux-x86_64.sh"
curl -sLko /tmp/Miniconda3-latest-Linux-x86_64.sh https://repo.continuum.io/miniconda/Miniconda-3.5.2-Linux-x86_64.sh
chmod +x /tmp/Miniconda3-latest-Linux-x86_64.sh
/tmp/Miniconda3-latest-Linux-x86_64.sh -b -p /home/vagrant/anaconda

I also specified the Python version on this line, but it seems to have no effect: conda install python=3.5.2

2) I used this version of Hadoop (2.8.3), because 2.8.2 is not available:

#
# Install Hadoop
#
echo "curl -sLko /tmp/hadoop-2.8.3.tar.gz http://apache.osuosl.org/hadoop/common/hadoop-2.8.3/hadoop-2.8.3.tar.gz"
curl -sLko /tmp/hadoop-2.8.3.tar.gz http://apache.osuosl.org/hadoop/common/hadoop-2.8.3/hadoop-2.8.3.tar.gz
mkdir -p /home/vagrant/hadoop
cd /home/vagrant/
tar -xvf /tmp/hadoop-2.8.3.tar.gz -C hadoop --strip-components=1

3) I used this version of Zeppelin (the version referenced in the script is not available):

#
# Install Apache Zeppelin
#
echo "curl -sLko /tmp/zeppelin-0.7.3-bin-all.tgz http://www-us.apache.org/dist/zeppelin/zeppelin-0.7.3/zeppelin-0.7.3-bin-all.tgz"
curl -sLko /tmp/zeppelin-0.7.3-bin-all.tgz http://www-us.apache.org/dist/zeppelin/zeppelin-0.7.3/zeppelin-0.7.3-bin-all.tgz
mkdir zeppelin
tar -xvzf /tmp/zeppelin-0.7.3-bin-all.tgz -C zeppelin --strip-components=1

With these changes I was able to start pyspark.
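The version changes listed above could be sketched along these lines. This is only an illustration, not the repository's actual bootstrap.sh: the names CONDA_HOME, PYTHON_VERSION, HADOOP_VERSION, hadoop_url, fetch, pin_python and version_matches are all hypothetical. One possible (unconfirmed) reason the conda pin appeared to do nothing is that conda asks for confirmation, which a provisioning script cannot answer, so the sketch passes -y and calls conda by its absolute path. It also adds curl's -f flag so a missing release fails loudly instead of saving an error page as the archive.

```shell
#!/usr/bin/env bash
# Sketch only: every name below is illustrative and does not appear in
# the repository's bootstrap.sh.

CONDA_HOME="/home/vagrant/anaconda"   # install prefix used in the issue
PYTHON_VERSION="3.5.2"
HADOOP_VERSION="2.8.3"

# Build the mirror URL from a single version string so the directory and
# file name cannot drift apart.
hadoop_url() {
  echo "http://apache.osuosl.org/hadoop/common/hadoop-$1/hadoop-$1.tar.gz"
}

# Download URL $1 to path $2. -f makes curl exit non-zero on HTTP errors
# (for example 404) instead of writing the error page to disk.
fetch() {
  curl -fsSL -o "$2" "$1" || { echo "download failed: $1" >&2; return 1; }
}

# Pin Python with the freshly installed conda; -y answers the
# confirmation prompt that would otherwise block a non-interactive run.
pin_python() {
  "${CONDA_HOME}/bin/conda" install -y "python=${PYTHON_VERSION}"
}

# Return 0 if the output of `python --version` ($1) reports version $2.
version_matches() {
  case "$1" in
    "Python $2"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

In a provisioning script, these helpers might be called as fetch "$(hadoop_url "$HADOOP_VERSION")" "/tmp/hadoop-${HADOOP_VERSION}.tar.gz", then pin_python, followed by a check such as version_matches "$("${CONDA_HOME}/bin/python" --version 2>&1)" "$PYTHON_VERSION" to confirm the pin actually took effect.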

I also have an issue with port 5000: I cannot access it from the browser on my Windows machine. With pyspark, I cannot find a way to execute a Python script like the one you show on page 52 of the book for Elasticsearch (sc.textFile is not found). The script says to use spark-submit, but that does not work. So at this stage I cannot validate the whole stack you describe in chapter 2 of the book.

Thanks in advance for your help.

Angelo

rjurney commented 6 years ago

Vagrant is no longer supported, sorry. Miniconda 3.5 and Hadoop 3.0 are installed on the EC2 image. The book does not actually use Zeppelin, and I could never get it working, so I removed it.

apacifico commented 6 years ago

OK, I bought the book this week. For new people discovering the book, Vagrant is presented as an option. My question concerns Vagrant: why is it no longer viable for running the stack you describe? Could you explain openly the problems you ran into with Vagrant?

rjurney commented 6 years ago

@apacifico I'm making an effort to resurrect Vagrant support.

rjurney commented 6 years ago

I think I resolved all the issues in https://github.com/rjurney/Agile_Data_Code_2/commit/28f215433058434b1bb330658254be41c6da217d

@apacifico Can you test things now?

apacifico commented 6 years ago

Thanks for your feedback, Russell. I will compare it with the updates I made on my side.

Regards

Angelo Pacifico


peopzen commented 6 years ago

@rjurney, regarding "Once you've provided it your AWS credentials, run the following command to bring up a machine pre-configured with the book's complete environment and source code: ec2.sh": does this mean we need an activated AWS account to create the EC2 image on the local machine? Or can I use an AWS trial account (30 days)? Thanks. Alright, it seems I can get a 12-month AWS trial account. That should be enough.

rjurney commented 6 years ago

@peopzen You can use a trial account so long as you have the credentials.

rjurney commented 5 years ago

Trying the Vagrant image for the first time in a long time...

rjurney commented 5 years ago

vagrant up

After I run this and it loads, everything works for me at http://localhost:8888/notebooks/ch02/Agile_Tools.ipynb except Kafka. I will try and fix Kafka later today.