hershaw / data-science-101

Do some data analysis and build a predictive model
MIT License

How to use this repo

Each directory contains a README.md that should be read before exploring any sub-directories. You can just browse the directories on github.com in your favorite browser for instructions on how to use the directory's contents.

Since this README is at the top-level, it contains installation and usage instructions.

Prerequisites

Setup

Optional but strongly suggested

Sign up for a GitHub account if you don't already have one. Really, it's quite useful.

Note: if you are looking for a Docker-based setup with a smaller footprint, jump to the end.

Step 1 - Install dependencies

Follow each of the links to download and install.

Step 2 - Clone this repo

If you installed git using GitHub Desktop, follow these instructions

If you are cloning from the command line:

    $ git clone https://github.com/hershaw/data-science-101.git

Make a note of the file path where you cloned the repo!

Step 3 - Start your environment

Navigate on the command line to the root of the repo

Assuming you've cloned the repo onto your desktop on OS X, the command would look something like

    $ cd ~/Desktop/data-science-101

Start the virtual machine

Note that this will take a LONG time and you should have a good internet connection in order to expedite the process.
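The exact start command is not shown in this README. The `/home/vagrant` paths used in the Docker section further down suggest the virtual machine is managed by Vagrant; assuming that is the case, starting it could look roughly like the sketch below. `start_vm` is a hypothetical helper name, not something shipped with the repo.

```shell
# ASSUMPTION: the VM is managed by Vagrant (inferred from the /home/vagrant
# paths in this README, not confirmed by it). start_vm is a hypothetical helper.
start_vm() {
  # Fail early with a readable message if the tool is not installed
  if ! command -v vagrant >/dev/null 2>&1; then
    echo "vagrant not found on PATH; install the dependencies from Step 1 first" >&2
    return 127
  fi
  # Must be run from the root of the repo; the first run is typically the slow one
  vagrant up
}

# Usage (from the repo root):
#   start_vm
```

Checking for the tool first means a missing install produces a clear message instead of a bare "command not found".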

If you're uncomfortable on the command line, please do your best to power through this, share what you learn, and ask for help on the class's Gitter channel.

Step 4

Make sure that jupyter notebook is running

Open http://localhost:8888 in your browser and you should see the course directory.

Using jupyter notebook, enter the course directory and run test.ipynb to make sure that everything was installed okay. If you can run this without errors, you are good to go!

If for some reason this doesn't work, head over to the Gitter channel.
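If the page does not load, it can help to check from the command line whether anything is answering on port 8888 at all. The sketch below assumes `curl` is installed; `notebook_is_up` is a hypothetical helper name. It only confirms that some HTTP server responds at that address, not that it is specifically Jupyter.

```shell
# Hypothetical helper (not part of this repo): succeeds if an HTTP server
# answers at the given URL. Assumes curl is installed.
notebook_is_up() {
  url="${1:-http://localhost:8888}"
  # --fail: treat HTTP errors (status >= 400) as failure
  # --max-time 5: give up after five seconds
  curl --silent --fail --max-time 5 --output /dev/null "$url"
}

if notebook_is_up; then
  echo "Something is answering at http://localhost:8888"
else
  echo "Nothing answered at http://localhost:8888 yet"
fi
```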

Final notes

More useful commands to execute from the data-science-101 root directory

Alternative setup using docker

Step 1

Install:

Step 2

If you installed git using GitHub Desktop, follow these instructions

If you are cloning from the command line:

1. $ git clone https://github.com/hershaw/data-science-101.git

Step 3

  1. Build a docker image (first time takes longer):
    $ docker build -t data-science -f Dockerfile .
  2. Run a container (this assumes your code is located at "~/Desktop/data-science-101"; replace it with your actual path if needed):
    $ docker run -it --rm -p 127.0.0.1:8888:8888 --volume ~/Desktop/data-science-101:/home/vagrant/ \
    --workdir /home/vagrant/ -e PYTHONPATH=.:/home/vagrant/course data-science
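To avoid retyping the path, the run command can be wrapped in a small shell function. This is only a sketch: `run_course_container` and `REPO_DIR` are hypothetical names, and the `docker run` options are copied unchanged from the command above.

```shell
# Hypothetical wrapper around the docker run command from this README.
# REPO_DIR and run_course_container are illustrative names, not part of the repo.
REPO_DIR="${REPO_DIR:-$HOME/Desktop/data-science-101}"

run_course_container() {
  repo_dir="${1:-$REPO_DIR}"
  # Refuse to start if the directory to mount does not exist
  if [ ! -d "$repo_dir" ]; then
    echo "No such directory: $repo_dir" >&2
    return 1
  fi
  # -it          interactive terminal, so Ctrl+C reaches the notebook server
  # --rm         remove the container when it exits
  # -p           publish container port 8888 on the host's loopback address only
  # --volume     mount the repo into the container at /home/vagrant/
  # --workdir    start in the mounted directory
  # -e           set PYTHONPATH inside the container
  docker run -it --rm \
    -p 127.0.0.1:8888:8888 \
    --volume "$repo_dir":/home/vagrant/ \
    --workdir /home/vagrant/ \
    -e PYTHONPATH=.:/home/vagrant/course \
    data-science
}

# Usage:
#   run_course_container                 # uses $REPO_DIR
#   run_course_container /path/to/repo   # explicit location
```

Checking the directory first avoids starting a container with an empty mount when the path is mistyped.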

Step 4

Open http://localhost:8888 to access the notebook. To stop everything, press Ctrl+C and then answer "y" in the interactive shell.

Final notes

This setup does not follow recommended security practice; please use tokens or authentication. Expect much lower hardware requirements than with the virtual machine.