mids-w207-final-project
Primary Files:
- exploratory_data_analysis.ipynb: Jupyter notebook with a detailed analysis of the training data
- feature_engineering.py: Python library containing all transformations
- models.py: Python library containing all models and configurations
- clear_cut_solution.ipynb: Jupyter notebook with descriptions, solutions and test results
Repo Map
README.md
- Project introduction, file structure, environment instructions
exploratory_data_analysis.ipynb
- Distributions, visualizations, sanity checks, correlations, etc.
clear_cut_solution.ipynb
- Formal project implementation with feature engineering, training, evaluation and testing
feature_engineering.py and models.py
- Libraries of functions for feature engineering and models. They are consumed in the clear_cut_solution notebook and also contain experimental code not included in the final project.
./data
- Notebook diagrams, training data, testing data
./submissions
- Test output files (CSV) to be uploaded to Kaggle
./backups
- HTML, Markdown, and Python versions of the clear_cut_solution notebook
./comp_setup
- Details of custom container creation
Computing Environment
Work was conducted in the kmartcontainers/207final container (Docker Hub link). It is a custom container that adds the xgboost library to the jupyter/tensorflow-notebook Docker container maintained by the Jupyter development team. Instructions for running the container on your machine or on GCP, along with details of how the container was created, are in the comp_setup/ComputeSetup.md file.
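
Once the container is running, a quick sanity check can confirm that the libraries named above are available. The snippet below is a minimal sketch, not part of the repository: the helper name `check_packages` and the `REQUIRED_PACKAGES` tuple are hypothetical, and the package list is assumed from the description of the container (xgboost added on top of a TensorFlow notebook image).

```python
from importlib.util import find_spec

# Assumption: top-level package names taken from the container description.
REQUIRED_PACKAGES = ("xgboost", "tensorflow")


def check_packages(names=REQUIRED_PACKAGES):
    """Return a dict mapping each package name to True if it can be located."""
    # find_spec locates a top-level package without importing it,
    # and returns None when the package is not installed.
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, available in check_packages().items():
        print(f"{name}: {'found' if available else 'MISSING'}")
```

Running this in a notebook cell or terminal inside the container should report each package as found; a MISSING entry suggests the notebook is running in a different environment than the one described in comp_setup/ComputeSetup.md.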