New method for loading historical data + comments on latest submitted code

A new way to load data into the database has been implemented. It allows finer-grained control over loading and deleting data in the database.

The old way (using /a/load) is still there, but it will hopefully be removed in the future.

Loading the database

The new method uses a python script that makes HTTP API calls to load the database. This requires the following python module:

https://pypi.python.org/pypi/xlrd
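
As a rough illustration of what xlrd provides, the sketch below reads every row of the first sheet in a workbook. The function names are hypothetical and are not taken from the repository; only open_workbook, sheet_by_index, nrows, and row_values are xlrd calls. Note that xlrd 2.0 and later reads only .xls files.

```python
def rows_from_sheet(sheet):
    """Return every row of an xlrd-style sheet as a list of cell values."""
    return [sheet.row_values(r) for r in range(sheet.nrows)]


def read_first_sheet(path):
    """Open a workbook and return the rows of its first sheet."""
    # Third-party module; imported here so rows_from_sheet works without it.
    import xlrd

    book = xlrd.open_workbook(path)
    return rows_from_sheet(book.sheet_by_index(0))
```

Keeping the row extraction separate from the file handling means the extraction can be exercised with any object that exposes nrows and row_values.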

To load the database so that the tests pass, go to directory /scripts/fbpool and run:

python fbpool.py --port 10090 --load year --year 2013
python fbpool.py --port 10090 --load year --year 2012

More details can be found here:
https://github.com/jbholden/cdcpool_google/blob/master/scripts/fbpool/README.md
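
For readers who want to see how flags like the ones in the commands above are usually handled, here is a minimal argparse sketch. It is not the actual fbpool.py code. The --port, --load, and --year flags come from the commands above; the --week flag, the "week" choice, and the default port are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the flags shown in the load commands."""
    parser = argparse.ArgumentParser(description="Load pool data over the HTTP API")
    # Default port is an assumption; the commands above always pass it explicitly.
    parser.add_argument("--port", type=int, default=10090, help="port of the running server")
    # "year" appears in the commands above; "week" is an assumed second choice.
    parser.add_argument("--load", choices=["week", "year"], help="how much data to load")
    parser.add_argument("--year", type=int, help="season to load, for example 2013")
    # Assumed flag, only meaningful together with --load week.
    parser.add_argument("--week", type=int, help="week number to load")
    return parser


args = build_parser().parse_args(["--port", "10090", "--load", "year", "--year", "2013"])
print(args.port, args.load, args.year)
```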

Notes About the Implementation

directory /scripts/api

A new HTTP API was created to allow reading, modifying, and deleting data.

directory /scripts/data

This directory contains the historical excel files.

directory /scripts/excel

A new python script was created for extracting the data from the excel files.

directory /scripts/fbpool

This directory contains code to load, delete, and list data.

How load works

The fbpool.py script in /scripts/fbpool is executed on the command line.
The script reads data from the excel file using /scripts/excel.
The script then uses the API in /scripts/api to create the data in the database.
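
The last step above, creating records through the HTTP API, might look roughly like the sketch below. It assumes the records have already been extracted from the excel file. The URL layout, the JSON body, and all function names are assumptions made for illustration and are not the repository's actual API.

```python
import json
import urllib.request


def build_create_request(port, resource, record):
    """Build (but do not send) a POST request that creates one record."""
    # Hypothetical URL layout; the real API paths may differ.
    url = "http://localhost:%d/api/%s" % (port, resource)
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send_request(request):
    """Send one request and discard the response body."""
    with urllib.request.urlopen(request) as response:
        response.read()


def load_records(port, resource, records, send=send_request):
    """Create each record through the API and return how many were sent."""
    count = 0
    for record in records:
        send(build_create_request(port, resource, record))
        count += 1
    return count
```

Passing the send function as a parameter makes it possible to try the loader without a running server, for example by collecting the requests in a list.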

Advantages of new code

The new code offers the following advantages:

It replaces the old way, which was a hack that created python files containing the data to load.
Instead of having to load all the data, you can load just one week or one year.
If you make a mistake, you can delete one week and reload it instead of having to delete the entire database and start over.
New data can be added easily by adding the excel file to the directory and then using the fbpool.py script to load that data.
Future code could make use of the API if required.
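
To make the delete-and-reload advantage concrete, here is a small sketch that replaces one week while leaving the rest of the data alone. The InMemoryApi class is a hypothetical stand-in for the HTTP API, and its method names are invented for this example.

```python
class InMemoryApi:
    """Stand-in for the HTTP API: stores games keyed by (year, week)."""

    def __init__(self):
        self.games = {}

    def create_week(self, year, week, games):
        self.games[(year, week)] = list(games)

    def delete_week(self, year, week):
        # Deleting a week that does not exist is treated as a no-op.
        self.games.pop((year, week), None)


def reload_week(api, year, week, games):
    """Replace one week's data without touching any other week."""
    api.delete_week(year, week)
    api.create_week(year, week, games)


api = InMemoryApi()
api.create_week(2013, 1, ["game A"])
api.create_week(2013, 2, ["game B with a typo"])
reload_week(api, 2013, 2, ["game B"])
```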

Some Additional Notes on Recently Added Code

directory tests/api

This directory contains code to test the new API. It is run from the command line using python. This code can be referenced for how to use the API.

directory pages/api

This directory contains the code to handle the HTTP API calls.

code/api.py

This file contains the code that implements the API. The code in pages/api calls into this file to actually perform each API call.

main.py

The code that maps URLs to the handlers has been modified to use a regular expression instead of the old way. This was done in order to capture string arguments for some of the API calls.
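
The idea of capturing string arguments with a regular expression can be sketched with the standard re module alone. This is not the mapping used in main.py; the URL patterns and handler functions below are hypothetical and only show how captured groups become handler arguments.

```python
import re


def get_week(year, week):
    """Hypothetical handler taking two captured arguments."""
    return "week %s of %s" % (week, year)


def get_player(name):
    """Hypothetical handler taking one captured string argument."""
    return "player %s" % name


# Each route pairs a compiled pattern with the handler it maps to.
ROUTES = [
    (re.compile(r"^/api/week/(\d+)/(\d+)$"), get_week),
    (re.compile(r"^/api/player/([^/]+)$"), get_player),
]


def dispatch(path):
    """Call the first handler whose pattern matches, passing captured groups."""
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            return handler(*match.groups())
    return None
```

Captured groups always arrive as strings, so a handler that needs a number has to convert it itself.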

Future Improvements

Right now, the API calls allow anyone to create, modify, or delete data in the database.
Some security may be needed in the future so that only authorized users can modify and delete data.
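
One possible shape for such a check is sketched below, assuming a shared secret token is sent with each request. Nothing like this exists in the code yet; the token scheme and the function name are assumptions, and a real deployment might prefer the hosting platform's own user accounts.

```python
import hmac

# Methods that only read data and are therefore left open to everyone.
READ_ONLY_METHODS = {"GET", "HEAD"}


def is_allowed(method, supplied_token, expected_token):
    """Allow reads for everyone; require a matching token for changes."""
    if method.upper() in READ_ONLY_METHODS:
        return True
    if not supplied_token:
        return False
    # compare_digest avoids leaking how many characters matched.
    return hmac.compare_digest(
        supplied_token.encode("utf-8"), expected_token.encode("utf-8")
    )
```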
