sfbrigade / datasci-sba

Solving problems with the Small Business Administration

adding in parse task for sandbox table #63

Closed VincentLa14 closed 7 years ago

VincentLa14 commented 7 years ago

1. Brief Summary of what this PR accomplishes (140 characters or less. If you find trouble describing what you are doing in this length, consider breaking the PR into multiple ones.)

Adding in geocoded loan dataset using Noah's "manually" cleaned data.

2. Link to Trello Ticket

https://trello.com/c/5hTzjejl/68-ingest-noahs-clean-csv-but-without-sfdo-specific-columns-at-least-get-lat-long-google-places-rating

3. More detailed description and other questions to address in code review

While we generally want to use the national FOIA dataset, we're having some difficulty building out our geocoder module. So, just for the September 2017 MVP, we're going to take the dataset Noah has already geocoded and merge in its geocoded columns. That way we can get going on our business-level visualization.

To run the parser, first add the dataset to /src/data/sba. Then run

python -m pipeline.pipeline_tasks.sandbox.parse_sfdo_504_7a_clean --db_url=$SBA_DWH

from the top-level directory. This creates the table sandbox.sfdo_504_7a_clean. It lives in sandbox to denote that we don't want it to be part of our actual database moving forward; it is only there for the MVP.
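For reviewers who want a feel for what a parse task like this does, here is a minimal sketch and not the code in this PR. It assumes the cleaned file is a plain CSV with a header row, and it swaps in the standard-library sqlite3 module for the real warehouse behind $SBA_DWH so it can run anywhere. The function name `load_clean_csv` and the sample column names are hypothetical.

```python
import csv
import io
import sqlite3


def load_clean_csv(conn, csv_file, table="sfdo_504_7a_clean"):
    """Recreate `table` from a CSV file object and return the number of rows loaded.

    Hypothetical helper: every column is stored as TEXT, and rows whose
    field count does not match the header are skipped.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    # Quote column names so headers with spaces or punctuation stay valid SQL.
    columns = ", ".join('"{}" TEXT'.format(name.replace('"', '""')) for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute('DROP TABLE IF EXISTS "{}"'.format(table))
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
    rows = [row for row in reader if len(row) == len(header)]
    conn.executemany('INSERT INTO "{}" VALUES ({})'.format(table, placeholders), rows)
    conn.commit()
    return len(rows)


# Made-up sample data standing in for the real file under /src/data/sba.
sample = io.StringIO("borrower_name,latitude,longitude\nExample Cafe,37.77,-122.42\n")
conn = sqlite3.connect(":memory:")
loaded = load_clean_csv(conn, sample)
```

Dropping and recreating the table on each run keeps the task safe to rerun, which suits a throwaway sandbox table.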

You will then need to do a simple join to grab the columns you need.
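The join might look roughly like the sketch below. Only the table name sandbox.sfdo_504_7a_clean comes from this PR. The `loans` table, the `loan_id` join key, and the column names are assumptions for illustration, and sqlite3 stands in for the real database so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database so the table can be addressed as sandbox.<name>.
conn.execute("ATTACH DATABASE ':memory:' AS sandbox")

# Hypothetical main table and hypothetical columns on the sandbox table.
conn.execute("CREATE TABLE loans (loan_id INTEGER PRIMARY KEY, borrower_name TEXT)")
conn.execute(
    "CREATE TABLE sandbox.sfdo_504_7a_clean "
    "(loan_id INTEGER, latitude REAL, longitude REAL, google_rating REAL)"
)
conn.executemany(
    "INSERT INTO loans VALUES (?, ?)",
    [(1, "Example Cafe"), (2, "Example Bakery")],
)
conn.execute("INSERT INTO sandbox.sfdo_504_7a_clean VALUES (1, 37.77, -122.42, 4.5)")

# LEFT JOIN keeps loans that have no geocoded match; their geo columns come back as NULL.
JOIN_SQL = """
SELECT l.loan_id, l.borrower_name, c.latitude, c.longitude, c.google_rating
FROM loans AS l
LEFT JOIN sandbox.sfdo_504_7a_clean AS c ON c.loan_id = l.loan_id
ORDER BY l.loan_id
"""
rows = conn.execute(JOIN_SQL).fetchall()
```

A left join is used here so that loans missing from the cleaned file still appear in the result. Switch to an inner join if only geocoded businesses should reach the visualization.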

4. Remember to tag reviewers!