Closed zlatankr closed 7 years ago
Ok, made some "seemingly" big changes, but the core of it is still the same. Here's what I did:

Scripts in the `api_calls` directory are now modules that serve as helper functions. Each of these files contains the relevant functions for accessing the respective API. For example, `yelp_ratings.py` hits the Yelp API to get Yelp ratings, and `congressional_districts.py` hits the Google Civic API to get Congressional Districts.

`00_01_03_sba_sfdo_api_calls.py` calls these modules, and the results are new fields that get written to `stg_analytics.sba_sfdo_api_calls`. That table has `sba_sfdo_id`, which we can then use to join back to the original `sba_sfdo` data in `stg_analytics.sba_sfdo_all`.

I tested against a local database truncated to just 100 rows. It runs successfully, but the hit rate for Yelp reviews seems pretty low on the first 100 rows (maybe 5 businesses matched?). Running on production data now; we'll see how many hits we get. As you mention in the code, I think a big part of it is due to bad address normalization.
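Since bad address normalization seems to drive the low match rate, here is a rough sketch of the kind of cleanup a normalization helper could do before querying Yelp. The function name and abbreviation rules below are purely illustrative, not what's currently in the script:

```python
import re

# Illustrative only: a few common USPS-style street abbreviations.
ABBREVIATIONS = {
    "street": "st",
    "avenue": "ave",
    "boulevard": "blvd",
    "suite": "ste",
}

def normalize_address(address):
    """Lowercase, strip punctuation, collapse whitespace, and abbreviate
    common street designators so near-identical addresses compare equal."""
    cleaned = re.sub(r"[^\w\s]", "", address.lower())
    words = [ABBREVIATIONS.get(w, w) for w in cleaned.split()]
    return " ".join(words)

print(normalize_address("123 Market Street, Suite 400"))
# -> 123 market st ste 400
```

Even a small normalization pass like this tends to raise the match rate a lot, since Yelp's matching is sensitive to punctuation and spelled-out designators.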
1. Brief Summary of what this PR accomplishes (140 characters or less. If you find trouble describing what you are doing in this length, consider breaking the PR into multiple ones.)
Cleaned up the Yelp script: it scrapes Yelp data and pushes it into a new table in our PostgreSQL database.
2. Link to Trello Ticket
https://trello.com/c/1JiYtVmg
3. More detailed description and other questions to address in code review
I ran the code outside of the function, so we need to make sure that this script can successfully run in the pipeline. Additionally, I combine the Yelp data with the sfdo data and push the result into a new table, but maybe there's a better way to do it?
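For reference, the join pattern this enables can be sketched with stdlib `sqlite3` standing in for Postgres. The table names follow the PR; the `borr_name` column and the sample rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Stand-ins for stg_analytics.sba_sfdo_all and stg_analytics.sba_sfdo_api_calls.
cur.execute("CREATE TABLE sba_sfdo_all (sba_sfdo_id INTEGER PRIMARY KEY, borr_name TEXT)")
cur.execute("CREATE TABLE sba_sfdo_api_calls (sba_sfdo_id INTEGER PRIMARY KEY, yelp_rating REAL)")

cur.executemany("INSERT INTO sba_sfdo_all VALUES (?, ?)",
                [(1, "Acme Bakery"), (2, "Mission Tacos")])
# Only one of the two businesses got a Yelp hit (made-up data).
cur.execute("INSERT INTO sba_sfdo_api_calls VALUES (1, 4.5)")

# LEFT JOIN keeps every sba_sfdo row even when no Yelp rating was found.
rows = cur.execute("""
    SELECT a.sba_sfdo_id, a.borr_name, c.yelp_rating
    FROM sba_sfdo_all a
    LEFT JOIN sba_sfdo_api_calls c USING (sba_sfdo_id)
    ORDER BY a.sba_sfdo_id
""").fetchall()
print(rows)  # [(1, 'Acme Bakery', 4.5), (2, 'Mission Tacos', None)]
```

Keeping the API results in their own table keyed on `sba_sfdo_id` (rather than mutating `sba_sfdo_all`) means re-running the API script just truncates and reloads one table, which seems like a reasonable design, but open to alternatives.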
Need to add the Yelp credentials (see Slack chat) to environment variables.
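One way to pick the credentials up from the environment and fail loudly when they're missing. The variable names here are a guess; use whatever we settle on in Slack:

```python
import os

def get_yelp_credentials():
    """Read Yelp API credentials from environment variables, raising a
    clear error if one is missing rather than making unauthenticated calls."""
    try:
        return {
            "client_id": os.environ["YELP_CLIENT_ID"],
            "client_secret": os.environ["YELP_CLIENT_SECRET"],
        }
    except KeyError as missing:
        raise RuntimeError(f"Missing Yelp credential in environment: {missing}")
```

Then the pipeline just needs `export YELP_CLIENT_ID=... YELP_CLIENT_SECRET=...` (or an entry in whatever env file we use) before running the script.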
4. Remember to tag reviewers! @VincentLa