sfbrigade / datasci-sba

Solving problems with the Small Business Administration

Batching and automating external API calls #59

Closed makfan64 closed 7 years ago

makfan64 commented 7 years ago

1. Brief Summary of what this PR accomplishes (140 characters or less. If you find trouble describing what you are doing in this length, consider breaking the PR into multiple ones.)

This is a spec for the API flow change to support batches and use cron.

2. Link to Trello Ticket

https://trello.com/c/6p6knshX

3. More detailed description and other questions to address in code review

This is just a spec to agree on the details before starting to code. Once the work is done, I'll update it with instructions for setting up the cron.

4. Remember to tag reviewers! @VincentLa

makfan64 commented 7 years ago

These are great comments/questions. Let's break out on Wednesday to go over them. I'll be missing 2-3 Wednesdays after this week due to schedule conflicts, so if we can make some decisions, I can get the code written on my own time.

makfan64 commented 7 years ago

I changed the description of this PR and the Trello card.

makfan64 commented 7 years ago

After a conversation with @avdonovan, I will make several changes to this proposal and push the updated doc ASAP. Then I'll start coding the script and pipeline pieces.

makfan64 commented 7 years ago

@avdonovan @VincentLa Made several updates based on tonight's talk. Please review. Thanks!

makfan64 commented 7 years ago

@gregboyer Any comments on this approach?

VincentLa14 commented 7 years ago

Just a quick note on priority, if it matters: it turns out that the original SBA data did have a congressional_district field. See https://modeanalytics.com/editor/code_for_san_francisco/reports/b1bf764accba for an example.

The main caveat, and the reason we still want to use the Google Civic API, is that this is presumably the congressional district at the time the loan was made, which could differ from the current congressional district. I would imagine the data is mostly OK, though, so we could probably use that column for now if we absolutely needed it.
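
For illustration only, here is a minimal sketch of what "use that column for now" could look like: prefer the district returned by the API when we have one, and otherwise fall back to the SBA column. The function name and both parameter names are hypothetical, and nothing here calls the Google Civic API or reflects the actual pipeline code.

```python
from typing import Optional


def pick_congressional_district(
    civic_api_district: Optional[str],
    sba_district: Optional[str],
) -> Optional[str]:
    """Choose which congressional district value to keep for a loan record.

    civic_api_district: hypothetical value already fetched from the API
        (the current district), or None/empty if not fetched yet.
    sba_district: value from the SBA congressional_district column
        (presumably the district at the time the loan was made).
    """
    # Prefer the current district when the API lookup has produced one.
    if civic_api_district:
        return civic_api_district
    # Otherwise fall back to the SBA-provided column, which may be stale.
    return sba_district
```

With something like this, records that have not been through the API yet would still carry a district, with the understanding that it may be out of date.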