sfbrigade / datasci-sba

Solving problems with the Small Business Administration

Major Changes for External APIs to Support Batching #72

Open makfan64 opened 6 years ago

makfan64 commented 6 years ago

1. Brief Summary of what this PR accomplishes (140 characters or less. If you find trouble describing what you are doing in this length, consider breaking the PR into multiple ones.)

Major API changes so that the external API calls can run as a batch process: a new control script, plus new subscripts that provide functionality the control script uses. Yelp is fairly far along; the other APIs are just placeholders.
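Here is a minimal sketch of the control-script-plus-subscripts pattern described above. It assumes each subscript can be modeled as a function that takes a batch of records and returns one result per record. The names `register`, `HANDLERS`, `chunked`, `run`, and both handlers are hypothetical and not taken from the PR. The handlers are stubs standing in for real Yelp and Civics calls.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# A "subscript" is modeled as a function: batch of records in, one result per record out.
Handler = Callable[[List[dict]], List[dict]]

# Hypothetical registry mapping an API name to its handler.
HANDLERS: Dict[str, Handler] = {}


def register(api_name: str) -> Callable[[Handler], Handler]:
    """Decorator that adds a handler to HANDLERS under `api_name`."""
    def wrap(fn: Handler) -> Handler:
        HANDLERS[api_name] = fn
        return fn
    return wrap


def chunked(records: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` records."""
    batch: List[dict] = []
    for rec in records:
        batch.append(rec)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


@register("yelp")
def yelp_handler(batch: List[dict]) -> List[dict]:
    # Stub: a real handler would query the Yelp API for each record here.
    return [{"id": r["id"], "api": "yelp", "status": "ok"} for r in batch]


@register("civics")
def civics_handler(batch: List[dict]) -> List[dict]:
    # Stub standing in for a second API that is not implemented yet.
    return [{"id": r["id"], "api": "civics", "status": "stub"} for r in batch]


def run(api_names: List[str], records: List[dict], batch_size: int = 50) -> List[dict]:
    """Control loop: for each requested API, feed it the records batch by batch."""
    results: List[dict] = []
    for name in api_names:
        handler = HANDLERS[name]  # raises KeyError for an unregistered API
        for batch in chunked(records, batch_size):
            results.extend(handler(batch))
    return results


if __name__ == "__main__":
    rows = [{"id": i} for i in range(5)]
    print(len(run(["yelp", "civics"], rows, batch_size=2)))  # 10
```

With this shape, adding another API means registering one more handler. The control loop and the batching stay unchanged.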

2. Link to Trello Ticket

https://trello.com/c/6p6knshX

3. More detailed description and other questions to address in code review

What do you think of this approach? I'm mostly uncertain about my SQL for fetching the dataset and about the way I write the updated records back. I'm a little fuzzy on how pandas interacts with SQL when writing.
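On the SQL for getting the dataset: a common shape for a batch job is a LEFT JOIN that selects records never sent to a given API, or last sent before some cutoff. The sketch below uses the standard-library `sqlite3` module as a stand-in for the project's real database. The `businesses` and `api_calls` tables and their columns are hypothetical. With pandas, the same query string and parameters could be passed to `pandas.read_sql` instead of `conn.execute`.

```python
import sqlite3
from datetime import datetime
from typing import List, Tuple

# Hypothetical schema; sqlite3 stands in for the project's real database.
SCHEMA = """
CREATE TABLE businesses (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE api_calls (
    business_id INTEGER NOT NULL,
    api TEXT NOT NULL,
    last_called TIMESTAMP,
    PRIMARY KEY (business_id, api)
);
"""

# Businesses with no call to this API yet, or whose last call is older than the cutoff.
# SQLite stores these timestamps as text, so comparison follows time order only
# while every value uses the same fixed format (see fmt below).
PENDING_SQL = """
SELECT b.id, b.name
FROM businesses AS b
LEFT JOIN api_calls AS c
       ON c.business_id = b.id AND c.api = ?
WHERE c.last_called IS NULL OR c.last_called < ?
ORDER BY b.id
"""


def fmt(ts: datetime) -> str:
    """Format a timestamp consistently so text comparison matches time order."""
    return ts.strftime("%Y-%m-%d %H:%M:%S")


def fetch_pending(conn: sqlite3.Connection, api: str, cutoff: datetime) -> List[Tuple[int, str]]:
    """Return (id, name) for every business that needs a fresh call to `api`."""
    return conn.execute(PENDING_SQL, (api, fmt(cutoff))).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO businesses VALUES (?, ?)",
                     [(1, "Cafe"), (2, "Bakery"), (3, "Florist")])
    conn.executemany("INSERT INTO api_calls VALUES (?, ?, ?)",
                     [(1, "yelp", "2024-01-10 00:00:00"),   # recent call
                      (2, "yelp", "2023-06-01 00:00:00")])  # stale call
    print(fetch_pending(conn, "yelp", datetime(2024, 1, 1)))
```

Business 1 is skipped because its last Yelp call is newer than the cutoff. Business 2 is returned because its call is stale, and business 3 is returned because it has never been called.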

4. Remember to tag reviewers! @VincentLa @avdonovan

makfan64 commented 6 years ago

Oops, I need to update the test as I changed some things in the DB schema.

makfan64 commented 6 years ago

This commit gets the whole Yelp flow working and fixes some performance issues I found in the way I was updating the sba_sfdo_api_calls table.
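The thread doesn't say what the performance fix to `sba_sfdo_api_calls` was. To illustrate one common fix, this sketch writes a whole batch of results with a single `executemany` upsert inside one transaction, rather than issuing and committing one statement per row. `sqlite3` again stands in for the real database, and the columns are hypothetical. `?` is the sqlite3 placeholder style; psycopg2 uses `%s` instead.

```python
import sqlite3
from datetime import datetime
from typing import Iterable

# Hypothetical columns; the real sba_sfdo_api_calls schema may differ.
CREATE_SQL = """
CREATE TABLE api_calls (
    business_id INTEGER NOT NULL,
    api TEXT NOT NULL,
    last_called TIMESTAMP,
    PRIMARY KEY (business_id, api)
)
"""

# Insert a new row, or update last_called if (business_id, api) already exists.
# INSERT ... ON CONFLICT ... DO UPDATE exists in SQLite 3.24+ and PostgreSQL 9.5+.
UPSERT_SQL = """
INSERT INTO api_calls (business_id, api, last_called)
VALUES (?, ?, ?)
ON CONFLICT (business_id, api) DO UPDATE SET last_called = excluded.last_called
"""


def write_results(conn: sqlite3.Connection, api: str,
                  ids: Iterable[int], called_at: datetime) -> None:
    """Record a whole batch of calls with one statement in one transaction,
    instead of one UPDATE and one commit per row."""
    stamp = called_at.strftime("%Y-%m-%d %H:%M:%S")
    with conn:  # commits if every row succeeds, rolls back otherwise
        conn.executemany(UPSERT_SQL, [(i, api, stamp) for i in ids])
```

Committing once per batch also means a failed batch leaves the table unchanged, so the next run can simply pick the same records up again.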

makfan64 commented 6 years ago

Civics is working with this commit.

makfan64 commented 6 years ago

One thing I can't seem to fix: updates to the table change the data type of the timestamp columns to text. It doesn't seem to break the flow, but I'd like to keep the original data type.
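The thread doesn't identify the cause. One plausible cause, if the write-back goes through pandas, is `DataFrame.to_sql(..., if_exists="replace")`. That option drops the table and recreates it with column types inferred from the DataFrame, so timestamps held as plain strings come back as a text column. Writing with `if_exists="append"` into the existing table, passing an explicit `dtype` mapping, or updating rows in place keeps the declared type. The sketch below shows the in-place approach plus a schema check that could run after each batch to catch drift early. It uses SQLite's `PRAGMA table_info`; in PostgreSQL, `information_schema.columns` holds the same information. The table and the expected types are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical expected schema for the API-calls table.
EXPECTED = {"business_id": "INTEGER", "api": "TEXT", "last_called": "TIMESTAMP"}


def column_types(conn: sqlite3.Connection, table: str) -> Dict[str, str]:
    """Return {column name: declared type} from SQLite's PRAGMA table_info.
    PRAGMA cannot take bound parameters, so `table` must be a trusted name."""
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}


def update_in_place(conn: sqlite3.Connection, rows: List[Tuple[str, int]]) -> None:
    """Change only the values; the table definition (and its TIMESTAMP type) is untouched."""
    with conn:
        conn.executemany(
            "UPDATE api_calls SET last_called = ? WHERE business_id = ?", rows)


def check_types(conn: sqlite3.Connection, table: str, expected: Dict[str, str]) -> None:
    """Raise if any column's declared type no longer matches what we expect."""
    actual = column_types(conn, table)
    drifted = {col: (want, actual.get(col))
               for col, want in expected.items() if actual.get(col) != want}
    if drifted:
        raise RuntimeError(f"column types changed in {table}: {drifted}")
```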

makfan64 commented 6 years ago

Ok, this is ready for review, comments and testing. Whew! @VincentLa @avdonovan

makfan64 commented 6 years ago

Added explanation of the third party API flow to onboarding.

makfan64 commented 6 years ago

I noticed in a casual check today that the Yelp and Geocode timestamps have become blobs of text rather than dates. I don't know why just yet.

makfan64 commented 6 years ago

From my list, this commit addresses two tasks:

  1. Fix the APIs so if there is a timeout it tries again rather than giving up.
  2. For the Google geocode, try to put a short delay between each call.

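The two tasks above can be sketched as a retry wrapper with exponential backoff, plus a small throttle that enforces a minimum gap between calls. This is a hypothetical sketch, not the PR's code. `call_with_retry` and `Throttle` are illustrative names, and the exception to retry on depends on the HTTP client; for `requests`, it would be `requests.exceptions.Timeout`. `sleep` and `clock` are injectable so that the logic can be tested without real waiting.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); if it raises one of `retry_on`, wait base_delay * 2**n seconds
    and try again. The last exception is re-raised once all attempts are used."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


class Throttle:
    """Enforce a minimum gap between consecutive calls, e.g. to a geocoding API."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until at least min_interval seconds have passed since the last call."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now
```

In use, one would call `throttle.wait()` before each geocode request and wrap the request itself in `call_with_retry`. This way a timeout is retried with growing pauses, and consecutive requests never fire back to back.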
makfan64 commented 6 years ago

I also corrected the table and changed my personal cron job to minimize the problems that caused the dates to become text blobs.