Joshdpaul closed this pull request 4 months ago.
Thanks for the detailed review here, Craig.
Good catch on the duplicated comments! I fixed that bug via 1ebd680 and a93ec66. The affected rows were places with simple one-to-one relationships to tract geography: they were missed by the conditional statements and so inherited the comment from the previous iteration (i.e., the previous row). Should be good to go now.
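For anyone reviewing later: the failure mode was a classic loop-carryover, where a variable assigned only inside a conditional keeps its value from the previous row whenever the condition doesn't fire. A minimal sketch of the pattern (hypothetical names and data, not the project's actual code):

```python
# Hypothetical illustration of the duplicated-comment bug (not the real code).
rows = [
    {"place": "A", "relationship": "many-to-one"},
    {"place": "B", "relationship": "one-to-one"},  # the conditional skips this row
]

# Buggy version: `comment` is only assigned inside the conditional, so the
# one-to-one row silently inherits the previous row's comment.
comments = []
comment = None
for row in rows:
    if row["relationship"] != "one-to-one":
        comment = f"{row['place']} intersects multiple tracts"
    comments.append(comment)
# comments[1] now duplicates comments[0]

# Fixed version: reset the variable at the top of every iteration.
fixed = []
for row in rows:
    comment = ""  # reset each iteration so nothing carries over
    if row["relationship"] != "one-to-one":
        comment = f"{row['place']} intersects multiple tracts"
    fixed.append(comment)
```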
I also re-exported the `environment.yml` as per your suggestion - please give that a shot and see what happens.
This PR is the first draft of a production-ready routine to fetch and format data for the EPA-Justice-HIA project.
The majority of this code parses entries in the `NCRPlaces_Census_{MMDDYYYY}.csv` table into requests to the various APIs. There is quite a bit of string chopping and table wrangling to accomplish the fetching and merging of all this data into a single output. I realize that this kind of code can be difficult to review without seeing interim outputs, so to expose some of those interim pieces, I called a few of the individual data fetching functions in `fetch_data_and_export.ipynb` and included some URL request printing to let you see the API requests and the JSON returned, and compare them with the reformatted function outputs.

TO TEST:
- Read the `README.md`.
- Run `fetch_data_and_export.ipynb`, paying particular attention to the testing cells in the first part of the notebook. Investigate some of the URLs, and make sure the values you see in the printed dataframes match those returned by the API.
- Examine the `run_fetch_and_merge()` function to see how it orchestrates the individual data fetching steps. This function uses multiprocessing and produces a single table output (`data_to_export.csv`).
- Compare your `data_to_export.csv` output; the result should be identical to the version committed in this repo. Is this a reasonable table format for incorporating into NCR?
- Skim `functions.py`. Unless something breaks, there is no need for an in-depth review here ... but see if you can follow the processing logic backwards from `run_fetch_and_merge()`.