Closed jschulberg closed 1 year ago
It looks like we're limited to pulling 1000 rows. If we specify returnIdsOnly=true, we may be able to get around that, but then we probably lose a lot of meaningful features.
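For reference, a rough sketch of how the ID-only route could work, assuming the endpoint is an ArcGIS-style REST query service (which the returnIdsOnly parameter suggests, but I haven't confirmed). The idea would be to pull all the IDs first, then request full records in batches that stay under the 1000-row cap. The helper names and the "1=1" filter are made up for illustration, and the objectIds parameter is an assumption about the service.

```python
from urllib.parse import urlencode


def chunked(ids, size=1000):
    """Yield successive batches of at most `size` IDs."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]


def id_query_params():
    """Query string asking only for record IDs (assumed ArcGIS-style parameters)."""
    return urlencode({"where": "1=1", "returnIdsOnly": "true", "f": "json"})


def batch_query_params(id_batch):
    """Query string requesting full records for one batch of IDs.

    `objectIds` is an assumption about the service, not confirmed for this API.
    """
    return urlencode({
        "objectIds": ",".join(str(i) for i in id_batch),
        "outFields": "*",
        "f": "json",
    })
```

Each string from batch_query_params would be appended to the service's query URL, one request per batch.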
Another option is to set WHERE clauses to limit the query based on a specific date and then use a for loop in Python to run the query day-by-day, concatenating results into a master dataframe.
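A minimal sketch of that day-by-day idea, not the actual ReadData.py code. It assumes the date column is called REPORTDATE and that the service accepts SQL-style DATE literals in the WHERE clause; both are guesses. fetch_rows is a hypothetical stand-in for whatever function sends one WHERE clause to the API and returns a list of row dicts.

```python
from datetime import date, timedelta


def daily_where_clauses(start, end, field="REPORTDATE"):
    """Yield one WHERE clause per day in the half-open range [start, end).

    `field` is an assumed column name; swap in the real date field.
    """
    day = start
    while day < end:
        next_day = day + timedelta(days=1)
        yield (
            f"{field} >= DATE '{day.isoformat()}' "
            f"AND {field} < DATE '{next_day.isoformat()}'"
        )
        day = next_day


def fetch_all(fetch_rows, start, end, field="REPORTDATE"):
    """Run one query per day and concatenate the rows into a single list."""
    rows = []
    for where in daily_where_clauses(start, end, field):
        rows.extend(fetch_rows(where))
    return rows
```

The combined list can then be passed to pandas.DataFrame(rows) to get the master dataframe. If a single day can exceed 1000 rows, the window would need to be narrower than one day.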
@mebmiranda I updated the ReadData.py script to break up the API data into smaller chunks and concatenate everything into one master DataFrame. This should do the trick (although it takes ~15-20 minutes to run).
The code seems to be working, but it's running super slow. I tried adding time.sleep(1) into the code, in case it's the API acting up, so I'll see if that makes it any better.
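Roughly what the pause looks like, as a sketch rather than the exact change in the script. throttled is a hypothetical wrapper name, and fetch_rows again stands in for the function that makes one API request. Note that a fixed one-second pause adds one second per request, so it cannot speed things up on its own; it only helps if the API is slowing down or rejecting rapid requests.

```python
import time


def throttled(fetch_rows, delay_seconds=1.0, sleep=time.sleep):
    """Wrap a fetch function so each call is followed by a fixed pause.

    `sleep` is injectable so the wrapper can be tested without waiting.
    """
    def wrapper(where):
        rows = fetch_rows(where)
        sleep(delay_seconds)  # give the API a moment before the next request
        return rows
    return wrapper
```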
This works now!
@adampeir I tried to read in the D.C. crash data using the API. Check out the code.