jschulberg / DC-Transportation-Crashes

Analysis of transportation-related crashes (car, motorcycle, pedestrian, bike) in the Washington, D.C. area.

Read in Data Using API #3

Closed jschulberg closed 1 year ago

jschulberg commented 1 year ago

@adampeir I tried to read in the D.C. crash data using the API. Check out the code.

jschulberg commented 1 year ago

It looks like we're limited to pulling 1000 rows per request. If we specify returnIdsOnly=true, we may be able to get around that, but then we probably lose a lot of meaningful features, since only the record IDs come back.

Another option is to add a WHERE clause that limits each query to a specific date, then use a for loop in Python to run the query day by day, concatenating the results into a master DataFrame.
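A minimal sketch of the day-by-day idea, not the actual ReadData.py code. The date field name `REPORTDATE`, the `DATE 'YYYY-MM-DD'` literal syntax, and the `fetch_rows` callable (whatever function sends one query and returns a list of row dicts) are all assumptions made for illustration:

```python
from datetime import date, timedelta


def daily_where_clauses(start, end, field="REPORTDATE"):
    """Yield one WHERE clause per day in the half-open range [start, end)."""
    day = start
    while day < end:
        next_day = day + timedelta(days=1)
        # Assumed SQL-style date literal; adjust to what the API accepts.
        yield (
            f"{field} >= DATE '{day.isoformat()}' "
            f"AND {field} < DATE '{next_day.isoformat()}'"
        )
        day = next_day


def pull_all(fetch_rows, start, end):
    """Run one query per day and collect every returned row in one list."""
    rows = []
    for where in daily_where_clauses(start, end):
        rows.extend(fetch_rows(where))
    return rows
```

The combined list can then be turned into a single DataFrame with `pandas.DataFrame(rows)`. This only works if no single day exceeds the row limit. A busier day would need a narrower window.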

jschulberg commented 1 year ago

@mebmiranda I updated the ReadData.py script to break up the API data into smaller chunks and concatenate everything into one master DataFrame. This should do the trick (although it takes ~15-20 minutes to run).

jschulberg commented 1 year ago

The code seems to be working, but it's running super slow. I tried adding time.sleep(1) to the code, in case it's the API acting up, so I'll see if that makes it any better.
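For reference, a small sketch of what pausing between requests might look like. `fetch_with_pause` and its `fetch_rows` argument are hypothetical names, not taken from ReadData.py. The `sleep` parameter exists only so the pause can be swapped out when testing:

```python
import time


def fetch_with_pause(fetch_rows, where_clauses, pause_seconds=1.0, sleep=time.sleep):
    """Call fetch_rows once per WHERE clause, pausing between calls."""
    rows = []
    for index, where in enumerate(where_clauses):
        if index > 0:
            # Wait before every request except the first one.
            sleep(pause_seconds)
        rows.extend(fetch_rows(where))
    return rows
```

A fixed pause adds roughly one second per request to the total runtime, so it can only help if the slowness comes from the API throttling rapid requests.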

jschulberg commented 1 year ago

This works now!