outbreak-info / biothings_covid19

Biothings plugin for JHU CSSE COVID-19 cases

API: Cache/Precompute lat/lon for locations in /covid19 Epi endpoint #12

Open flaneuse opened 4 years ago

flaneuse commented 4 years ago

Currently, the preprocessing script takes ~100 minutes to run each day to update the epidemiology data in the /covid19 endpoint. Investigate ways to decrease the run time, so the data can be updated each day closer to when JHU/NYT release their daily updates.

One easy source of improvement is the step that finds the largest polygon for each country, state, county, and metro area and computes its centroid. In the beginning, JHU was adding lots of new places, but now the majority are constant. We could precompute the centroids for these places and only recompute them when necessary (e.g. when a new place appears).
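A minimal sketch of the precompute-and-cache idea, under stated assumptions: `get_centroids`, `cache_path`, and the `shapes` layout (a location ID mapped to a list of outer rings of `(lon, lat)` points, holes ignored) are all hypothetical, not the plugin's actual code. It swaps in a hand-rolled planar shoelace formula in place of whatever geometry library the real script uses, and it only computes centroids for location IDs that are missing from a JSON cache file.

```python
import json
import os
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (lon, lat); treated as planar coordinates
Ring = List[Point]           # outer ring of one polygon (holes ignored)


def ring_area_and_centroid(ring: Ring) -> Tuple[float, float, float]:
    """Signed area and centroid of a ring via the shoelace formula."""
    a = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    return a, cx / (6 * a), cy / (6 * a)


def largest_polygon_centroid(polygons: List[Ring]) -> Tuple[float, float]:
    """Centroid of the polygon with the largest absolute area."""
    _, lon, lat = max(
        (ring_area_and_centroid(r) for r in polygons), key=lambda t: abs(t[0])
    )
    return lon, lat


def get_centroids(
    shapes: Dict[str, List[Ring]], cache_path: str
) -> Dict[str, Tuple[float, float]]:
    """Hypothetical helper: reuse cached centroids, compute only new places."""
    cache: Dict[str, Tuple[float, float]] = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = {k: tuple(v) for k, v in json.load(f).items()}
    missing = [loc for loc in shapes if loc not in cache]
    for loc in missing:
        cache[loc] = largest_polygon_centroid(shapes[loc])
    if missing:  # only rewrite the cache file when something new was added
        with open(cache_path, "w") as f:
            json.dump(cache, f)
    return {loc: cache[loc] for loc in shapes}
```

Because cached entries are never recomputed, a place whose boundary actually changes would need its entry deleted from the cache file (or the file cleared) to force a refresh.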

The other time sink is less obviously improvable: generating items/stats for every date takes a while, and the run time keeps growing because each day adds a new data point for every location.
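One possible direction, sketched under a loud assumption: if already-published dates never change, each run could compute items only for dates after the last one already stored, seeding the running calculation from that last stored item. The function `new_items` and the field names `confirmed` and `confirmed_numIncrease` are hypothetical here. JHU/NYT do sometimes revise historical counts, so a periodic full recompute would likely still be needed.

```python
from datetime import date
from typing import Dict, List


def new_items(cumulative: Dict[date, int], existing: List[dict]) -> List[dict]:
    """Hypothetical: build per-date items only for dates not yet processed.

    cumulative: cumulative confirmed cases per date for one location.
    existing:   items produced by earlier runs (assumed to be unchanged).
    """
    last = max(existing, key=lambda it: it["date"]) if existing else None
    prev_total = last["confirmed"] if last else 0
    items = []
    for d in sorted(cumulative):
        if last is not None and d <= last["date"]:
            continue  # already computed on a previous run; skip it
        total = cumulative[d]
        items.append(
            {
                "date": d,
                "confirmed": total,
                "confirmed_numIncrease": total - prev_total,
            }
        )
        prev_total = total
    return items
```

With this approach, the daily cost is proportional to the number of locations rather than to locations times days, at the price of trusting earlier results.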