Open nguyenuy opened 5 years ago
I think distance to relevant health facilities can be calculated using this library: https://pypi.org/project/geopy/
However, computing the distance from a given address to every other address is probably slow. To address this, I think we should use the following approach: 1) Precompute the latitude and longitude of all the facilities (maybe these are already in the data?) using something like geopy 2) Put these into a KD-tree locally using scipy.spatial.KDTree
Then, to find the nearest neighbors for a given address, you can do the following: 1) For each new address entered, compute its latitude and longitude using geopy 2) Use the KDTree to get the top n closest neighbors, or get all neighbors within some cutoff distance
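To make the two steps above concrete, here is a rough sketch, not a finished implementation. It assumes the facility coordinates are already available, so no geopy network call is made here. The facility names and coordinates are made-up placeholders, and the helper names (`to_unit_xyz`, `nearest_facilities`, `facilities_within`) are hypothetical. One choice that goes beyond the proposal: the points are converted to 3D unit vectors before going into the tree, because plain Euclidean distance on raw latitude/longitude degrees does not correspond to distance on the ground.

```python
import numpy as np
from scipy.spatial import KDTree

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def to_unit_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to points on the unit sphere."""
    lat = np.radians(np.atleast_1d(np.asarray(lat_deg, dtype=float)))
    lon = np.radians(np.atleast_1d(np.asarray(lon_deg, dtype=float)))
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))


def chord_to_km(chord):
    """Straight-line (chord) length on the unit sphere -> great-circle km."""
    half = np.clip(np.asarray(chord) / 2.0, 0.0, 1.0)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(half)


def km_to_chord(km):
    """Great-circle km -> chord length on the unit sphere."""
    return 2.0 * np.sin(km / (2.0 * EARTH_RADIUS_KM))


# Placeholder data: (name, latitude, longitude). In practice these would be
# precomputed once (e.g. geocoded ahead of time) and loaded from the dataset.
FACILITIES = [
    ("Facility A", 33.7490, -84.3880),
    ("Facility B", 32.8407, -83.6324),
    ("Facility C", 32.0809, -81.0912),
    ("Facility D", 33.4735, -82.0105),
]
NAMES = [f[0] for f in FACILITIES]
TREE = KDTree(to_unit_xyz([f[1] for f in FACILITIES],
                          [f[2] for f in FACILITIES]))


def nearest_facilities(lat, lon, n=2):
    """Return the n closest facilities as (name, distance_km), nearest first."""
    n = min(n, len(NAMES))  # avoid asking for more neighbors than exist
    chords, idx = TREE.query(to_unit_xyz(lat, lon)[0], k=n)
    chords, idx = np.atleast_1d(chords), np.atleast_1d(idx)
    return [(NAMES[i], float(chord_to_km(c))) for c, i in zip(chords, idx)]


def facilities_within(lat, lon, cutoff_km):
    """Return the names of all facilities within cutoff_km, sorted by name."""
    idx = TREE.query_ball_point(to_unit_xyz(lat, lon)[0],
                                r=km_to_chord(cutoff_km))
    return sorted(NAMES[i] for i in idx)
```

With this layout the tree is built once, and each lookup such as `nearest_facilities(33.9519, -83.3576, n=2)` only needs the coordinates of the new address. Getting those coordinates is still the one step that would require a geocoding call.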
Let me know your thoughts, and if this solves the issue.
Given the time constraints for the week, I think we will have to bypass obtaining the exact facility distance. This will skew our numbers. The geopy library has to retrieve the coordinates of each address through a network call, which is a bottleneck in itself.
This may require adding datasets, depending on how we want to weigh a given area. Let's start with something simple and build from there.
Healthscore access criteria