match4everyone / match4everything


[Discussion] Evaluate switching to GeoDjango for distance calculations #62

Open maltezacharias opened 4 years ago

maltezacharias commented 4 years ago

GeoDjango is an API for running queries such as distance lookups, along with many other GIS functions, in Django: https://docs.djangoproject.com/en/3.0/ref/contrib/gis/db-api/#geodjango-database-api

It can make use of PostGIS (a Postgres extension that enables spatial calculations in Postgres queries) but also has backends for MySQL, SQLite and Oracle. Currently we use our own haversine implementation, which IIRC is not the best solution in edge cases. I prefer using libraries if they're mature and cover our needs. This would require saving the latitude and longitude of each participant's location in addition to the ZIP code, but I think that would be a good idea anyhow, especially for larger databases.
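For reference, a haversine great-circle distance looks roughly like the sketch below. This is not the project's actual implementation (which isn't shown in this issue); the radius constant and the name `haversine_km` are assumptions. The clamp on `a` handles one known edge case: for near-antipodal points, floating-point rounding can push `a` just above 1.

```python
import math

# Assumed mean Earth radius in km; the project's own code may use another value.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees,
    treating the Earth as a perfect sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Edge case: for (nearly) antipodal points rounding can push a slightly
    # above 1, which would make asin raise ValueError, so clamp it.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

With the clamp in place, most of the remaining error comes from the spherical-Earth assumption itself (on the order of a few tenths of a percent), which a GIS library can avoid by computing on an ellipsoid.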

Was this ever discussed for helping health? What are your opinions on it? I think it would also allow us to perform distance calculations using the cube method (from the Postgres docs), but I have not verified that:

Data is stored in cubes that are points (both corners are the same) using 3 coordinates representing the x, y, and z distance from the center of the Earth. A domain earth over cube is provided, which includes constraint checks that the value meets these restrictions and is reasonably close to the actual surface of the Earth. The radius of the Earth is obtained from the earth() function. It is given in meters. But by changing this one function you can change the module to use some other units, or to use a different value of the radius that you feel is more appropriate. This package has applications to astronomical databases as well. Astronomers will probably want to change earth() to return a radius of 180/pi() so that distances are in degrees. Functions are provided to support input in latitude and longitude (in degrees), to support output of latitude and longitude, to calculate the great circle distance between two points and to easily specify a bounding box usable for index searches.
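A minimal Python sketch of the cube idea from the quoted docs: convert latitude/longitude into x, y, z distances from the Earth's centre, take the straight-line (chord) distance between the two points, and convert that chord into a great-circle distance. The function names are hypothetical; the formulas mirror what earthdistance's `ll_to_earth()` and `sec_to_gc()` do as far as we know, and 6378168 m is assumed to be the default `earth()` radius.

```python
import math

# Assumed default of earthdistance's earth() function, in metres.
EARTH_M = 6378168.0


def ll_to_xyz(lat: float, lon: float, radius: float = EARTH_M) -> tuple:
    """Point on the sphere as x, y, z distances from the Earth's centre
    (the idea behind ll_to_earth())."""
    phi, lmb = math.radians(lat), math.radians(lon)
    return (radius * math.cos(phi) * math.cos(lmb),
            radius * math.cos(phi) * math.sin(lmb),
            radius * math.sin(phi))


def chord_to_great_circle(chord: float, radius: float = EARTH_M) -> float:
    """Straight-line (secant) distance through the Earth converted to the
    matching great-circle distance (the idea behind sec_to_gc())."""
    # Clamp so rounding can't push asin's argument above 1.
    return 2 * radius * math.asin(min(1.0, chord / (2 * radius)))


def cube_style_distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres via the 3D-point representation."""
    chord = math.dist(ll_to_xyz(lat1, lon1), ll_to_xyz(lat2, lon2))
    return chord_to_great_circle(chord)
```

Because the chord length grows monotonically with the great-circle distance (up to half the circumference), a radius filter can compare plain 3D distances, which is what makes the cube representation usable with an index.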

maltezacharias commented 4 years ago

Did a little more reading: a lighter alternative would be https://pypi.org/project/django-earthdistance/#description, which would also support in-database distance calculation but would only work with Postgres. OTOH it would be far simpler to implement and doesn't require the PostGIS extension on the DB side.

Relevant Postgres docs: https://www.postgresql.org/docs/8.3/earthdistance.html

josauder commented 4 years ago

Currently we use our own haversine implementation which IIRC is not the best solution in edge cases.

I think this function is pretty robust, i.e. computing the distance on a (slightly non-round) sphere is not that difficult. Including another dependency just to avoid our own implementation would be overkill. A sensible argument FOR a library would be that the database could use a tree-like structure: right now we compare each object we are querying against all other objects using our haversine implementation (O(N) per query), whereas a spatial tree structure (which should be included with any GIS database extension) would reduce this to roughly O(log N).
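To illustrate the O(N) vs. index argument without a database, the sketch below uses a much simpler stand-in for a spatial tree: points sorted by latitude, so a radius query only runs haversine on the latitude band that could possibly match. `LatitudeIndex` and `within_linear` are hypothetical helpers, not part of Django, PostGIS or earthdistance. The band lookup is O(log N), but every point inside the band is still checked, so this only approximates what an R-tree or GiST index provides.

```python
import bisect
import math
from typing import Hashable, Iterable, List, Tuple

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


class LatitudeIndex:
    """Hypothetical helper: points sorted by latitude for band pre-filtering."""

    def __init__(self, points: Iterable[Tuple[float, float, Hashable]]):
        # Sort on latitude only, so keys never need to be comparable.
        self._points = sorted(points, key=lambda p: p[0])
        self._lats = [p[0] for p in self._points]

    def within(self, lat: float, lon: float, radius_km: float) -> List[Hashable]:
        # A point within radius_km can differ in latitude by at most
        # radius_km / R radians, so nothing outside this band can match.
        dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
        lo = bisect.bisect_left(self._lats, lat - dlat)   # O(log N)
        hi = bisect.bisect_right(self._lats, lat + dlat)  # O(log N)
        # Exact haversine check only for candidates inside the band.
        return [key for plat, plon, key in self._points[lo:hi]
                if haversine_km(lat, lon, plat, plon) <= radius_km]


def within_linear(points, lat: float, lon: float, radius_km: float) -> list:
    """The current O(N) approach: run haversine against every stored point."""
    return [key for plat, plon, key in points
            if haversine_km(lat, lon, plat, plon) <= radius_km]
```

The band is a safe pre-filter because the great-circle distance between two points is never smaller than the Earth's radius times their latitude difference.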

Baschdl commented 4 years ago

A geospatial index is a good idea, especially for big projects. I have some experience with PostGIS, but our main problem with geocoding (getting the coordinates in the first place) would still remain.

maltezacharias commented 4 years ago

I'd just save the ZIP code's lat/lon as a Point in the database for each user. That would work in any case, no matter how we determine the location, and could later be upgraded to use something else if we wanted to. Maybe add a precision column for the map indicators, but I don't expect that we will have time for map improvements in the remaining time.
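A rough sketch of what storing the coordinates per user could look like, using a plain dataclass instead of a GeoDjango `PointField` so it runs without a database. `ZIP_CENTROIDS`, `Participant`, `locate_participant` and the 5 km default precision are all hypothetical, and the coordinates in the lookup table are placeholders, not real ZIP centroids.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical ZIP -> (lat, lon) centroid table. Placeholder values only.
ZIP_CENTROIDS: Dict[str, Tuple[float, float]] = {
    "00001": (50.0, 8.0),
    "00002": (52.5, 13.4),
}

# Hypothetical default: a ZIP centroid is only accurate to a few kilometres.
ZIP_PRECISION_M = 5000.0


@dataclass
class Participant:
    zipcode: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    # How far (in metres) the stored point may be from the real location;
    # could drive the size of map indicators.
    precision_m: Optional[float] = None


def locate_participant(zipcode: str,
                       lookup: Dict[str, Tuple[float, float]] = ZIP_CENTROIDS) -> Participant:
    """Stores the ZIP centroid as the participant's point, if the ZIP is known."""
    coords = lookup.get(zipcode)
    if coords is None:
        # Unknown ZIP: keep the participant but leave the location empty.
        return Participant(zipcode=zipcode)
    lat, lon = coords
    return Participant(zipcode=zipcode, latitude=lat, longitude=lon,
                       precision_m=ZIP_PRECISION_M)
```

Because the point is stored once when the participant registers, distance queries no longer need to resolve ZIP codes, and the lookup could later be swapped for a better geocoder (with a smaller `precision_m`) without changing the stored fields.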