konektaz / where-is-when-is

Geodjango bootstrap integration
www.konektaz.info

import locations from openstreetmap #128

Closed: konektaz closed this issue 10 years ago

konektaz commented 10 years ago

Currently, most locations in the database show only the lat/long and the hospital name. Where no address data has been imported for a location, the address fields contain 'none' placeholders, because the initial import did not query or import that data. This issue aims to fill in the location data fields (see http://www.konekta.info/location/add/) and clean out any duplicates.

Here is the method suggested:

Harvesting location data

First, go to http://overpass-api.de/query_form.html. There are two boxes (textareas) where you can enter your XML query.

An example query looks like this one: https://github.com/konekta/where-is-when-is/blob/master/world/fixtures/south_africa_query.xml

Paste it into the second box, select "to OpenLayers auto-centered overlay" and click convert. This shows the locations on a map.

The example query searches for hospitals and clinics within the defined coordinates.
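For reference, a query of that shape can also be generated in code. This is a minimal sketch, not the contents of south_africa_query.xml: the function name and the bounding box values are hypothetical, and it only matches nodes tagged amenity=hospital or amenity=clinic (hospitals mapped as ways/areas would need additional `query type="way"` clauses).

```python
import xml.etree.ElementTree as ET


def build_overpass_query(south, west, north, east, amenities=("hospital", "clinic")):
    """Build an Overpass XML query for nodes tagged amenity=<value> inside a bounding box."""
    script = ET.Element("osm-script")
    union = ET.SubElement(script, "union")
    for amenity in amenities:
        # One <query> per amenity value; the <union> combines their results.
        query = ET.SubElement(union, "query", type="node")
        ET.SubElement(query, "has-kv", k="amenity", v=amenity)
        ET.SubElement(query, "bbox-query", s=str(south), w=str(west), n=str(north), e=str(east))
    # <print/> outputs the matched elements.
    ET.SubElement(script, "print")
    return ET.tostring(script, encoding="unicode")


# Hypothetical bounding box roughly covering South Africa.
print(build_overpass_query(-35.0, 16.0, -22.0, 33.0))
```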

This is done automatically by this script: https://github.com/konekta/where-is-when-is/blob/master/world/management/commands/osm_import.py
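If you want to run a query without the web form, the Overpass API also accepts it as a POST to its interpreter endpoint, with the query in the `data` form field. A rough sketch using only the standard library; it is not necessarily how osm_import.py does it, and the function names and timeout value are assumptions:

```python
import urllib.parse
import urllib.request

OVERPASS_URL = "http://overpass-api.de/api/interpreter"


def make_overpass_request(query_xml, url=OVERPASS_URL):
    """Wrap an Overpass query in a POST request (form field 'data')."""
    body = urllib.parse.urlencode({"data": query_xml}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def fetch_overpass(query_xml, timeout=180):
    """Send the query and return the raw XML response as text."""
    with urllib.request.urlopen(make_overpass_request(query_xml), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```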

Reverse geocoding

The second step is to reverse geocode the retrieved coordinates against OSM. This is much easier, as you don't need to write an XML query; you just build a standard URL: http://wiki.openstreetmap.org/wiki/Nominatim#Reverse_Geocoding_.2F_Address_lookup
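Building that URL takes one call to the standard library. A sketch (the function name `reverse_geocode_url` is hypothetical); with the coordinates from the example URL in this issue it reproduces that URL:

```python
from urllib.parse import urlencode

NOMINATIM_REVERSE = "http://nominatim.openstreetmap.org/reverse"


def reverse_geocode_url(lat, lon, zoom=18):
    """Build a Nominatim reverse-geocoding URL that also returns address details."""
    params = {"format": "json", "lat": lat, "lon": lon, "zoom": zoom, "addressdetails": 1}
    return NOMINATIM_REVERSE + "?" + urlencode(params)


print(reverse_geocode_url(52.5487429714954, -1.81602098644987))
# → http://nominatim.openstreetmap.org/reverse?format=json&lat=52.5487429714954&lon=-1.81602098644987&zoom=18&addressdetails=1
```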

Basically, a new script needs to loop through all the locations in the database that have coordinates but no address data, fetch the address from a URL like this one: http://nominatim.openstreetmap.org/reverse?format=json&lat=52.5487429714954&lon=-1.81602098644987&zoom=18&addressdetails=1

and save the result back to the database.
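A minimal sketch of that loop, assuming each location is a plain dict with `lat`, `lon` and `address` keys (in the Django app it would be the Location model, and the assignment would be followed by `loc.save()`). `fill_missing_addresses` and `fetch_reverse` are hypothetical names. The delay and the User-Agent header are there because Nominatim's usage policy asks for at most one request per second and an identifying User-Agent; the header value itself is made up.

```python
import json
import time
import urllib.request
from urllib.parse import urlencode

NOMINATIM_REVERSE = "http://nominatim.openstreetmap.org/reverse"


def fetch_reverse(lat, lon):
    """Call Nominatim reverse geocoding and return the decoded JSON."""
    url = NOMINATIM_REVERSE + "?" + urlencode(
        {"format": "json", "lat": lat, "lon": lon, "zoom": 18, "addressdetails": 1})
    # Hypothetical User-Agent identifying the importer.
    req = urllib.request.Request(url, headers={"User-Agent": "where-is-when-is address import"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fill_missing_addresses(locations, fetch=fetch_reverse, delay=1.0):
    """Fill the address of every location that has coordinates but no address."""
    updated = 0
    for loc in locations:
        # Skip locations that already have an address or lack coordinates.
        if loc.get("address") or loc.get("lat") is None or loc.get("lon") is None:
            continue
        result = fetch(loc["lat"], loc["lon"])
        address = result.get("address")
        if address:
            loc["address"] = address
            loc["display_name"] = result.get("display_name")
            updated += 1
        time.sleep(delay)  # stay at roughly one request per second
    return updated
```

Passing `fetch` as a parameter keeps the loop testable without hitting the network.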

Tasks

grvhi commented 10 years ago

The XML returned by the first box is actually more useful for a function (i.e. for processing in code), as it returns raw XML data.
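For example, the raw XML can be turned into location records with ElementTree. A sketch, assuming the response is Overpass's standard `<osm>` XML with `<node>` elements and `<tag k="..." v="..."/>` children, and that the OSM node id is what would be stored as `external_id`; the function name and sample data are made up:

```python
import xml.etree.ElementTree as ET


def parse_osm_nodes(raw_xml):
    """Turn raw Overpass XML into simple location records."""
    root = ET.fromstring(raw_xml)
    locations = []
    for node in root.iter("node"):
        # Collect the node's OSM tags into a dict, e.g. {"name": ..., "amenity": ...}.
        tags = {tag.get("k"): tag.get("v") for tag in node.findall("tag")}
        locations.append({
            "external_id": node.get("id"),
            "lat": float(node.get("lat")),
            "lon": float(node.get("lon")),
            "name": tags.get("name"),
            "amenity": tags.get("amenity"),
        })
    return locations


sample = """<osm version="0.6">
  <node id="123" lat="-26.1" lon="28.0">
    <tag k="amenity" v="hospital"/>
    <tag k="name" v="Example Hospital"/>
  </node>
</osm>"""
print(parse_osm_nodes(sample))
```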

konektaz commented 10 years ago

There are duplicate locations in the DB. When we do the lat/long part of the task, which I believe links the location name to the lat/long, we need to make sure there are no duplicates. See http://www.konekta.info/south-africa/ and http://www.konekta.info/location/crescent-clinic/

grvhi commented 10 years ago

@konekta @timlinux

My plan with finding duplicates is this:

Create a function which performs the following tasks:

  1. Find all objects which have no external_id.
    • If there are no address details, delete it.
    • If there are address details, check whether there is a duplicate entry with an external_id and merge or delete
  2. Find objects with exactly the same name
    • Merge them or delete the one with no details.
  3. Perhaps find entries with matching address details (where neither has an external_id)?
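A rough sketch of that plan on plain dicts (hypothetical function names; `name`, `external_id` and `address` are assumed field names). Matching on names alone will merge two different places that happen to share a name, so real data would probably also need a distance check on lat/lon:

```python
import json
from collections import defaultdict


def _norm(name):
    """Lower-case a name and collapse whitespace so near-identical names match."""
    return " ".join((name or "").lower().split())


def _merge_into(keep, other):
    """Copy every field that `keep` is missing from `other`."""
    for key, value in other.items():
        if value and not keep.get(key):
            keep[key] = value


def _collapse(groups):
    """Reduce each group of duplicates to one merged record."""
    result = []
    for group in groups:
        # Prefer a record imported from OSM (has external_id), then one with an address.
        ordered = sorted(group, key=lambda loc: (not loc.get("external_id"), not loc.get("address")))
        keep = dict(ordered[0])
        for other in ordered[1:]:
            _merge_into(keep, other)
        result.append(keep)
    return result


def deduplicate(locations):
    # Step 1: drop records that have neither an external_id nor address details.
    kept = [loc for loc in locations if loc.get("external_id") or loc.get("address")]

    # Steps 1-2: merge records whose normalised names are identical.
    by_name = defaultdict(list)
    unnamed = []
    for loc in kept:
        key = _norm(loc.get("name"))
        if key:
            by_name[key].append(loc)
        else:
            unnamed.append(loc)
    merged = _collapse(by_name.values()) + unnamed

    # Step 3: merge records without an external_id whose addresses are identical.
    with_id = [loc for loc in merged if loc.get("external_id")]
    by_address = defaultdict(list)
    for loc in merged:
        if not loc.get("external_id"):
            by_address[json.dumps(loc["address"], sort_keys=True)].append(loc)
    return with_id + _collapse(by_address.values())
```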

Feel free to change my plans!

grvhi commented 10 years ago

De-Duplication

To save time, the problem of de-duplication will be solved by deleting all existing Location data and re-importing all locations from OSM. Later, we will need to implement a system to prevent users from manually adding locations that already exist (see #134).

Other Tasks

CC

@konekta @timlinux