pelias / openstreetmap

Import pipeline for OSM into Pelias
MIT License

Different search results #264

Closed: oliverbienert closed this issue 4 years ago

oliverbienert commented 7 years ago

Hello, we've set up our own Pelias server, and recently a question came up about why the online Mapzen Search returns different results for certain search terms. Our setup is as follows:

pelias.json:

{
  "imports": {
    "adminLookup": {
      "enabled": true
    },
    "openstreetmap": {
      "datapath": "/data/osm",
      "leveldbpath": "/tmp",
      "import": [{
        "filename": "berlin_germany.osm.pbf"
      }]
    },
    "polyline": {
      "adminLookup": true,
      "datapath": "/data/polylines",
      "files": [ "berlin" ]
    },
    "whosonfirst": {
      "datapath": "/data/whosonfirst"
    }
  },
  "api": {
    "textAnalyzer": "libpostal"
  }
}

The query (with httpie) looks like:

http "http://localhost:3100/v1/autocomplete?boundary.rect.min_lon=12.568359375&boundary.rect.max_lon=14.1064453125&boundary.rect.min_lat=52.1402312011&boundary.rect.max_lat=52.9287745258&sources=osm&text=Nollendorfplatz"
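To reproduce this request outside httpie, here is a minimal Python sketch that assembles the same /v1/autocomplete URL from its parts, which makes the bounding-box and `sources` filters easier to see and vary. It assumes the local API listens on localhost:3100 as in the command above; the helper name `autocomplete_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the local Pelias API is reachable here, as in the httpie command above.
BASE_URL = "http://localhost:3100/v1/autocomplete"


def autocomplete_url(text, min_lon, min_lat, max_lon, max_lat, sources="osm"):
    """Build a /v1/autocomplete URL restricted to a bounding box and data sources.

    Hypothetical helper: it only formats the query string, it does not call the API.
    """
    params = {
        "boundary.rect.min_lon": min_lon,
        "boundary.rect.max_lon": max_lon,
        "boundary.rect.min_lat": min_lat,
        "boundary.rect.max_lat": max_lat,
        "sources": sources,
        "text": text,  # urlencode escapes spaces and non-ASCII characters
    }
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Same bounding box around Berlin as the httpie example.
    print(autocomplete_url("Nollendorfplatz", 12.568359375, 52.1402312011,
                           14.1064453125, 52.9287745258))
```

Sending the identical URL to both servers (for example by swapping BASE_URL) rules out differences in the request itself before looking at the indexed data.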

I've attached the two result files, one from Mapzen Search and one from our own server: pelias_results.zip. Mapzen autocomplete returns 7 records (all with source=openstreetmap); our own server returns 2. My question is: why is there a difference, and how can I dig into this?
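One way to dig into such a difference is to compare the two responses by their `gid` values, which identify each record independently of ranking. The following Python sketch assumes both responses were saved as GeoJSON FeatureCollections (the shape of the Pelias responses shown in this thread); the file paths passed on the command line are placeholders, not the actual contents of pelias_results.zip, and the function names are hypothetical.

```python
import json
import sys


def gids(response):
    """Return the set of global ids (properties.gid) in a Pelias GeoJSON response."""
    return {feature["properties"]["gid"] for feature in response.get("features", [])}


def diff_results(a, b):
    """Return (only_in_a, only_in_b) as sorted lists of gids."""
    ga, gb = gids(a), gids(b)
    return sorted(ga - gb), sorted(gb - ga)


def load(path):
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Usage (placeholder names): python diff_results.py mapzen.json local.json
if __name__ == "__main__" and len(sys.argv) == 3:
    only_a, only_b = diff_results(load(sys.argv[1]), load(sys.argv[2]))
    print("only in", sys.argv[1], only_a)
    print("only in", sys.argv[2], only_b)
```

A gid that appears only in the Mapzen response can then be checked against the source extract and the local index, to tell whether the record is absent from the data or merely absent from the results.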

oliverbienert commented 7 years ago

I have now loaded ALL whosonfirst data (without the --adminOnly switch) and reimported the OpenStreetMap and polyline data into Elasticsearch with adminLookup enabled. My results are now almost identical to those from the Mapzen online search, with one exception:

    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [
          13.352223,
          52.49935
        ]
      },
      "properties": {
        "id": "node:3826455882",
        "gid": "openstreetmap:venue:node:3826455882",
        "layer": "venue",
        "source": "openstreetmap",
        "source_id": "node:3826455882",
        "name": "Quartier Apotheke Nollendorfplatz",
        "housenumber": "3-4",
        "street": "Nollendorfplatz",
        "postalcode": "10777",
        "accuracy": "point",
        "country": "Germany",
        "country_gid": "whosonfirst:country:85633111",
        "country_a": "DEU",
        "region": "Berlin",
        "region_gid": "whosonfirst:region:85682499",
        "county": "Berlin",
        "county_gid": "whosonfirst:county:102063945",
        "locality": "Berlin",
        "locality_gid": "whosonfirst:locality:101748799",
        "borough": "Tempelhof-Schoneberg",
        "borough_gid": "whosonfirst:borough:1108815557",
        "neighbourhood": "Schoneberg",
        "neighbourhood_gid": "whosonfirst:neighbourhood:420784327",
        "label": "Quartier Apotheke Nollendorfplatz, Berlin, Germany"
      }
    }

My local server does not find this record, but Mapzen does. Why is there this difference? My OpenStreetMap data come from a Mapzen extract.

oliverbienert commented 7 years ago

It seems to boil down, more or less, to which OpenStreetMap data is used. When I use Geofabrik data instead of a Mapzen metro extract, I get a different result for the same query, which I don't fully understand. With the Mapzen metro extract, the query does not find "Quartier Apotheke Nollendorfplatz, Berlin, Germany"; with the OSM data from Geofabrik, it does.

missinglink commented 7 years ago

hi @oliverbienert, firstly, I'm sorry we didn't get back to you sooner.

I just downloaded the Geofabrik and Mapzen extracts you mentioned. I suspected a problem with our PBF extract code, but on closer inspection both seem to contain an entry for that pharmacy (node:3826455882):

# mapzen extract
$ pbf json berlin_germany.osm.pbf | grep -i 'Quartier Apotheke'

{"id":2365398263,"type":"node","lat":52.498264,"lon":13.349239,"tags":{"addr:city":"Berlin","addr:country":"DE","addr:housenumber":"20","addr:postcode":"10777","addr:street":"Motzstraße","addr:suburb":"Schöneberg","amenity":"pharmacy","dispensing":"yes","name":"Quartier Apotheke Motzstraße","opening_hours":"Mo-Fr 08:30-20:00; Sa 09:00-16:00","phone":"+49 30 21479390","website":"http://www.quartier-apotheke.de","wheelchair":"yes"}}
{"id":3826455882,"type":"node","lat":52.499348,"lon":13.352222,"tags":{"addr:city":"Berlin","addr:country":"DE","addr:housenumber":"3-4","addr:postcode":"10777","addr:street":"Nollendorfplatz","addr:suburb":"Schöneberg","amenity":"pharmacy","name":"Quartier Apotheke Nollendorfplatz","opening_hours":"Mo-Fr 08:30-20:00; Sa 09:00-17:00","wheelchair":"yes"}}
{"id":3859397154,"type":"node","lat":52.4986,"lon":13.354431,"tags":{"addr:city":"Berlin","addr:country":"DE","addr:housenumber":"3","addr:postcode":"10777","addr:street":"Maaßenstraße","addr:suburb":"Schöneberg","amenity":"pharmacy","dispensing":"yes","email":"maassenstrasse@quartier-apotheke.de","fax":"2172904","name":"Quartier Apotheke Maaßenstraße","old_name":"Apotheke am Nollendorfplatz","opening_hours":"Mo-Fr 08:30-20:00; Sa 09:00-20:00","operator":"Quartier Apotheke Nollendorfplatz e.K., Kai Uwe Wilken-Prozesky","phone":"+49 30 2163453","website":"http://www.quartier-apotheke-nollendorfplatz.de","wheelchair":"yes"}}
{"id":3472167668,"type":"node","lat":52.49395,"lon":13.353622,"tags":{"addr:city":"Berlin","addr:country":"DE","addr:housenumber":"35","addr:postcode":"10781","addr:street":"Goltzstraße","addr:suburb":"Schöneberg","amenity":"pharmacy","dispensing":"yes","email":"goltzstrasse@quartier-apotheke.de","fax":"030275757599","name":"Quartier Apotheke Goltzstraße","old_name":"Apotheke am Winterfeldplatz","opening_hours":"Mo-Fr 09:00-20:00; Sa 09:00-17:00","operator":"Quartier Apotheke Goltzstraße Kai-Uwe Wilken-Prozesky e. K.","phone":"+49 30 275757590","website":"http://www.quartier-apotheke-goltzstrasse.de","wheelchair":"yes"}}
# geofabrik extract
$ pbf json brandenburg-latest.osm.pbf | grep -i 'Quartier Apotheke'

{"id":2365398263,"type":"node","lat":52.498264,"lon":13.349239,"tags":{"addr:city":"Berlin","addr:country":"DE","addr:housenumber":"20","addr:postcode":"10777","addr:street":"Motzstraße","addr:suburb":"Schöneberg","amenity":"pharmacy","dispensing":"yes","name":"Quartier Apotheke Motzstraße","opening_hours":"Mo-Fr 08:30-20:00; Sa 09:00-16:00","phone":"+49 30 21479390","website":"http://www.quartier-apotheke.de","wheelchair":"yes"}}
{"id":3472167668,"type":"node","lat":52.49395,"lon":13.353622,"tags":{"addr:city":"Berlin","addr:country":"DE","addr:housenumber":"35","addr:postcode":"10781","addr:street":"Goltzstraße","addr:suburb":"Schöneberg","amenity":"pharmacy","dispensing":"yes","email":"goltzstrasse@quartier-apotheke.de","fax":"030275757599","name":"Quartier Apotheke Goltzstraße","old_name":"Apotheke am Winterfeldplatz","opening_hours":"Mo-Fr 09:00-20:00; Sa 09:00-17:00","operator":"Quartier Apotheke Goltzstraße Kai-Uwe Wilken-Prozesky e. K.","phone":"+49 30 275757590","website":"http://www.quartier-apotheke-goltzstrasse.de","wheelchair":"yes"}}
{"id":3826455882,"type":"node","lat":52.499348,"lon":13.352222,"tags":{"addr:city":"Berlin","addr:country":"DE","addr:housenumber":"3-4","addr:postcode":"10777","addr:street":"Nollendorfplatz","addr:suburb":"Schöneberg","amenity":"pharmacy","name":"Quartier Apotheke Nollendorfplatz","opening_hours":"Mo-Fr 08:30-20:00; Sa 09:00-17:00","wheelchair":"yes"}}
{"id":3859397154,"type":"node","lat":52.4986,"lon":13.354431,"tags":{"addr:city":"Berlin","addr:country":"DE","addr:housenumber":"3","addr:postcode":"10777","addr:street":"Maaßenstraße","addr:suburb":"Schöneberg","amenity":"pharmacy","dispensing":"yes","email":"maassenstrasse@quartier-apotheke.de","fax":"2172904","name":"Quartier Apotheke Maaßenstraße","old_name":"Apotheke am Nollendorfplatz","opening_hours":"Mo-Fr 08:30-20:00; Sa 09:00-20:00","operator":"Quartier Apotheke Nollendorfplatz e.K., Kai Uwe Wilken-Prozesky","phone":"+49 30 2163453","website":"http://www.quartier-apotheke-nollendorfplatz.de","wheelchair":"yes"}}
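Because `pbf json` prints one JSON record per line, the same check can be scripted so that the two extracts are compared by OSM type and id rather than by eyeballing grep output (the records appear in a different order in each file). This Python sketch assumes the `pbf json` output of each extract has been redirected to a file; the function names are hypothetical, and matching on the `name` tag only is narrower than the `grep -i` above, which matches anywhere in the line.

```python
import json
import sys


def matching_records(lines, needle):
    """Map (type, id) -> name for records whose name tag contains `needle`.

    Each line is expected to hold one JSON object, as printed by `pbf json`.
    Matching is case-insensitive, like `grep -i`, but only looks at the name tag.
    """
    needle = needle.lower()
    found = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and shell noise
        record = json.loads(line)
        name = record.get("tags", {}).get("name", "")
        if needle in name.lower():
            found[(record["type"], record["id"])] = name
    return found


def compare_extracts(a, b):
    """Return (only_in_a, only_in_b) as sorted lists of (type, id) keys."""
    return sorted(a.keys() - b.keys()), sorted(b.keys() - a.keys())


# Usage (placeholder names), after e.g. `pbf json berlin_germany.osm.pbf > mapzen.ndjson`:
#   python compare_extracts.py mapzen.ndjson geofabrik.ndjson
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], encoding="utf-8") as fa, open(sys.argv[2], encoding="utf-8") as fb:
        a = matching_records(fa, "Quartier Apotheke")
        b = matching_records(fb, "Quartier Apotheke")
    only_a, only_b = compare_extracts(a, b)
    print("only in first:", only_a)
    print("only in second:", only_b)
```

Keying on (type, id) rather than id alone avoids treating a node and a way that happen to share a numeric id as the same record.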

It's possible that the data was not up to date when you imported it and that, now a month later, it has been fixed... I'm really not sure.

Regarding the hosted Mapzen Search service at search.mapzen.com: we use the same tools to build that service, but we build it from the full planet.osm.pbf file (~35GB), so it is not actually running off PBF extracts.

It's hard to say what is going on here. It's possible that your import is not completing successfully and, therefore, some data is missing. Alternatively, some settings can prevent certain data from being imported (e.g. the OSM feature whitelist) or delay the point after indexing at which data becomes available in search (e.g. the Elasticsearch refresh_interval). I'm assuming you haven't made any modifications to those settings?

We have fairly extensive test cases in place to ensure that OpenStreetMap imports are repeatable and don't drop data, so I would be surprised to find an import bug in that code that dropped random rows.

I'd suggest trying the import again from the extracts I mentioned above; I can confirm that the pharmacy is contained in them. If you're still not able to find it, please add a comment and we can dig into it further.

# downloaded 16-May-07

$ shasum *.pbf
66eb01f8023af1ff5305282bf302d136a6b99715  berlin_germany.osm.pbf
a8d6ae3ab7e754a96bfc9e941eaf3820d7552234  brandenburg-latest.osm.pbf
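`shasum` uses SHA-1 by default, so the checksums above can be compared against your own downloads. The Python sketch below is a hypothetical helper that reports, per file, whether it matches. Keep in mind that both extracts are regenerated regularly, so a newer download is expected to differ; a match only tells you that you have the exact files tested above.

```python
import hashlib
import os

# Checksums from the comment above (files downloaded on the date noted there).
EXPECTED = {
    "berlin_germany.osm.pbf": "66eb01f8023af1ff5305282bf302d136a6b99715",
    "brandenburg-latest.osm.pbf": "a8d6ae3ab7e754a96bfc9e941eaf3820d7552234",
}


def sha1sum(path, chunk_size=1 << 20):
    """Hex SHA-1 of a file, read in chunks so large PBFs don't fill memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(expected, directory="."):
    """Return {filename: "ok" | "mismatch" | "missing"} for each expected file."""
    status = {}
    for name, want in expected.items():
        path = os.path.join(directory, name)
        if not os.path.exists(path):
            status[name] = "missing"
        elif sha1sum(path) == want:
            status[name] = "ok"
        else:
            status[name] = "mismatch"
    return status


if __name__ == "__main__":
    for name, result in verify(EXPECTED).items():
        print(result, name)
```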

oliverbienert commented 7 years ago

Thank you for looking into this. I am going to test this again with recent downloads. It is quite possible that we made a mistake when setting up the import process. As a first step, I took a Mapzen download from 12/2016 and can confirm that the data in question is there.