pelias / whosonfirst

Importer for Who's on First gazetteer

Wrong geopoint being returned for queries. #202

Closed. Hoovs closed this issue 7 years ago

Hoovs commented 7 years ago

SETUP: ES version 2.4; Pelias branches: master

ISSUE: I imported data from whosonfirst-data-venue-us-ca and built using admin-only. The import added 3.5M records to ES, about 100M in size. However, when I query against it, the geopoint is always wrong. In Mapzen the geopoint is correct. Example:

{
  "geocoding": {
    "version": "0.2",
    "attribution": "http://pelias.mapzen.com/v1/attribution",
    "query": {
      "text": "10393 Tennessee Ave, Los Angeles, CA 90064",
      "parsed_text": {
        "name": "10393 Tennessee Ave",
        "number": "10393",
        "street": "Tennessee Ave",
        "state": "CA",
        "postalcode": "90064",
        "regions": [
          "Los Angeles"
        ],
        "admin_parts": "Los Angeles, CA 90064"
      },
      "size": 10,
      "private": false,
      "querySize": 20
    },
    "engine": {
      "name": "Pelias",
      "author": "Mapzen",
      "version": "1.0"
    },
    "timestamp": 1488222570442
  },
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [
          105.062867,
          9.066148
        ]
      },
      "properties": {
        "id": "85680839",
        "gid": "whosonfirst:region:85680839",
        "layer": "region",
        "source": "whosonfirst",
        "source_id": "85680839",
        "name": "Cà Mau",
        "confidence": 0.3,
        "match_type": "fallback",
        "accuracy": "centroid",
        "country": "Vietnam",
        "country_gid": "whosonfirst:country:85632763",
        "country_a": "VNM",
        "region": "Cà Mau",
        "region_gid": "whosonfirst:region:85680839",
        "label": "Cà Mau, Vietnam"
      },
      "bbox": [
        104.720225457,
        8.56557851793,
        105.425431436,
        9.53443388102
      ]
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [
          -119.586168,
          36.531544
        ]
      }, etc...

I have looked at using OSM, and may switch to it if I cannot resolve this, but I thought it might be something easy I am missing. Thank you.

orangejulius commented 7 years ago

Hey @Hoovs, Thanks for reporting this issue. I think there are two things contributing to the query not returning what you'd like.

The first is that it doesn't look like you're using libpostal to improve the input parsing. Technically our install docs list it as optional, but it's much, much better to use it. You'll have to install it, then run rm -rf node_modules; npm install from within the pelias/api directory on your machine, and restart the API process itself.

The second is that since you're searching for an address, you'll need to import a dataset with addresses. I don't believe WOF has any addresses. If you look at your query against Mapzen Search (our hosted instances of Pelias), you'll see the result returned comes from OpenAddresses, a huge dataset of addresses. Take a look at the pelias/openaddresses importer; you can use it to import addresses for just Los Angeles, all of California, or all of the USA easily if you'd like.
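
One way to confirm this diagnosis is to look at the layer, source, and match_type of each returned feature: an address query answered only by region-level fallback results suggests no address data was imported. The following is a minimal sketch, not part of Pelias; the helper names summarize_features and has_address_result are hypothetical, and the sample is a trimmed copy of the first feature from the response above.

```python
def summarize_features(response):
    """Return (label, layer, source, match_type) for each feature
    in a Pelias-style GeoJSON FeatureCollection."""
    rows = []
    for feature in response.get("features", []):
        props = feature.get("properties", {})
        rows.append((
            props.get("label"),
            props.get("layer"),
            props.get("source"),
            props.get("match_type"),
        ))
    return rows


def has_address_result(response):
    """True if at least one feature comes from the 'address' layer."""
    return any(layer == "address"
               for _, layer, _, _ in summarize_features(response))


# Trimmed copy of the first feature in the response quoted in this issue.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point",
                         "coordinates": [105.062867, 9.066148]},
            "properties": {
                "layer": "region",
                "source": "whosonfirst",
                "name": "Cà Mau",
                "match_type": "fallback",
                "label": "Cà Mau, Vietnam",
            },
        }
    ],
}

for row in summarize_features(sample):
    print(row)

if not has_address_result(sample):
    print("No address-layer results; an address dataset may be missing.")
```

In the sample, the only feature is a whosonfirst region with match_type "fallback", which is consistent with the index containing no address records.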

Let us know if you have more questions, either here, or in our gitter chat channel if you'd like!

Hoovs commented 7 years ago

@orangejulius thank you for the fast response. I was trying to avoid using openaddresses in favor of OSM, but I followed your suggestions and it worked just fine. I will try to figure out why OSM is having an issue, but this worked. Thank you for your help with this!!!!!

orangejulius commented 7 years ago

No problem. OSM does have lots of addresses as well, so it's definitely a decent source to use. In major US cities, OA probably has better coverage, but if you know the data you want is in OSM, feel free to use it instead.
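
To compare what each source returns for the same address, the search can be restricted with the sources and layers query parameters of the /v1/search endpoint. The sketch below only builds the request URL and makes no network call; build_search_url is a hypothetical helper, and http://localhost:3100 is an assumed address for a locally running Pelias API, so adjust it to your setup.

```python
from urllib.parse import urlencode


def build_search_url(base_url, text, sources=None, layers=None):
    """Build a Pelias /v1/search URL, optionally restricted
    to the given sources and layers."""
    params = {"text": text}
    if sources:
        params["sources"] = ",".join(sources)
    if layers:
        params["layers"] = ",".join(layers)
    return f"{base_url}/v1/search?{urlencode(params)}"


# Assumed local API address; not taken from this issue.
BASE_URL = "http://localhost:3100"
QUERY = "10393 Tennessee Ave, Los Angeles, CA 90064"

# One URL per source, restricted to address results only.
osm_url = build_search_url(BASE_URL, QUERY, sources=["osm"], layers=["address"])
oa_url = build_search_url(BASE_URL, QUERY, sources=["oa"], layers=["address"])

print(osm_url)
print(oa_url)
```

Requesting both URLs and comparing the results would show whether the address is present in the imported OSM data, the OpenAddresses data, or both.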