pelias / api

HTTP API for Pelias Geocoder
http://pelias.io
MIT License

Neighbourhood not being used in search #1164

Closed: timburgess closed this issue 6 years ago

timburgess commented 6 years ago

I've set up pelias / elasticsearch 2.4 / nodejs 6.14 / libpostal and have loaded WOF and openaddresses. I've dealt with a few things like elasticsearch tweaks/memory, getting the libpostal server going on port 8080, ensuring that adminLookup is configured, etc., and I now have every piece running without error. I am using the production branch wherever a repo has one.

As you can see from the pelias.json below, I've loaded Australia specifically. WOF and openaddresses pull the requisite data, and the _count I see in elasticsearch is what I would expect.

However, on a query like http://localhost:3100/v1/search?text=41+Thompson+St+Aitkenvale+QLD+Australia, where the street name is fairly common, I get a FeatureCollection of streets with that name from across the country, all with a confidence of 1. I was expecting the address in the requested neighbourhood to come back with the highest confidence. Looking at the response, libpostal is identifying the neighbourhood correctly, but that seems to be disregarded in the elasticsearch query.

The initial part of the query response is:

{
    "geocoding": {
        "version": "0.2",
        "attribution": "http://108.61.96.7:3100/v1/attribution",
        "query": {
            "text": "41 Thompson St Aitkenvale QLD Australia",
            "size": 10,
            "private": false,
            "lang": {
                "name": "English",
                "iso6391": "en",
                "iso6393": "eng",
                "defaulted": true
            },
            "querySize": 20,
            "parser": "libpostal",
            "parsed_text": {
                "number": "41",
                "street": "thompson st",
                "neighbourhood": "aitkenvale",
                "state": "qld",
                "country": "australia"
            }
        },
        "engine": {
            "name": "Pelias",
            "author": "Mapzen",
            "version": "1.0"
        },
        "timestamp": 1529315944762
    },
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [
                    151.4892,
                    -33.358235
                ]
            },
            "properties": {
                "id": "au/countrywide:e8166671f842a777",
                "gid": "openaddresses:address:au/countrywide:e8166671f842a777",
                "layer": "address",
                "source": "openaddresses",
                "source_id": "au/countrywide:e8166671f842a777",
                "name": "41 Thompson Street",
                "housenumber": "41",
                "street": "Thompson Street",
                "postalcode": "2261",
                "confidence": 1,
                "match_type": "exact",
                "accuracy": "point",
                "country": "Australia",
                "country_gid": "whosonfirst:country:85632793",
                "country_a": "AUS",
                "region": "New South Wales",
                "region_gid": "whosonfirst:region:85681545",
                "region_a": "NSW",
                "county": "Wyong (A)",
                "county_gid": "whosonfirst:county:102049097",
                "localadmin": "Long Jetty",
                "localadmin_gid": "whosonfirst:localadmin:404224583",
                "locality": "Long Jetty",
                "locality_gid": "whosonfirst:locality:101931081",
                "continent": "Oceania",
                "continent_gid": "whosonfirst:continent:102191583",
                "label": "41 Thompson Street, Long Jetty, NSW, Australia"
            }
        },
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [
                    144.904723,
                    -37.865198
                ]
            },
            "properties": {
                "id": "au/countrywide:358ffdc9a2b08106",
                "gid": "openaddresses:address:au/countrywide:358ffdc9a2b08106",
                "layer": "address",
                "source": "openaddresses",
                "source_id": "au/countrywide:358ffdc9a2b08106",
                "name": "41 Thompson Street",
                "housenumber": "41",
                "street": "Thompson Street",
                "postalcode": "3016",
                "confidence": 1,
                "match_type": "exact",
                "accuracy": "point",
                "country": "Australia",
                "country_gid": "whosonfirst:country:85632793",
                "country_a": "AUS",
                "region": "Victoria",
                "region_gid": "whosonfirst:region:85681497",
                "region_a": "VIC",
                "county": "Hobsons Bay (C)",
                "county_gid": "whosonfirst:county:102049107",
                "localadmin": "Williamstown (Vic.)",
                "localadmin_gid": "whosonfirst:localadmin:404540783",
                "locality": "Williamstown",
                "locality_gid": "whosonfirst:locality:101933735",
                "neighbourhood": "Williamstown",
                "neighbourhood_gid": "whosonfirst:neighbourhood:85771427",
                "continent": "Oceania",
                "continent_gid": "whosonfirst:continent:102191583",
                "label": "41 Thompson Street, Williamstown, VIC, Australia"
            }
        }, ...

The api startup is:

> pelias-api@0.0.0-development start /home/tim/pelias/api
> ./bin/start

2018-06-18T09:55:35.891Z - warn: [pip] pip service disabled
2018-06-18T09:55:35.893Z - warn: [placeholder] placeholder service disabled
2018-06-18T09:55:35.893Z - warn: [language] language service disabled
2018-06-18T09:55:35.893Z - warn: [interpolation] interpolation service disabled
2018-06-18T09:55:35.894Z - info: [libpostal] using libpostal service at http://localhost:8080/
2018-06-18T09:55:35.894Z - info: [libpostal] using libpostal service at http://localhost:8080/
pelias is now running on :::3100

and my pelias.json is:

{
  "esclient": {
    "apiVersion": "2.4",
    "keepAlive": true,
    "requestTimeout": "120000",
    "hosts": [{
      "env": "development",
      "protocol": "http",
      "host": "localhost",
      "port": 9200
    }],
    "log": [{
      "type": "stdio",
      "level": [ "error", "warning" ]
    }]
  },
  "elasticsearch": {
    "settings": {
      "index": {
        "number_of_replicas": "0",
        "number_of_shards": "5",
        "refresh_interval": "1m"
      }
    }
  },
  "interpolation": {
    "client": {
      "adapter": "http",
      "host": "http://localhost:9999"
    }
  },
  "api": {
    "accessLog": "common",
    "textAnalyzer": "libpostal",
    "host": "https://kachi.io/",
    "indexName": "pelias",
    "version": "1.0",
    "services": {
      "libpostal": {
        "url": "http://localhost:8080"
      }
    },
    "targets": {
      "auto_discover": false,
      "layers_by_source": {
        "openaddresses": [ "address" ],
        "whosonfirst": [
          "continent", "empire", "country", "dependency", "macroregion", "region", "locality",
          "localadmin", "macrocounty", "county", "macrohood", "borough", "neighbourhood",
          "microhood", "disputed", "venue", "postalcode", "continent", "ocean", "marinearea"
        ]
      },
      "source_aliases": {
        "osm": [ "openstreetmap" ],
        "oa":  [ "openaddresses" ],
        "gn":  [ "geonames" ],
        "wof": [ "whosonfirst" ]
      },
      "layer_aliases": {
        "coarse": [
          "continent", "empire", "country", "dependency", "macroregion", "region", "locality",
          "localadmin", "macrocounty", "county", "macrohood", "borough", "neighbourhood",
          "microhood", "disputed", "postalcode", "continent", "ocean", "marinearea"
        ]
      }
    }
  },
  "schema": {
    "indexName": "pelias"
  },
  "logger": {
    "level": "debug",
    "timestamp": true,
    "colorize": true
  },
  "imports": {
    "adminLookup": {
      "enabled": true
    },
    "openaddresses": {
      "datapath": "/home/tim/data/openaddresses",
      "files": [
        "au/countrywide.csv"
      ]
    },
    "whosonfirst": {
      "datapath": "/home/tim/data/whosonfirst",
      "importPlace": "85632793",
      "importVenues": false
    }
  }
}

missinglink commented 6 years ago

Hi @timburgess, I had a look just now at our geocode.earth servers and I get the following results:

/v1/search?text=41 Thompson St Aitkenvale QLD Australia

1)  41 Thompson Street, Aitkenvale, QLD, Australia
2)  41 Thompson St, Aitkenvale, QLD, Australia

The top hit is from the au/countrywide file from openaddresses and the second is from the au/qld/statewide file from openaddresses.

That you aren't seeing the top hit is odd, because it looks like you're importing the country-wide file.

Could you please try two things for me:

  1. Use the same query but spell out the state as Queensland instead of QLD, to see if it's a parsing error.
  2. Enable the placeholder service. I see "placeholder service disabled" in your logs; this additional parser will give you improved query matching.

The confidence scores all being 1 is a known issue. @orangejulius, do you remember why this is the case?
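For anyone following along, here is a minimal sketch of what enabling placeholder amounts to on the config side: adding a `placeholder` entry next to `libpostal` under `api.services` in pelias.json. The helper name `enable_placeholder` and the URL `http://localhost:4100` are assumptions for illustration; use whatever host and port your placeholder service actually listens on.

```python
import copy
import json


def enable_placeholder(config, url):
    """Return a copy of a pelias.json dict with a placeholder service entry.

    `url` is wherever your placeholder service is listening (an assumption
    here, not something taken from this thread).
    """
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    services = updated.setdefault("api", {}).setdefault("services", {})
    services["placeholder"] = {"url": url}
    return updated


# Trimmed-down version of the config posted above.
config = {"api": {"services": {"libpostal": {"url": "http://localhost:8080"}}}}

# Hypothetical placeholder address; adjust to your own setup.
print(json.dumps(enable_placeholder(config, "http://localhost:4100"), indent=2))
```

After restarting the api with the updated config, the `[placeholder] placeholder service disabled` warning in the startup log should no longer appear.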

missinglink commented 6 years ago

here's a copy of the results I'm seeing: https://gist.githubusercontent.com/missinglink/5969b4b407ec3a62167bd1b54ec5d3ac/raw/0660b7c0f9a90a1212673a0a3381c5c31d8fc6d5/api_issue_1164.json

Are these two results included in your resultset but lower down the list? I'm trying to figure out if it's a matching issue or a sorting issue.
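One way to tell the two cases apart is to request more results and look for the parsed neighbourhood in each feature's admin fields. The sketch below is only an illustration: the helper names (`build_search_url`, `fetch_search`, `find_ranks`, `diagnose`) are made up for this example, and comparing against `neighbourhood`, `locality` and `localadmin` is an assumption about where the name would show up.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Admin fields checked for the parsed neighbourhood name (an assumption).
ADMIN_FIELDS = ("neighbourhood", "locality", "localadmin")


def build_search_url(base_url, text, size=10):
    """Build a /v1/search URL; `size` controls how many features come back."""
    return f"{base_url}/v1/search?{urlencode({'text': text, 'size': size})}"


def fetch_search(base_url, text, size=10):
    """Fetch and decode a search response from a running Pelias api."""
    with urlopen(build_search_url(base_url, text, size)) as resp:
        return json.load(resp)


def find_ranks(response, wanted):
    """Return 1-based positions of features whose admin fields equal `wanted`."""
    wanted = wanted.lower()
    ranks = []
    for position, feature in enumerate(response.get("features", []), start=1):
        props = feature.get("properties", {})
        if any(str(props.get(f, "")).lower() == wanted for f in ADMIN_FIELDS):
            ranks.append(position)
    return ranks


def diagnose(response):
    """Classify a response as a matching issue, a sorting issue, or fine."""
    parsed = response["geocoding"]["query"].get("parsed_text", {})
    wanted = parsed.get("neighbourhood")
    if not wanted:
        return "no neighbourhood was parsed from the query"
    ranks = find_ranks(response, wanted)
    if not ranks:
        return "matching issue: no feature mentions the parsed neighbourhood"
    if ranks[0] == 1:
        return "ok: the top feature is in the parsed neighbourhood"
    return f"sorting issue: first match is at position {ranks[0]}"
```

Calling `diagnose(fetch_search("http://localhost:3100", "41 Thompson St Aitkenvale QLD Australia", size=40))` against a local api would then report which case applies for the query in this issue.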

timburgess commented 6 years ago

@missinglink Thanks! With the existing setup and query as above, I get 10 features back in the FeatureCollection object, but none of those are the two in your gist - which appear to me to be valid results.

  1. Trying http://108.61.96.7:3100/v1/search?text=41+Thompson+St+Aitkenvale+Queensland+Australia, I don't get them in the resultset.
  2. I haven't tried using placeholder yet. I'll set that up and see if I can get your result.

timburgess commented 6 years ago

I've got placeholder running as a service now, and the api is using it. I get the expected single record from my query: id au/countrywide:d9ee08a0d2958c89 :+1: