pelias / wof-admin-lookup

Who's on First Admin Lookup for the Pelias Geocoder
https://pelias.io
MIT License

feat(localNames): prefer local names in lookup #318

Status: Open. JeremyBYU opened this pull request 11 months ago.


:wave: I did some awesome work for the Pelias project and would love for everyone to have a look at it and provide feedback.


Here's the reason for this change :rocket:

This change allows the PIP-Service to return administrative names in the local language. It appears this feature was started and pushed to the master branch in this commit in 2017. That commit added a boolean parameter, localizedAdminNames; when it is set to true, the lookup extracts the local name from the WOF (Who's on First) dataset. However, I noticed some issues:


Here's what actually got changed :clap:


Here's how others can test the changes :eyes:

I tested this by updating PIP-Service to point to my fork, manually rebuilding the Docker image, and using the following pelias.json file:

{
  "logger": {
    "level": "info",
    "timestamp": false
  },
  "imports": {
    "adminLookup": {
      "enabled": true,
      "localizedAdminNames": true
    },
    "whosonfirst": {
      "datapath": "/data/whosonfirst",
      "countryCode": ["FR"],
      "importPlace": ["136253037", "85633147"]
    }
  }
}

and the following docker-compose.yml:

version: '3'
networks:
  default:
    driver: bridge
services:
  whosonfirst: # data download and preparation
    image: pelias/whosonfirst:master
    container_name: pelias_whosonfirst
    user: "${DOCKER_USER}"
    volumes:
      - "./pelias.json:/code/pelias.json"
      - "${DATA_DIR}:/data"
  pip: # run-time container
    image: pelias/pip-service:master
    container_name: pelias_pip-service
    user: "${DOCKER_USER}"
    restart: always
    environment: [ "PORT=4200" ]
    ports: [ "127.0.0.1:4200:4200" ]
    volumes:
      - "./pelias.json:/code/pelias.json"
      - "${DATA_DIR}:/data"
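With localizedAdminNames set to true in pelias.json, the lookup is meant to prefer a record's local-language name over its default wof:name. The Python sketch below illustrates one way that selection could work. It assumes Who's on First's property naming conventions (wof:lang_x_official and wof:lang_x_spoken as lists of ISO 639-3 codes, and name:<lang>_x_preferred as lists of names). The function pick_admin_name, the official-before-spoken fallback order, and the sample record are hypothetical; this is not the wof-admin-lookup code.

```python
from typing import Any, Mapping

# Hypothetical helper, not the wof-admin-lookup implementation.
# Assumes WOF-style properties: "wof:name", "wof:lang_x_official",
# "wof:lang_x_spoken" and "name:<lang>_x_preferred".
def pick_admin_name(props: Mapping[str, Any], localized: bool) -> str:
    """Return the name to report for one WOF admin record."""
    default = props.get("wof:name", "")
    if not localized:
        # Flag off: keep the default WOF name.
        return default
    # Assumed order: official languages first, then spoken ones.
    for lang_key in ("wof:lang_x_official", "wof:lang_x_spoken"):
        for lang in props.get(lang_key) or []:
            preferred = props.get(f"name:{lang}_x_preferred") or []
            if preferred:
                return preferred[0]
    # No localized name found: fall back to the default name.
    return default


# Illustrative record; the values are not copied from the real WOF data.
example = {
    "wof:name": "Ile-of-France",
    "wof:lang_x_official": ["fra"],
    "name:fra_x_preferred": ["Île-de-France"],
}

print(pick_admin_name(example, localized=True))   # Île-de-France
print(pick_admin_name(example, localized=False))  # Ile-of-France
```

Falling back to wof:name when no localized name exists keeps the lookup from returning an empty name for records that lack language metadata.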

When you start the pip container, it takes some time (~3-5 min) to build the in-memory spatial index. Afterward, you can query the service, passing longitude first and then latitude:

curl http://localhost:4200/2.320957/48.871326

You should see a response like this:

...
  "macroregion": [
    {
      "id": 404227465,
      "name": "Île-De-France",
      "abbr": "IF",
      "centroid": { "lat": 48.709278, "lon": 2.503396 },
      "bounding_box": "1.446743,48.120537,3.558455,49.241062"
    }
  ]
...

Without this feature, the name would be "Ile-of-France".
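The same query can be scripted. This Python sketch uses only the standard library. It assumes the service listens on localhost:4200 as mapped in the docker-compose.yml above, and that the response is a JSON object keyed by placetype, as in the macroregion excerpt. The helper names pip_url and macroregion_names are hypothetical.

```python
import json
from typing import List
from urllib.request import urlopen

# Port taken from the docker-compose.yml mapping (127.0.0.1:4200:4200).
PIP_BASE_URL = "http://localhost:4200"


def pip_url(lon: float, lat: float, base: str = PIP_BASE_URL) -> str:
    """Build a PIP lookup URL; longitude comes before latitude in the path."""
    return f"{base}/{lon}/{lat}"


def macroregion_names(lon: float, lat: float, base: str = PIP_BASE_URL,
                      timeout: float = 10.0) -> List[str]:
    """Query the PIP service and return the macroregion names it reports."""
    with urlopen(pip_url(lon, lat, base), timeout=timeout) as resp:
        body = json.load(resp)
    # Assumes placetypes are top-level keys holding lists of matches.
    return [entry.get("name") for entry in body.get("macroregion", [])]


print(pip_url(2.320957, 48.871326))  # http://localhost:4200/2.320957/48.871326
```

With the pip container running and the index built, macroregion_names(2.320957, 48.871326) should return the macroregion name(s) for that point, which with this change are expected to be the French names.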

We also ran this test on about 50K different GPS points in France and got a 90% exact match against our existing French-named locality lookup, which uses a Nominatim instance. A sample of the remaining 10% appears to be minor neighborhood differences (still in French).
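The exact-match figure is a ratio of agreeing names to points checked; for about 50,000 points, 90% corresponds to roughly 45,000 matches. A minimal sketch of that comparison follows, assuming both lookups return one name per point in the same order. exact_match_rate and mismatches are hypothetical helpers, not the script used for these tests, and the sample names are made up.

```python
from typing import List, Sequence, Tuple


def exact_match_rate(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of points where both lookups return the identical string."""
    if len(candidate) != len(reference):
        raise ValueError("both lookups must cover the same points")
    if not reference:
        return 0.0
    matches = sum(1 for a, b in zip(candidate, reference) if a == b)
    return matches / len(reference)


def mismatches(candidate: Sequence[str],
               reference: Sequence[str]) -> List[Tuple[int, str, str]]:
    """List (index, candidate, reference) for differing points, for sampling."""
    return [(i, a, b)
            for i, (a, b) in enumerate(zip(candidate, reference))
            if a != b]


# Made-up names for illustration only.
pip_names = ["Paris", "Montmartre", "Lyon", "Nice"]
nominatim_names = ["Paris", "Paris 18e Arrondissement", "Lyon", "Nice"]
print(exact_match_rate(pip_names, nominatim_names))  # 0.75
print(mismatches(pip_names, nominatim_names))
```

Because the comparison is exact, differences in accents or casing count as mismatches, so a localization regression would show up directly in the rate.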


Possible Concerns

We did not test worldwide, so there could be edge cases where this breaks. I am also unsure why the feature was not enabled in the first place; maybe I missed something?