pelias / whosonfirst

Importer for Who's on First gazetteer
MIT License

GEO-46: Pelias WOF data importer improvement to include geo-shape dat… #518

Closed ainemitch closed 3 years ago

ainemitch commented 3 years ago

GEO-46: Pelias WOF data importer improvement to include geo-shape data in the Pelias ES index. Patch applied, tests updated, and a new task added to generate an additional error log.

ainemitch commented 3 years ago

Hi Julien,

Many thanks for your comments; they are very helpful.

In answer to your question about why we want the duplicate values in name.default and name.${lang}: we are attempting to build a reverse geocoding application that supports multiple languages, and we retrieve the location data with requests similar to the one below.

curl --location --request GET 'localhost:9200/pelias_world_1.0/_search' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "query": {
      "bool": {
        "must": { "match_all": {} },
        "filter": {
          "geo_shape": {
            "polygon": {
              "shape": {
                "type": "point",
                "coordinates": [5.420364, 43.291582]
              },
              "relation": "contains"
            }
          }
        }
      }
    }
  }'
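For anyone who wants to generate this request body from code rather than by hand, here is a minimal sketch in Python. The function name `build_reverse_query` is hypothetical, and the `polygon` field name is simply taken from the request above; the sketch only builds and prints the JSON body and does not contact Elasticsearch.

```python
import json


def build_reverse_query(lon, lat, shape_field="polygon"):
    """Build a geo_shape filter asking for documents whose indexed shape
    contains the given point. Hypothetical helper; the default field name
    is an assumption copied from the curl request in this thread."""
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "filter": {
                    "geo_shape": {
                        shape_field: {
                            "shape": {
                                "type": "point",
                                # GeoJSON order: longitude first, then latitude
                                "coordinates": [lon, lat],
                            },
                            "relation": "contains",
                        }
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Same coordinates as the example request (a point in Marseille)
    print(json.dumps(build_reverse_query(5.420364, 43.291582), indent=2))
```

The printed body can be passed to the same `_search` endpoint as in the curl command. Note the longitude-first coordinate order, which is easy to get backwards.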

What we found is that, with the duplicate names filtered out, in some cases the translated value returned was the variant name rather than the preferred name (in a comparison with a WhosOnFirst Spelunker search). Take the locality Marseille, for example.

Without duplicates, the French translation is 13000 (variant):

"bounding_box":"{\"min_lat\":43.1726492325,\"max_lat\":43.3910172471,\"min_lon\":5.22826669352,\"max_lon\":5.53250260623}", "name":{ "default":[ "Marseille", "MRS" ], .....

            "fr":"13000",

With the duplicates included, we would get both the preferred and variant names, which is what we want for our application:

            "fr":[
                "Marseille",
                "13000"
             ],
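To make the effect concrete, here is a small Python sketch of the behaviour described above. It is an illustration only, not the importer's actual code: the function `localized_names` and its `drop_duplicates` flag are hypothetical, and the name lists are the Marseille values quoted in this comment.

```python
def localized_names(default_names, lang_names, drop_duplicates):
    """Return the names to index under name.<lang>.

    Hypothetical illustration: when drop_duplicates is True, any localized
    name that already appears in name.default is removed.
    """
    if not drop_duplicates:
        return list(lang_names)
    already_indexed = set(default_names)
    return [name for name in lang_names if name not in already_indexed]


default_names = ["Marseille", "MRS"]   # name.default from the example above
french_names = ["Marseille", "13000"]  # preferred name, then variant

# Filtering duplicates leaves only the variant under name.fr
print(localized_names(default_names, french_names, drop_duplicates=True))
# ['13000']

# Keeping duplicates preserves both the preferred and the variant name
print(localized_names(default_names, french_names, drop_duplicates=False))
# ['Marseille', '13000']
```

Because the preferred French name is identical to the default name, filtering removes exactly the value a French-language client would want to display.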

We will take a look at Pelias-conf and the OpenAddresses importer; thanks again for the information on those. I will close this pull request and raise a more targeted one with just the shape update. Thank you again for taking the time to comment.