Open gagandeepsingh1105 opened 5 months ago
Hi @gagandeepsingh1105, the 'administrative hierarchy' (ie. the city/province/country) of each record in Pelias is sourced exclusively from the WhosOnFirst dataset through point-in-polygon lookups at index time.
I believe this is a duplicate of https://github.com/pelias/csv-importer/issues/74
I'm not against adding this option to custom builds; the issue is that currently all administrative regions are composed of a source, id and term (with an optional abbreviation).
We could use 'custom' as the source, but each admin region would need to have a unique id in order to correctly generate the _gid field.
An autoincrement value could work here but would have the disadvantage that two places in the same area would have differing parent IDs.
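One way around the autoincrement drawback would be to derive the id from the admin region's name rather than from insertion order. The sketch below is illustrative only: custom_admin_id and make_gid are hypothetical helpers, not part of Pelias, and it assumes the usual source:layer:id shape of a gid.

```python
import hashlib


def custom_admin_id(layer: str, name: str) -> str:
    """Hypothetical helper: derive a stable id from the layer and normalised name."""
    key = f"{layer}:{name.strip().lower()}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]


def make_gid(source: str, layer: str, id_: str) -> str:
    """Assumed gid layout: source:layer:id."""
    return f"{source}:{layer}:{id_}"


# Two CSV rows naming the same city yield the same parent gid,
# which an autoincrement counter would not guarantee.
gid_a = make_gid("custom", "locality", custom_admin_id("locality", "St. John's"))
gid_b = make_gid("custom", "locality", custom_admin_id("locality", " st. john's "))
```

A real implementation would probably need to fold the parent region into the hashed key, so that same-named localities in different provinces do not collide.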
It's possible to have multiple associated 'parents' for a single layer, so for example a record can have multiple 'region' records associated.
The issue would be that we only return one (ie. the first one), so it would either need to be decided (or configurable) whether the record from the CSV file was returned, or the WOF one, in the case where both data sources returned a match.
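To make the "decided or configurable" choice concrete, here is a minimal sketch of a selection rule. pick_parent and the preferred_source setting are hypothetical, not existing Pelias options; the WOF region id is taken from this thread and the custom id is invented for illustration.

```python
from typing import Dict, List, Optional


def pick_parent(candidates: List[Dict], preferred_source: str = "whosonfirst") -> Optional[Dict]:
    """Return the first candidate from the preferred source, else the first candidate."""
    for candidate in candidates:
        if candidate["source"] == preferred_source:
            return candidate
    return candidates[0] if candidates else None


# Both data sources returned a 'region' parent for the same record.
regions = [
    {"source": "whosonfirst", "id": "85682123", "name": "Newfoundland and Labrador"},
    {"source": "custom", "id": "nl", "name": "NL"},  # hypothetical custom id
]

from_csv = pick_parent(regions, preferred_source="custom")
from_wof = pick_parent(regions)
```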
Hello,
I am a developer on the original poster's team. I think this is an issue of how WOF is passed back as the first record returned, or how readily it is searched for a 'fallback' match, if a locality name is present despite a focus on a more granular location.
I performed the same two searches in the original post excluding the "sources=custom" filter from the API call and encountered the same behaviour. A search for "283 Prince Philip dr NL" (https://geocoder.alpha.phac.gc.ca/api/search?text="283%20prince%20philip%20dr%20NL") resulted in a match from the custom source with confidence 1.0.
However, a search for "283 Prince Philip dr St. John's NL" results in a match from WOF, and seemingly ignores a filter on the address layer type: https://geocoder.alpha.phac.gc.ca/api/search?text=%22283%20prince%20philip%20dr%20st%20john%27s%20nl%22 OR https://geocoder.alpha.phac.gc.ca/api/search?text=%22283%20prince%20philip%20dr%20st%20john%27s%20nl%22&layers=address
We'd like to use the custom data source in performing batch forward geocoding, and it is useful to pass an 'address, city, province' search term where the inclusion of the city helps refine the search. As identified in the original issue, this does not appear to be what is happening due to the inclusion of the city name.
We understand that WOF is the exclusive source for administrative hierarchy in Pelias, but the inclusion of the place name shouldn't cue the fallback behaviour when an accurate match to the desired layer granularity (street address) is available. In this scenario a street address supplemented by a city name should refine the area for a search, but it seems that it prompts a fallback match instead. It also seems to ignore a layer search filter in the API call when the city name is included, triggering the returned fallback result from WOF.
Thank you for your help!
The debug query param displays a bunch more info:
https://geocoder.alpha.phac.gc.ca/api/search?text=%22283%20prince%20philip%20dr%20st%20john%27s%20nl%22&layers=address&debug=1
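For repeated debugging it can be handy to assemble this request programmatically. The following is a small sketch using only the Python standard library; build_search_url is a hypothetical helper name, and the base URL is the instance quoted in this thread.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://geocoder.alpha.phac.gc.ca/api/search"


def build_search_url(text: str, layers: str = "", debug: bool = False) -> str:
    """Build a search URL with optional layers filter and debug flag."""
    params = {"text": text}
    if layers:
        params["layers"] = layers
    if debug:
        params["debug"] = "1"
    # quote (rather than the default quote_plus) encodes spaces as %20.
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"


url = build_search_url('"283 prince philip dr st john\'s nl"', layers="address", debug=True)
```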
You can see that the Placeholder service ran, it found a matching locality:
{
  "controller:placeholder": [
    {
      "id": 890456615,
      "name": "St. John's",
      "placetype": "locality",
      "population": 99182,
      "lineage": [
        {
          "country": {
            "id": 85633041,
            "name": "Canada",
            "abbr": "CAN",
            "languageDefaulted": false
          },
          "county": {
            "id": 1158869009,
            "name": "Division No. 1",
            "languageDefaulted": false
          },
          "locality": {
            "id": 890456615,
            "name": "St. John's",
            "languageDefaulted": false
          },
          "region": {
            "id": 85682123,
            "name": "Newfoundland and Labrador",
            "abbr": "NL",
            "languageDefaulted": false
          }
        }
      ],
      "geom": {
        "bbox": "-52.72931,47.54494,-52.68931,47.58494",
        "lat": 47.56494,
        "lon": -52.70931
      },
      "languageDefaulted": false
    }
  ]
}
Then when the Elasticsearch query is run, the ID of the locality matched above is added as a Filter condition (ie. mandatory condition):
{
  "filter": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        {
          "terms": {
            "parent.locality_id": [
              "890456615"
            ]
          }
        }
      ],
      "must": [
        {
          "terms": {
            "layer": [
              "address"
            ]
          }
        }
      ]
    }
  }
}
Of course this results in 0 hits:
{
  "controller:search": {
    "queryType": {
      "address_search_using_ids": {
        "es_took": 36,
        "response_time": 42,
        "retries": 0,
        "es_hits": 0,
        "es_result_count": 0
      }
    }
  }
}
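To illustrate why that filter yields zero hits, here is a rough in-memory imitation of its logic. It is a sketch, not how Elasticsearch evaluates the query, and the two documents are hypothetical stand-ins rather than real index contents.

```python
from typing import Dict, List

LOCALITY_ID = "890456615"  # St. John's, as matched by Placeholder


def matches_filter(doc: Dict) -> bool:
    """Mimic the bool filter: 'must' layer=address AND at least one 'should' clause."""
    must_ok = doc.get("layer") == "address"
    should_ok = LOCALITY_ID in doc.get("parent", {}).get("locality_id", [])
    return must_ok and should_ok


# Hypothetical documents for illustration only.
docs: List[Dict] = [
    # A custom address whose point-in-polygon lookup found no locality parent.
    {"layer": "address", "source": "custom", "parent": {"region_id": ["85682123"]}},
    # The WOF locality record itself: right id, wrong layer.
    {"layer": "locality", "source": "whosonfirst", "parent": {"locality_id": [LOCALITY_ID]}},
]

hits = [d for d in docs if matches_filter(d)]
```

Neither document satisfies both conditions, so the address is filtered out even though its text matches.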
At this point there are zero matches. I forget the exact workflow here, but I believe it falls back to a legacy search method which was more lenient.
I don't like that the request specifies only address layers but returns other layers; this is likely a bug, but one which doesn't often occur outside of custom installations such as this.
The geometry of 890456615 St. John's is of type Point, which explains why the address wasn't associated via the PIP service (the address must lie inside the boundary).
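The reason a Point geometry can never become a parent is easiest to see in a toy point-in-polygon test. This is a simplified sketch (ray casting over the outer ring only, holes ignored), not the Pelias PIP service. The rectangle is just the bbox from the debug output used as a stand-in boundary, and the address coordinate is hypothetical.

```python
from typing import Dict, List, Tuple

Ring = List[Tuple[float, float]]


def ring_contains(ring: Ring, lon: float, lat: float) -> bool:
    """Ray casting: toggle 'inside' for each edge crossed by a ray heading east."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def geometry_contains(geometry: Dict, lon: float, lat: float) -> bool:
    """A Point has no interior, so it can never contain another coordinate."""
    if geometry["type"] == "Polygon":
        return ring_contains(geometry["coordinates"][0], lon, lat)
    return False


# Stand-in polygon built from the bbox in the debug output (not the real boundary).
bbox_polygon = {
    "type": "Polygon",
    "coordinates": [[
        (-52.72931, 47.54494), (-52.68931, 47.54494),
        (-52.68931, 47.58494), (-52.72931, 47.58494),
        (-52.72931, 47.54494),
    ]],
}
wof_point = {"type": "Point", "coordinates": (-52.70931, 47.56494)}

address_lon, address_lat = -52.71, 47.57  # hypothetical address coordinate

in_polygon = geometry_contains(bbox_polygon, address_lon, address_lat)
in_point = geometry_contains(wof_point, address_lon, address_lat)
```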
Maybe for your usecase you can disable the Placeholder service, or possibly don't add any data to it?
I haven't tested it, but it might prevent the filter condition being added to the elasticsearch query, which sounds like what you want.
@the-epeecurean are there better open geo data for that region?
The only one I can find is points only; does the CA govt publish something better than this? https://opendata.gov.nl.ca/public/opendata/page/?page-id=datasetdetails&id=265
@missinglink There are ... Statistics Canada publishes a hierarchy of delineated boundaries. I've just been evaluating some cherry-picked WOF 'fallback' results we've been seeing in testing.
Here's a link to an open REST point for the collected Cartographic Boundary files published by Statistics Canada: https://geo.statcan.gc.ca/geo_wa/rest/services/2021/Cartographic_boundary_files/MapServer
And a reference to descriptions of the Cartographic Boundary files made available (at the bottom under "1. Spatial information products"): https://www150.statcan.gc.ca/n1/pub/92-196-x/92-196-x2021001-eng.htm
A polygon for the example cited in the Issue above (St. John's NL) appears at the CSD (census subdivision) and CMA (census metropolitan area) levels. However, some smaller localities (within a larger CMA, e.g., Halifax, NS) show up as polygons in the DPL (designated place) boundary file.
If there is any way that we could help in facilitating this spatial information being included in WOF, please let us know. It would help our usecase greatly to see a broader capture of localities in Canada represented as polygons.
Adding an issue upstream in Who's On First to help facilitate this work:
tl;dr the new 2021 cartographic boundary files from Stats Canada look great and we'd love to import them!
Hi there,
I am an engineer at the Public Health Agency of Canada. We currently have a use case for which we are looking to deploy an instance of the Pelias Geocoder. For this use case, we have some custom input data (a CSV file) of Canadian locations only, and we want to use Pelias' forward geocoding to convert text addresses to longitudes and latitudes. For this reason we are trying to deploy csv-importer. Below is a snapshot of the input data that we have ingested into our Elasticsearch instance:
When using forward geocoding, if we supply the street number, street name and province, the API returns a response with confidence level = 1 and source = custom:
Api request: https://geocoder.alpha.phac.gc.ca/v1/search?text="283 prince philip dr nl"&sources=custom
But if we also include the city name in the input text, then the confidence level drops to 0.6 and the match type changes to fallback. As you may have already noted, we do have a column named 'city' in our input data, but somehow csv-importer is not able to read it and falls back to the whosonfirst data source.
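When comparing the two requests, the relevant fields can be pulled out of each GeoJSON feature. The following is a sketch only: summarise and is_fallback are hypothetical helper names, and the sample response is hand-written to mirror the values reported here rather than copied from the server.

```python
from typing import Dict, List


def summarise(response: Dict) -> List[Dict]:
    """Extract source, layer, confidence and match_type from each feature."""
    rows = []
    for feature in response.get("features", []):
        props = feature.get("properties", {})
        rows.append({
            "source": props.get("source"),
            "layer": props.get("layer"),
            "confidence": props.get("confidence"),
            "match_type": props.get("match_type"),
        })
    return rows


def is_fallback(row: Dict) -> bool:
    """True when the geocoder reports a fallback match for this result."""
    return row["match_type"] == "fallback"


# Hand-written sample mirroring the behaviour described above (not real output).
sample = {
    "features": [
        {"properties": {"source": "whosonfirst", "layer": "locality",
                        "confidence": 0.6, "match_type": "fallback"}},
    ]
}

rows = summarise(sample)
fallbacks = [r for r in rows if is_fallback(r)]
```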
We have tried a couple of things at our end to resolve this issue:
1) In the pelias.json configuration file, we added a "docs" key to map the columns in the CSV file to those in the Pelias schema, but got the following error:
Snapshot of pelias.json file:
"csv": {
  "datapath": "/data/csv-importer-files",
  "files": ["NLFD_test_changed.csv"],
  "docs": [
    { "name": "LAT", "type": "number", "required": true },
    { "name": "LON", "type": "number", "required": true },
    { "name": "SOURCE", "type": "number", "required": true },
    { "name": "LAYER", "type": "number", "required": true },
    { "name": "NUMBER", "type": "string", "required": false, "es_field": "address.number" },
    { "name": "STREET", "type": "string", "required": false, "es_field": "address.street" },
    { "name": "CITY", "type": "string", "required": false, "es_field": "address.city" },
    { "name": "NAME", "type": "string", "required": false, "es_field": "address.name" },
    { "name": "MAIL_PROV_ABVN", "type": "string", "required": false, "es_field": "address.region" },
    { "name": "POSTALCODE", "type": "string", "required": false, "es_field": "address.postalcode" }
  ],
  "download": []
}
2) We also tried to provide the column mapping in a separate file, but that didn't work either and produced the same error.
Snapshot of pelias.json file:
{
  "imports": {
    "csv": {
      "datapath": "/data",
      "files": ["canada-locations.csv"],
      "mappings": "/code/csv_mapping.json"
    }
  }
}
and then defined the column mappings in a separate file:
{
  "mappings": {
    "id": "id",
    "latitude": "latitude",
    "longitude": "longitude",
    "number": "house_number",
    "street": "street",
    "city": "city",
    "region": "region",
    "province": "province",
    "country": "country",
    "postalcode": "postalcode",
    "category": "category",
    "name": "name",
    "layer": "address"
  }
}
Steps to Reproduce
1) Deploy an instance of Pelias Geocoder with csv-importer running
2) Make the above mentioned configuration changes in pelias.json file.
3) Try the following Api calls:
https://geocoder.alpha.phac.gc.ca/v1/search?text="283 prince philip dr nl"&sources=custom
https://geocoder.alpha.phac.gc.ca/v1/search?text="283 prince philip dr st john's nl"&sources=custom
Expected behavior
Including city name in the search text should also give confidence=1 and source=custom
Environment
We are currently running an instance of Pelias Geocoder on a kubernetes cluster on Google Cloud Platform
Please do let us know in case you require any additional information to debug this issue. Thanks in advance.