Closed · JosephKuchar closed this issue 2 years ago
Please post your pelias.json
The admin info from OA is discarded; we assign a consistent hierarchy with GIDs using point-in-polygon lookups during import (the pip-service).
The final log lines you posted show that the PIP service failed to assign any admin info.
This could be for several reasons; I'd need to see your config to confirm.
It's also worth spot-checking that the lat and lon values from OA are in the correct order; they've had bugs with that in the past.
You may also find the compare app useful for debugging and sharing queries: https://pelias.github.io/compare/#/v1/autocomplete?text=100+Ilsley+Avenue%2C+NS%2C+Canada
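To make the PIP step concrete: the lookup the pip-service runs per record can be pictured with a small sketch. This is not Pelias's actual implementation (the real service is Node.js backed by Who's on First polygons); it's just a stdlib Python illustration of the per-record test, using a made-up bounding box standing in for an admin polygon.

```python
# Conceptual sketch of the point-in-polygon (PIP) admin lookup performed
# during import. NOT Pelias's actual implementation (that is a Node.js
# service); this just illustrates the per-record test with a plain
# ray-casting routine and a made-up polygon.

def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the polygon ring?"""
    inside = False
    j = len(ring) - 1
    for i, (xi, yi) in enumerate(ring):
        xj, yj = ring[j]
        if (yi > lat) != (yj > lat) and \
                lon < (xj - xi) * (lat - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

# Hypothetical admin polygon: a rough box around Nova Scotia, vertices as (lon, lat)
ns_ring = [(-66.5, 43.3), (-59.6, 43.3), (-59.6, 47.1), (-66.5, 47.1)]

# Coordinates for 100 Ilsley Ave, Dartmouth, NS
lat, lon = 44.699492, -63.587249

# The importer runs a lookup like this for every record and attaches the
# matching admin hierarchy (with its GIDs) to the document
print(point_in_polygon(lon, lat, ns_ring))  # True
```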
Thanks for the quick response! I've pasted the contents of the config file below (I cut out most of the lines of OA data, since there are about 150 of them). I had modified the Portland project to construct this one. One thing I notice I neglected to change is the focus point; is that relevant here?
{
"logger": {
"level": "info",
"timestamp": false
},
"esclient": {
"apiVersion": "7.5",
"hosts": [
{ "host": "elasticsearch" }
]
},
"elasticsearch": {
"settings": {
"index": {
"refresh_interval": "10s",
"number_of_replicas": "0",
"number_of_shards": "1"
}
}
},
"acceptance-tests": {
"endpoints": {
"docker": "http://api:4000/v1/"
}
},
"api": {
"services": {
"placeholder": { "url": "http://placeholder:4100" },
"pip": { "url": "http://pip:4200" },
"interpolation": { "url": "http://interpolation:4300" },
"libpostal": { "url": "http://libpostal:4400" }
},
"defaultParameters": {
"focus.point.lat": 45.52,
"focus.point.lon": -122.67
}
},
"imports": {
"adminLookup": {
"enabled": true
},
"blacklist": {
"files": [
"/data/blacklist/osm.txt"
]
},
"csv": {
"datapath": "/data/csv",
"files": [],
"download": [
"https://raw.githubusercontent.com/pelias/csv-importer/master/data/example.csv"
]
},
"geonames": {
"datapath": "/data/geonames",
"countryCode": "CA"
},
"openstreetmap": {
"download": [
{ "sourceURL": "https://download.geofabrik.de/north-america/canada-latest.osm.pbf" }
],
"leveldbpath": "/tmp",
"datapath": "/data/openstreetmap",
"import": [{
"filename": "canada-latest.osm.pbf"
}]
},
"openaddresses": {
"datapath": "/data/openaddresses",
"files": [
"nb_city_of_moncton.csv",
"on_northumberland.csv",
"on_oshawa.csv",
...
"ab_calgary.csv"]
},
"polyline": {
"datapath": "/data/polylines",
"files": [ "extract.0sv" ]
},
"whosonfirst": {
"datapath": "/data/whosonfirst",
"importPostalcodes": true,
"countryCode": "CA",
"importPlace": [
"85633041"
]
}
}
}
That all looks fine, specifically imports.adminLookup.enabled=true.
You should definitely delete the api.defaultParameters section you copied from Portland, although that's tangential.
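For reference, after deleting those defaults the api section of the config above would simply be (this is the same block pasted earlier with defaultParameters removed, nothing else changed):

```json
"api": {
  "services": {
    "placeholder": { "url": "http://placeholder:4100" },
    "pip": { "url": "http://pip:4200" },
    "interpolation": { "url": "http://interpolation:4300" },
    "libpostal": { "url": "http://libpostal:4400" }
  }
}
```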
There's something weird going on here...
So in the log you posted I would expect to see a line saying locality worker loaded... with a decent number. I'm assuming that this line was present in the original log but was truncated when you removed all the sources for brevity.
The main issue here is indicated by the {"calls":0,"hits":0,"misses":0} lines; these indicate that none of the loaded admin polygons spatially intersected with any of the OA rows.
This is very unusual. My first intuition was that the admin polygons aren't being loaded correctly, but I can see that you have region worker loaded 13 features, so at the very least we'd expect to see provinces assigned.
So yeah, like I said before, it could be that the lat/lon values are funky in the OA data. There are two ways you can confirm this. Firstly, have a look at the GeoJSON response you're getting back; the Point geometry dimension order is [lon, lat], so check this is correct.
The other way is to locate your data directory on the host (identified by the DATA_DIR env var), go into the OA directory, and post the top ten lines of one of the OA files here; again, I'm just sanity-checking that the lat/lon columns are defined the right way around.
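That sanity check is easy to script; here's a rough stdlib sketch (the sample rows and the Canadian lat/lon ranges are my own illustration of the idea, not anything Pelias ships):

```python
# Sanity-check that lat/lon values in an OA-style CSV are plausible for
# Canada: lat roughly 41..84, lon roughly -141..-52 (always negative).
# Swapped columns show up immediately, since a Canadian lon can never
# fall in the lat range. SAMPLE is a stand-in for one of your OA files.
import csv
import io

SAMPLE = """hash,number,street,unit,city,district,region,postcode,id,lat,lon
7a715522c0e3c266,34,Armitage Crescent,N,Ajax,,,,,43.8806185,-79.0361826
904f3c3a5e3dc4e4,36,Armitage Crescent,N,Ajax,,,,,43.8806576,-79.0360722
"""

def latlon_ok(lat, lon):
    return 41.0 <= lat <= 84.0 and -141.0 <= lon <= -52.0

for row in csv.DictReader(io.StringIO(SAMPLE)):
    lat, lon = float(row["lat"]), float(row["lon"])
    status = "ok" if latlon_ok(lat, lon) else "SUSPECT (swapped?)"
    print(f"{row['number']} {row['street']}: {lat}, {lon} -> {status}")
```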
One other thing I just noticed is how you're defining your OA sources. In the Portland project they look like this: "us/or/portland_metro.csv", but in yours they look like this: "nb_city_of_moncton.csv". Is that correct? Shouldn't they look more like "ca/nb/city_of_moncton.csv"?
You're right, there is a locality worker loaded line I truncated:
...
info: [wof-pip-service:master] locality worker loaded 6172 features in 7.636 seconds
info: [wof-pip-service:master] PIP Service Loading Completed!!!
info: [openaddresses] Creating read stream for: /data/openaddresses/bc_city_of_courtenay.csv
...
The paths I specified are correct: I placed all the OA data into a single directory. Does Pelias expect a certain file structure? If so I can create it, but as it stands they're all in the same folder.
Maybe the path has been specified incorrectly? I just noticed that in the Pelias configuration file I specify data/openaddresses, but the DATA_DIR is pelias-test/data/ - so is it interpreting it as data/data/openaddresses? But the same is true of every other data source, and they all seem to have been correctly imported.
The CSVs for the OpenAddresses data look fine to me; here's an excerpt below. Lat and lon are properly defined.
,hash,number,street,unit,city,district,region,postcode,id,lat,lon
0,7a715522c0e3c266,34,Armitage Crescent,N,Ajax,,,,,43.8806185,-79.0361826
1,904f3c3a5e3dc4e4,36,Armitage Crescent,N,Ajax,,,,,43.8806576,-79.0360722
2,f52fbb7802e7b967,40,Armitage Crescent,N,Ajax,,,,,43.880759,-79.0358346
Well, I tried specifying the directory as openaddresses instead of data/openaddresses, and that resulted in a directory-not-found error, so I don't think it's a path problem. I also tested putting one of the files into the standard OpenAddresses format (ca/ab/calgary.csv), but that didn't do anything either. It seems like the files are being read, but not being processed.
The CSVs for open address data look fine to me,
What's that additional column on the left with no column header?
What is the output of pelias elastic stats?
Maybe the path has been specified incorrectly?
I suspect your paths are correct due to the log line info: [openaddresses] Importing 165 files.
The output from pelias elastic stats is:
{
"took" : 2188,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"sources" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "openstreetmap",
"doc_count" : 14999792,
"layers" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "street",
"doc_count" : 9760995
},
{
"key" : "address",
"doc_count" : 4211212
},
{
"key" : "venue",
"doc_count" : 1027585
}
]
}
},
{
"key" : "whosonfirst",
"doc_count" : 836626,
"layers" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "postalcode",
"doc_count" : 809729
},
{
"key" : "locality",
"doc_count" : 23276
},
{
"key" : "neighbourhood",
"doc_count" : 2984
},
{
"key" : "county",
"doc_count" : 359
},
{
"key" : "macrohood",
"doc_count" : 119
},
{
"key" : "localadmin",
"doc_count" : 105
},
{
"key" : "borough",
"doc_count" : 40
},
{
"key" : "region",
"doc_count" : 13
},
{
"key" : "country",
"doc_count" : 1
}
]
}
},
{
"key" : "pelias",
"doc_count" : 3,
"layers" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "address",
"doc_count" : 1
},
{
"key" : "example_layer",
"doc_count" : 1
},
{
"key" : "with_custom_data",
"doc_count" : 1
}
]
}
}
]
}
}
}
As for the leading column in the CSVs, that seems to be an artefact of using geopandas and pandas to convert the GeoJSONs to CSVs and forgetting to set index=False when I wrote out the CSVs. I'll try again after removing that column.
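For reference, the preventative fix is df.to_csv(path, index=False); the sketch below shows the equivalent after-the-fact cleanup using only the standard library (sample contents mirror the excerpt above):

```python
# Strip a pandas-style index column (unnamed first header, 0,1,2,... values)
# that was written by df.to_csv(path) without index=False.
import csv
import io

RAW = """,hash,number,street,unit,city,district,region,postcode,id,lat,lon
0,7a715522c0e3c266,34,Armitage Crescent,N,Ajax,,,,,43.8806185,-79.0361826
1,904f3c3a5e3dc4e4,36,Armitage Crescent,N,Ajax,,,,,43.8806576,-79.0360722
"""

rows = list(csv.reader(io.StringIO(RAW)))
if rows and rows[0][0] == "":       # unnamed first header -> index column
    rows = [r[1:] for r in rows]    # drop it from every row

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
print(out.getvalue())
```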
Update: Removing the pandas default index column didn't change anything. I notice in the elastic output that openaddresses isn't listed at all.
It's quite frustrating trying to debug in a GitHub issue. Can you please clean up the code and open a draft PR to add a new Canada project? From there I can build your config and we can comment on the PR thread.
Thanks for the help! I've actually solved it. This issue is closed, but I wanted to comment because this might apply to someone else in the future. I turned on the debugging option, which gave me more info, and tested with just one file at a time. I saw that it was reading in the file but skipping over all the lines, giving these messages:
verbose: [openaddresses] number of invalid records skipped: 384170
info: [wof-admin-lookup] Shutting down admin lookup service
info: [wof-admin-lookup] Ensure your input file is valid before retrying
I looked into the code in the pelias/openaddresses repo and saw that all the column names it references are capitalised, which is also my recollection of OpenAddresses CSVs. However, it seems that the CSVs that were converted from GeoJSONs didn't satisfy this. I just changed the column names to be capitalised, and now it seems to have run properly:
verbose: [openaddresses] number of invalid records skipped: 0
info: [wof-admin-lookup] Shutting down admin lookup service
info: [admin-lookup:worker] region worker process exiting, stats: {"calls":0,"hits":0,"misses":0}
info: [admin-lookup:worker] localadmin worker process exiting, stats: {"calls":1,"hits":0,"misses":1}
info: [admin-lookup:worker] borough worker process exiting, stats: {"calls":384170,"hits":0,"misses":384170}
info: [admin-lookup:worker] dependency worker process exiting, stats: {"calls":0,"hits":0,"misses":0}
info: [admin-lookup:worker] locality worker process exiting, stats: {"calls":384170,"hits":384169,"misses":1}
info: [admin-lookup:worker] continent worker process exiting, stats: {"calls":0,"hits":0,"misses":0}
info: [admin-lookup:worker] macrocounty worker process exiting, stats: {"calls":0,"hits":0,"misses":0}
info: [admin-lookup:worker] country worker process exiting, stats: {"calls":0,"hits":0,"misses":0}
info: [admin-lookup:worker] empire worker process exiting, stats: {"calls":0,"hits":0,"misses":0}
info: [admin-lookup:worker] neighbourhood worker process exiting, stats: {"calls":384170,"hits":382498,"misses":1672}
info: [admin-lookup:worker] macroregion worker process exiting, stats: {"calls":0,"hits":0,"misses":0}
info: [admin-lookup:worker] county worker process exiting, stats: {"calls":1,"hits":1,"misses":0}
info: [dbclient-openaddresses] paused=false, transient=0, current_length=0, indexed=384170, batch_ok=769, batch_retries=0, failed_records=0, address=384170, persec=2467
info: [dbclient-openaddresses] paused=false, transient=0, current_length=0, indexed=384170, batch_ok=769, batch_retries=0, failed_records=0, address=384170, persec=2467
info: [openaddresses] Total time taken: 79.107s
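The column-name fix described above can be sketched in a few lines (stdlib only; the sample header is illustrative, so check the pelias/openaddresses source for the authoritative set of expected names):

```python
# Uppercase CSV header names so they match what the pelias/openaddresses
# importer expects (LON, LAT, NUMBER, STREET, ...). Demonstrated on an
# in-memory sample shaped like the GeoJSON-converted files.
import csv
import io

def uppercase_header(src, dst):
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow([name.upper() for name in next(reader)])  # header row
    writer.writerows(reader)                                  # data rows unchanged

raw = io.StringIO("hash,number,street,lat,lon\nabc,34,Armitage Crescent,43.88,-79.03\n")
fixed = io.StringIO()
uppercase_header(raw, fixed)
print(fixed.getvalue().splitlines()[0])  # HASH,NUMBER,STREET,LAT,LON
```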
Thanks for your help!
agh cool, glad you solved it ;)
Describe the bug
So I've recently completed a Canada-specific local implementation of Pelias. To do this I've collected all of the Canadian OpenAddresses data and stored it in DATA_DIR/openaddresses, and the pelias import oa step seems to have run successfully (at least, it creates data streams for all the sources and doesn't produce any errors). However, there is data that I know to be in OpenAddresses that Pelias is not finding. If, for example, I query "100 Ilsley Avenue, Dartmouth, NS", then this won't be returned by Pelias. This address actually is in the provincial-level Nova Scotia data in OpenAddresses:
299165,15c910dcb88ee5d9,100,Ilsley Ave,,Dartmouth,Halifax County,,,,44.699492,-63.587249
Interestingly, what can be found is the same street address but with no city specified.
I've tested other addresses pulled more or less at random from the OpenAddresses CSVs, and Pelias returns fallback-type or interpolation matches from OSM instead of the exact matches it theoretically has available.
This is a sample of the output from running the data import step:
I appreciate any help!
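For anyone hitting something similar: a quick way to spot-check a single address is to query your local /v1/search endpoint and inspect the first feature's properties in the response (source, and match_type if present) to see whether you got an exact, fallback, or interpolated match. The sketch below only builds the request URL; the localhost host/port is an assumption about a default local setup, and no network call is made.

```python
# Build a /v1/search query URL for a local Pelias API. The host/port are
# placeholders for a typical local install; inspect the JSON response's
# features[0].properties (source, match_type) to see what matched.
from urllib.parse import urlencode

base = "http://localhost:4000/v1/search"
url = f"{base}?{urlencode({'text': '100 Ilsley Avenue, Dartmouth, NS'})}"
print(url)
# e.g.: curl "$url" | jq '.features[0].properties | {source, match_type}'
```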