pelias / docker

Run the Pelias geocoder in docker containers, including example projects.
MIT License

FATAL ERROR: 74 regression(s) detected. #192

Status: closed (getorca closed this issue 4 years ago)

getorca commented 4 years ago

Describe the bug: After installing a fresh North America build, I get the error FATAL ERROR: 74 regression(s) detected.

A number of the errors seem to be related to Canada, at least the ones that don't just look like name changes:

  ✘ regression [2] "/v1/search?sources=wof&text=Canada": score 1 out of 4
  diff:
    layer
      expected: country
      actual:   locality
    country
      expected: Canada
      actual:   Mexico
    country_a
      expected: CAN
      actual:   MEX
  ✘ regression [4] "/v1/search?text=22 Lloyd George Ave, Toronto Ontario CA": score 0 out of 5
  diff:
    layer
      expected: address
      actual:   county
    country_a
      expected: CAN
      actual:   USA
    locality
      expected: Toronto
      actual:   
    street
      expected: Lloyd George Ave
      actual:   
    housenumber
      expected: 22
      actual:   
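The score appears to count how many expected fields match the returned result: regression [2] shows three differing fields and scores 1 out of 4, and regression [4] shows five differing fields and scores 0 out of 5. A minimal sketch of that comparison follows, assuming exact string matching. It is not the test runner's actual code, score_result is a hypothetical name, and because the fourth field of regression [2] is not shown in the output, the example assumes it is name.

```python
from typing import Dict, List, Tuple


def score_result(expected: Dict[str, str], actual: Dict[str, str]) -> Tuple[int, int, List[str]]:
    """Count expected fields whose actual value matches exactly.

    Returns (score, total, names of the fields that differ). A field missing
    from `actual` counts as a mismatch, like the blank 'actual:' lines above.
    """
    diffs = [field for field, want in expected.items() if actual.get(field, "") != want]
    return len(expected) - len(diffs), len(expected), diffs


if __name__ == "__main__":
    # Hypothetical reconstruction of regression [2]; "name" is an assumed fourth field.
    expected = {"layer": "country", "country": "Canada", "country_a": "CAN", "name": "Canada"}
    actual = {"layer": "locality", "country": "Mexico", "country_a": "MEX", "name": "Canada"}
    score, total, diffs = score_result(expected, actual)
    print(f"score {score} out of {total}; diff: {', '.join(diffs)}")
```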

libpostal seems to be parsing things correctly:

            "parsed_text": {
                "number": "22",
                "street": "lloyd george ave",
                "city": "toronto",
                "state": "ontario",
                "country": "CAN"
            }

But the feature collection returned from whosonfirst is wrong:

 "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [
                    -97.471799,
                    31.05239
                ]
            },
            "properties": {
                "id": "102087307",
                "gid": "whosonfirst:county:102087307",
                "layer": "county",
                "source": "whosonfirst",
                "source_id": "102087307",
                "name": "Bell County",
                "confidence": 0.4,
                "match_type": "fallback",
                "accuracy": "centroid",
                "country": "United States",
                "country_gid": "whosonfirst:country:85633793",
                "country_a": "USA",
                "region": "Texas",
                "region_gid": "whosonfirst:region:85688753",
                "region_a": "TX",
                "county": "Bell County",
                "county_gid": "whosonfirst:county:102087307",
                "county_a": "BL",
                "label": "Bell County, TX, USA"
            }

This seems to be the case for all of Canada: Canada is never returned as the country, or the returned feature collection belongs to a different country.
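To spot this quickly across many queries, one could check the country_a of the first (best-ranked) feature in each response. A minimal sketch, assuming the response has already been fetched and parsed as JSON; top_country_a is a hypothetical helper, not part of Pelias.

```python
import json
from typing import Any, Dict, Optional


def top_country_a(response: Dict[str, Any]) -> Optional[str]:
    """Return country_a of the first feature, or None if there are no features."""
    features = response.get("features") or []
    if not features:
        return None
    return features[0].get("properties", {}).get("country_a")


if __name__ == "__main__":
    # Trimmed copy of the response above, keeping only the fields this check reads.
    body = """{"type": "FeatureCollection", "features": [
        {"type": "Feature",
         "properties": {"name": "Bell County", "layer": "county", "country_a": "USA"}}]}"""
    got = top_country_a(json.loads(body))
    print("OK" if got == "CAN" else f"expected CAN, got {got}")
```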

Steps to Reproduce: Install a fresh Docker build of North America.

Expected behavior: Canada is returned as the country.


getorca commented 4 years ago

I can confirm the same bug occurs on the planet build.

missinglink commented 4 years ago

Can you please post the complete output of the test suite as a gist or pastebin, including any errors?

missinglink commented 4 years ago

There was a major change to the WOF data downloads: https://dist.whosonfirst.org/

The old ones were very old (>1yr), so this error could be due to:

Are there any other errors in the Docker logs (pelias compose logs) that may be relevant?

Does this look to be isolated to Canada to you?

getorca commented 4 years ago

Here is the pastebin of the test suite output for a planet build:

https://pastebin.com/G60472XC

getorca commented 4 years ago

> There was a major change to the WOF data downloads: https://dist.whosonfirst.org/

I'm using the latest version (from several days ago), which downloads from geocode.earth.

> Are there any other errors in the Docker logs (pelias compose logs) that may be relevant?

Possibly. I'm not sure how to debug or resolve the following:

Example 1 (query: /v1/search?sources=wof&text=Canada) logs:

placeholder_1    | took: 1.308ms
placeholder_1    | parent not found! locality_id -1
placeholder_1    | parent not found! continent_id -1
placeholder_1    | parent not found! country_id -1
placeholder_1    | parent not found! region_id -1
placeholder_1    | info: [placeholder] ::ffff:172.18.0.11 - GET /parser/search?text=Canada&lang=eng HTTP/1.1 200 51606 - 51.811 ms
placeholder_1    | info: [placeholder] ::ffff:172.18.0.11 - GET /parser/findbyid?ids=102191575%2C85632713%2C890455299%2C1360274397%2C85670669%2C85632323%2C421203405%2C1360154343%2C85671905%2C1360103365%2C85633293%2C102074151%2C1343941459%2C85686527%2C102078629%2C1343640541%2C85686665%2C102191581%2C85633735%2C1511678941%2C1343550883%2C85687349%2C1511678869%2C1327024295%2C85633129%2C404329281%2C1326861795%2C404227381%2C85682625%2C85633793%2C102083105%2C404522459%2C1326841517%2C85688543%2C102191577%2C85633009%2C102053019%2C1326716699%2C1511777415%2C85681931&lang=eng HTTP/1.1 200 49855 - 6.436 ms
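One way to check whether Canada was even a candidate is to extract the IDs placeholder looked up in the findbyid request and search them for 85633041, the WOF ID of the row later found missing from the Canada admin database. In the full Example 1 line, 85633041 is absent while 85633793 (the United States, per the country_gid in the response above) is present. A minimal sketch; findbyid_ids is a hypothetical helper, and the regex assumes the access-log format shown here.

```python
import re
from typing import List
from urllib.parse import parse_qs, urlparse

# Matches the request path in lines like "... - GET /parser/findbyid?ids=...&lang=eng HTTP/1.1 200 ...".
REQUEST_RE = re.compile(r"GET (\S+) HTTP/")


def findbyid_ids(log_line: str) -> List[int]:
    """Return the WOF IDs requested by a /parser/findbyid log line, else []."""
    match = REQUEST_RE.search(log_line)
    if not match:
        return []
    url = urlparse(match.group(1))
    if url.path != "/parser/findbyid":
        return []
    # parse_qs percent-decodes %2C back to "," before we split.
    raw = parse_qs(url.query).get("ids", [""])[0]
    return [int(part) for part in raw.split(",") if part]


if __name__ == "__main__":
    line = ("info: [placeholder] ::ffff:172.18.0.11 - GET /parser/findbyid?"
            "ids=102191575%2C85632713%2C85633793&lang=eng HTTP/1.1 200 49855 - 6.436 ms")
    ids = findbyid_ids(line)
    print(ids, "85633041 present:", 85633041 in ids)
```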

Example 2 (query: /v1/search?text=22 Lloyd George Ave, Toronto Ontario CA) logs:

placeholder_1    | took: 23.761ms
placeholder_1    | parent not found! country_id 85633111
placeholder_1    | parent not found! region_id 85679209
placeholder_1    | parent not found! locality_id -1
placeholder_1    | parent not found! region_id 85672037
placeholder_1    | parent not found! region_id 85672037
placeholder_1    | parent not found! region_id 85679391
placeholder_1    | parent not found! region_id 85679299
placeholder_1    | parent not found! region_id 85675561
placeholder_1    | parent not found! region_id 85672229
placeholder_1    | parent not found! region_id 85679201
placeholder_1    | info: [placeholder] ::ffff:172.18.0.11 - GET /parser/search?text=toronto%20ontario%20CAN&lang=eng HTTP/1.1 200 33675 - 57.119 ms
placeholder_1    | info: [placeholder] ::ffff:172.18.0.11 - GET /parser/findbyid?ids=102191581%2C85633147%2C102072139%2C136253037%2C404382871%2C101750149%2C404227825%2C1108826393%2C85683579%2C102191575%2C85633793%2C102087307%2C85688753%2C102191573%2C85632735%2C1108564375%2C1293330729%2C85675451%2C102191569%2C85632393%2C890461697%2C85632739%2C1108696521%2C85675177%2C85632203%2C102073305%2C1360169861%2C85672027%2C85632287%2C1108785237%2C1344175169%2C85671591%2C1108709601%2C1344101021%2C85632541%2C1092013489%2C1343955595%2C85670063%2C85632245%2C1092012045%2C1343639693%2C85669945&lang=eng HTTP/1.1 200 20791 - 11.664 ms
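With logs this long it can help to summarize the warnings. The sketch below groups the "parent not found!" lines by field and de-duplicates the IDs (Example 2 alone names seven distinct region IDs). Treating -1 as "no parent ID recorded" and skipping it is an assumption, and missing_parents is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set

PARENT_RE = re.compile(r"parent not found! (\w+) (-?\d+)")


def missing_parents(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map each field (country_id, region_id, ...) to the sorted unique IDs not found.

    Assumption: an ID of -1 means no parent ID was set, so it is skipped
    rather than reported as a missing record.
    """
    found: Dict[str, Set[int]] = defaultdict(set)
    for line in lines:
        match = PARENT_RE.search(line)
        if match and match.group(2) != "-1":
            found[match.group(1)].add(int(match.group(2)))
    return {field: sorted(ids) for field, ids in found.items()}


if __name__ == "__main__":
    log = """placeholder_1    | parent not found! country_id 85633111
placeholder_1    | parent not found! region_id 85672037
placeholder_1    | parent not found! region_id 85672037
placeholder_1    | parent not found! locality_id -1"""
    for field, ids in missing_parents(log.splitlines()).items():
        print(field, ids)
```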

> Does this look to be isolated to Canada to you?

On the North America build it definitely seemed to be related to Canada: the country isn't returned. The same is true for the planet build, possibly related to the errors above.

getorca commented 4 years ago
placeholder_1    | ------------------------------------------------------
placeholder_1    | Database schema is out-of-date!
placeholder_1    | Your database files do not match the expected schema.
placeholder_1    | Please follow instructions in the README to obtain new database files.
placeholder_1    | This is the expected behaviour for breaking schema updates.
placeholder_1    | more info: https://github.com/pelias/placeholder
placeholder_1    | ------------------------------------------------------

Strange, since this is a new build. Trying to run pelias prepare placeholder to see if that fixes it.

Looking at the code for pelias prepare placeholder, it seems to build store.sqlite3 from wof.extract, whereas the documentation (https://github.com/pelias/placeholder) tells you to download store.sqlite3 from https://data.geocode.earth/placeholder/store.sqlite3.gz. Am I correct to assume that either the Docker prepare placeholder step hasn't been updated, or that wof.extract or the SQL importer isn't correct for the new schema, and that this is why the "Database schema is out-of-date" error happens on new builds?
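Fetching the prebuilt database instead of building it amounts to a download plus a gunzip. A minimal sketch, assuming you then copy store.sqlite3 to wherever your placeholder container reads it from (that path depends on your compose setup and is not shown here); fetch_store, gunzip and looks_like_sqlite are hypothetical names. The header check only confirms the file is a SQLite database, not that it matches the schema placeholder expects.

```python
import gzip
import shutil
import urllib.request

# URL quoted from the placeholder README, as cited above.
STORE_URL = "https://data.geocode.earth/placeholder/store.sqlite3.gz"
# Every SQLite 3 database file begins with these 16 bytes.
SQLITE_MAGIC = b"SQLite format 3\x00"


def gunzip(src: str, dst: str) -> None:
    """Stream-decompress a .gz file to dst."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def looks_like_sqlite(path: str) -> bool:
    """True if the file starts with the SQLite 3 header string."""
    with open(path, "rb") as f:
        return f.read(16) == SQLITE_MAGIC


def fetch_store(dst: str = "store.sqlite3") -> None:
    """Download and decompress the prebuilt database (requires network access)."""
    gz_path = dst + ".gz"
    urllib.request.urlretrieve(STORE_URL, gz_path)
    gunzip(gz_path, dst)
    if not looks_like_sqlite(dst):
        raise ValueError(f"{dst} does not look like a SQLite database")
```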

getorca commented 4 years ago

Interesting: downloading the store.sqlite3 file from geocode.earth seems to have solved some of the issues, but I'm still getting 141 regressions (vs 140 for planet). Here's the latest pastebin: https://pastebin.com/SGhPXW75

It's solved the original Canada issue I was seeing, and the following tests are much better:

Is the correct order to download, prepare and import each source documented? I believe the order matters, since some sources depend on others.
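For reference, the quickstart in the pelias/docker README lists the pelias CLI steps in the order below, with download, then prepare, then import; check it against your copy of the README, since the thread itself does not answer the question. The sketch only wraps those commands; STEPS and run_build are hypothetical names, and dry_run defaults to True so nothing is executed unless you opt in.

```python
import subprocess
from typing import List

# Order as listed in the pelias/docker README quickstart (verify against the current README).
STEPS: List[List[str]] = [
    ["pelias", "compose", "pull"],
    ["pelias", "elastic", "start"],
    ["pelias", "elastic", "wait"],
    ["pelias", "elastic", "create"],
    ["pelias", "download", "all"],
    ["pelias", "prepare", "all"],
    ["pelias", "import", "all"],
    ["pelias", "compose", "up"],
    ["pelias", "test", "run"],
]


def run_build(dry_run: bool = True) -> None:
    """Print each step; when dry_run is False, also run it and stop at the first failure."""
    for argv in STEPS:
        print("$", " ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # raises CalledProcessError on a non-zero exit


if __name__ == "__main__":
    run_build(dry_run=True)
```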

missinglink commented 4 years ago

@getorca thanks for the detailed report, I've managed to find the source of the Canada issue and opened a PR to resolve it: https://github.com/pelias/wof/pull/13

I'll kick off a rebuild of all the Geocode Earth data downloads, and they should be available in less than a day.

missinglink commented 4 years ago

@getorca I've republished the data, please re-download and try again:

curl -O https://data.geocode.earth/wof/dist/sqlite/whosonfirst-data-admin-ca-latest.db.bz2

lbunzip2 whosonfirst-data-admin-ca-latest.db.bz2

shasum -a256 whosonfirst-data-admin-ca-latest.db
b8bdb2a618927e921cb43debcabff1958e10afb438bdfb64af4e13138afc7bc3

sqlite3 whosonfirst-data-admin-ca-latest.db 'SELECT id, source, is_alt FROM geojson WHERE id = 85633041'
85633041|whosonfirst|0                               <--- this is the row which was missing
85633041|naturalearth-display-terrestrial-zoom6|1
85633041|naturalearth|1
85633041|quattroshapes|1
85633041|whosonfirst-reversegeo|1
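The three checks above (decompress, checksum, look for the primary row) can also be scripted. A minimal sketch using only the Python standard library: bz2 here is single-threaded, unlike lbunzip2, and the geojson table and its id, source and is_alt columns are taken from the query above. Note that the checksum above belongs to that particular rebuild; the data was regenerated again afterward, so a newer download will likely not match it. The function names are hypothetical.

```python
import bz2
import hashlib
import shutil
import sqlite3
from contextlib import closing


def bunzip2(src: str, dst: str) -> None:
    """Decompress a .bz2 file (single-threaded counterpart of lbunzip2)."""
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest, the same value shasum -a256 prints."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_primary_row(db_path: str, wof_id: int) -> bool:
    """True if geojson holds the non-alt 'whosonfirst' row for wof_id."""
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute(
            "SELECT 1 FROM geojson WHERE id = ? AND source = 'whosonfirst' AND is_alt = 0 LIMIT 1",
            (wof_id,),
        ).fetchone()
    return row is not None
```

Usage would be bunzip2 on whosonfirst-data-admin-ca-latest.db.bz2, then sha256_file on the result, then has_primary_row(db, 85633041), which should return True once the fixed data is in place.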

getorca commented 4 years ago

Thanks for your hard work and for helping me find the errors. It can be a bit intimidating with the long build times and the number of sources/importers to debug. Trying now; will report back.

getorca commented 4 years ago

Yup, solves the Canada issue.

missinglink commented 4 years ago

Unfortunately there was another bug, so I had to regenerate all the data again; it should all be 💯 now.

If the problem is solved for you, please close the GitHub issue. FYI, we recently started an OpenCollective; we are hoping to use the funds to hire someone part-time to keep the community assets/code up to date.

getorca commented 4 years ago

Yup, I will let you know ASAP; it might not be until later today or tomorrow. I destroyed the droplets that held the planet and NA builds, and I want to let my current import finish running.

missinglink commented 4 years ago

I'm going to close this issue due to inactivity.

I suspect some minor test-suite regressions may remain, related to the change of hosts for the WOF dist files, but I believe the critical errors have been resolved.

Thanks for your help in detecting the bugs.

Please feel free to open a new issue or let me know if this needs to be reopened.