pelias / whosonfirst

Importer for Who's on First gazetteer
MIT License

Error: invalid regex test #456

Closed: vrozental closed this issue 5 years ago

vrozental commented 5 years ago

The following error appears when running npm start for the whosonfirst importer on Ubuntu 18.04:

2019-06-30T21:53:33.579Z - error: [whosonfirst] doc generator error: invalid regex test, http://en.wikipedia.org/wiki/Narsdorf should not match /https?:\/\//
2019-06-30T21:53:33.579Z - error: [whosonfirst] {
  "id": 1125331493,
  "name": "Narsdorf",
  "name_aliases": [],
  "name_langs": {
    "li": [
      "http://en.wikipedia.org/wiki/Narsdorf"
    ]
  },
  "place_type": "localadmin",
  "lat": 51.0167,
  "lon": 12.7167,
  "bounding_box": "12.7167,51.0167,12.7167,51.0167",
  "population": 1707,
  "hierarchies": [
    {
      "continent_id": 102191581,
      "country_id": 85633111,
      "county_id": 102064227,
      "localadmin_id": 1125331493,
      "region_id": 85682523
    }
  ]
}
missinglink commented 5 years ago

@vrozental you can ignore these 'errors'. They indicate that the document had a URL where we were expecting a name. The real problem here is that these warnings are logged too verbosely, which makes people assume the import failed when it didn't.
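To see which field triggered the warning for the Narsdorf record, the sketch below walks the record's name fields and reports every value matching the same /https?:\/\// pattern, as one short line per value instead of the whole document. It is written in Python for illustration only; the helper `find_url_names`, the set of fields it inspects, and the one-line warning format are assumptions based on the logged record, not the importer's actual code.

```python
import re
from typing import List

# Same pattern as in the log message: "http://" or "https://" anywhere in the value.
URL_PATTERN = re.compile(r"https?://")


def find_url_names(doc: dict) -> List[str]:
    """Hypothetical helper: return 'field: value' for every name that looks like a URL."""
    candidates = [("name", doc.get("name", ""))]
    candidates += [("name_aliases", alias) for alias in doc.get("name_aliases", [])]
    for lang, names in doc.get("name_langs", {}).items():
        candidates += [(f"name_langs.{lang}", n) for n in names]
    return [f"{field}: {value}" for field, value in candidates
            if URL_PATTERN.search(value)]


# Trimmed copy of the Narsdorf record from the log above.
record = {
    "id": 1125331493,
    "name": "Narsdorf",
    "name_aliases": [],
    "name_langs": {"li": ["http://en.wikipedia.org/wiki/Narsdorf"]},
}

for problem in find_url_names(record):
    # Hypothetical concise warning: one line per bad value, not the full document.
    print(f"warning: [whosonfirst] id={record['id']} {problem}")
```

Run against this record, it flags only the "li" entry in name_langs; the primary name "Narsdorf" passes.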

Duplicate of https://github.com/pelias/docker/issues/89 and https://github.com/pelias/polylines/issues/216.

vrozental commented 5 years ago

Thank you @missinglink

stepps00 commented 5 years ago

I saw this issue come through and confirmed that the record no longer contains this bunk name property.

@missinglink, this test seems very useful. Can you point me to where these tests live?

missinglink commented 5 years ago

@stepps00 we originally introduced the regex to catch bad data in OSM, but it seems to catch errors in all datasets.

The test itself is in pelias/model. It's a simple regex test, /https?:\/\//, which checks that a 'name' property doesn't contain http:// or https://.
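A minimal Python sketch of an equivalent check, assuming it behaves like the pattern quoted in the log message; the names `URL_PATTERN`, `ANCHORED_URL_PATTERN` and `name_is_valid` are hypothetical, and this is not the pelias/model code. Because the pattern has no ^ anchor, it rejects a URL anywhere in the name; an anchored variant would only reject names that begin with a scheme.

```python
import re

# The pattern from the log message: matches "http://" or "https://" anywhere,
# because it has no "^" anchor.
URL_PATTERN = re.compile(r"https?://")

# Anchored variant for comparison: only matches a scheme at the start of the string.
ANCHORED_URL_PATTERN = re.compile(r"^https?://")


def name_is_valid(name: str) -> bool:
    """Hypothetical helper: a name passes if it contains no http(s):// URL."""
    return URL_PATTERN.search(name) is None


if __name__ == "__main__":
    # Illustrative inputs; the last one is made up to show the anchoring difference.
    for candidate in ["Narsdorf",
                      "http://en.wikipedia.org/wiki/Narsdorf",
                      "See https://example.com"]:
        anchored_hit = ANCHORED_URL_PATTERN.search(candidate) is not None
        print(candidate, name_is_valid(candidate), anchored_hit)
```

The unanchored pattern rejects both the Wikipedia URL and the made-up "See https://example.com" string, while the anchored variant would let the second one through. Which behavior is preferable depends on whether URLs embedded mid-name should also be treated as bad data.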