pelias / openstreetmap

Import pipeline for OSM into Pelias
MIT License

"PeliasModelError: invalid regex test" during OSM extract import #534

Closed: vicchi closed this issue 4 years ago

vicchi commented 4 years ago

Describe the bug

During import of the GB OSM extract from Geofabrik, a PeliasModelError: invalid regex test error and a stack trace are logged. The import continues, apparently successfully.

Steps to Reproduce

  1. Install this repo plus its dependencies, as per the documentation
  2. Configure the local pelias.json as shown below
  3. The whosonfirst and geonames importers have already run their respective download and import tasks
  4. Run npm run download
  5. Run npm start
  6. See the error messages below
{
  "logger": {
    "level": "debug",
    "timestamp": false
  },
  "esclient": {
    "apiVersion": "7.7",
    "hosts": [
      { "host": "pelias" }
    ]
  },
  "elasticsearch": {
    "settings": {
      "index": {
        "refresh_interval": "10s",
        "number_of_replicas": "0",
        "number_of_shards": "1"
      }
    }
  },
  "api": {
    "services": {
      "pip": { "url": "http://pelias:4200" },
      "libpostal": { "url": "http://pelias:4400" },
      "placeholder": { "url": "http://pelias:4100" },
      "interpolation": { "url": "http://pelias:4300" }
    },
    "defaultParameters": {
      "focus.point.lat": 53.825564,
      "focus.point.lon": -2.421976
    },
    "targets": {
      "auto_discover": true
    }
  },
  "logger":  {
    "level": "debug"
  },
  "imports": {
    "adminLookup": {
      "enabled": true
    },
    "geonames": {
      "datapath": "/home/vagrant/data/geonames",
      "countryCode": "GB"
    },
    "openstreetmap": {
      "download": [
        { "sourceURL": "https://download.geofabrik.de/europe/great-britain-latest.osm.pbf" }
      ],
      "leveldbpath": "/tmp",
      "datapath": "/home/vagrant/data/openstreetmap",
      "import": [{
        "filename": "great-britain-latest.osm.pbf"
      }]
    },
    "whosonfirst": {
      "datapath": "/home/vagrant/data/whosonfirst",
      "importPostalcodes": true,
      "countryCode": "GB"
    }
  }
}
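Note that "logger" is declared twice in this config, once at the top and once after "api". Assuming the file is parsed with standard JSON.parse (Node's require() also uses JSON.parse for .json files), duplicate keys are not merged: the later object replaces the earlier one, so "timestamp": false is silently dropped. A trimmed-down illustration, not the full config:

```javascript
// Shows how JSON.parse treats a duplicated top-level key such as "logger".
// Assumption: the Pelias config file is read with standard JSON.parse.
const raw = `{
  "logger": { "level": "debug", "timestamp": false },
  "esclient": { "apiVersion": "7.7" },
  "logger": { "level": "debug" }
}`;

const config = JSON.parse(raw);

// The later "logger" object replaces the earlier one entirely; nothing is merged.
console.log(config.logger); // { level: 'debug' }
console.log('timestamp' in config.logger); // false
```

Keeping a single "logger" block containing every setting avoids depending on this last-one-wins behaviour.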

Expected behavior

No errors, which is always nice. But the presence of a stack trace feels like a bug; perhaps invalid data isn't being handled in the best way?

Pastebin/Screenshots

2020-05-28T08:16:43.261Z - error: [openstreetmap] tag_mapper error
2020-05-28T08:16:43.262Z - error: [openstreetmap] PeliasModelError: invalid regex test, https://www.nottinghamcollege.ac.uk/about-us/locations/wheeler-gate should not match /https?:\/\//
    at Object.nomatch (/home/vagrant/pelias/openstreetmap/node_modules/pelias-model/util/valid.js:117:13)
    at Document.setAddress (/home/vagrant/pelias/openstreetmap/node_modules/pelias-model/Document.js:408:18)
    at DestroyableTransform._transform (/home/vagrant/pelias/openstreetmap/stream/tag_mapper.js:65:17)
    at DestroyableTransform.Transform._read (/home/vagrant/pelias/openstreetmap/node_modules/readable-stream/lib/_stream_transform.js:177:10)
    at DestroyableTransform.Readable.read (/home/vagrant/pelias/openstreetmap/node_modules/readable-stream/lib/_stream_readable.js:456:10)
    at flow (/home/vagrant/pelias/openstreetmap/node_modules/readable-stream/lib/_stream_readable.js:939:34)
    at DestroyableTransform.pipeOnDrainFunctionResult (/home/vagrant/pelias/openstreetmap/node_modules/readable-stream/lib/_stream_readable.js:749:7)
    at DestroyableTransform.emit (events.js:315:20)
    at onwriteDrain (/home/vagrant/pelias/openstreetmap/node_modules/readable-stream/lib/_stream_writable.js:479:12)
    at afterWrite (/home/vagrant/pelias/openstreetmap/node_modules/readable-stream/lib/_stream_writable.js:467:18)
2020-05-28T08:16:43.266Z - error: [openstreetmap] {
  "name": {},
  "phrase": {},
  "parent": {},
  "address_parts": {},
  "center_point": {
    "lon": -1.150378,
    "lat": 52.952435
  },
  "category": [],
  "addendum": {},
  "source": "openstreetmap",
  "layer": "venue",
  "source_id": "node/2198858524"
}
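The stack trace shows the exception coming from a nomatch check in pelias-model's util/valid.js, called from Document.setAddress. The sketch below is a hypothetical re-implementation of that kind of check, not pelias-model's actual source; it assumes the validator simply throws when the value matches the forbidden pattern, which is consistent with the message text in the log above.

```javascript
// Hypothetical re-implementation of a "should not match" validator, modelled on
// the error text in the log. It is NOT pelias-model's actual source.
class PeliasModelError extends Error {
  constructor(message) {
    super(message);
    this.name = 'PeliasModelError';
  }
}

// Throws when `value` matches `regex`; otherwise returns the value unchanged.
function nomatch(value, regex) {
  if (regex.test(value)) {
    throw new PeliasModelError(`invalid regex test, ${value} should not match ${regex}`);
  }
  return value;
}

// Pattern taken from the log line above.
const URL_PATTERN = /https?:\/\//;

try {
  nomatch('https://www.nottinghamcollege.ac.uk/about-us/locations/wheeler-gate', URL_PATTERN);
} catch (err) {
  // Prints the same "PeliasModelError: invalid regex test, ..." text seen in the log.
  console.error(`${err.name}: ${err.message}`);
}
```

Because the pattern has no g flag, RegExp.prototype.test keeps no state between calls, so one pattern object can be reused safely across documents.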

missinglink commented 4 years ago

Thanks @vicchi. The 'error' is really only a warning; I agree it's a bit too shouty, and there is a PR open to fix it: https://github.com/pelias/model/pull/122.

The root cause is some code we added to deal with incorrectly mapped names in OSM, to catch cases where mappers put URLs in one of the name fields 🤦. For example:

https://www.openstreetmap.org/node/2198858524

addr:housename = https://www.nottinghamcollege.ac.uk/about-us/locations/wheeler-gate
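One way to keep such values out of the model without a stack trace is to check the tag before it ever reaches setAddress and emit a single warning line instead. This is a hypothetical sketch: mapHousename and its parameters are made up for illustration and are not taken from pelias/openstreetmap or the linked PR. It reuses the /https?:\/\// pattern from the log.

```javascript
// Hypothetical pre-check for addr:housename ("warn and skip" instead of throwing).
// mapHousename is an illustrative name; it does not exist in pelias/openstreetmap.
const URL_PATTERN = /https?:\/\//;

// Returns the trimmed housename, or null when the tag is missing, empty,
// or looks like a URL. `warn` is any logging function (defaults to console.warn).
function mapHousename(tags, sourceId, warn = console.warn) {
  const value = tags['addr:housename'];
  if (typeof value !== 'string' || value.trim() === '') {
    return null; // tag absent or empty
  }
  if (URL_PATTERN.test(value)) {
    warn(`skipping addr:housename on ${sourceId}: looks like a URL (${value})`);
    return null;
  }
  return value.trim();
}

// Usage with the tag from node/2198858524:
const tags = {
  'addr:housename': 'https://www.nottinghamcollege.ac.uk/about-us/locations/wheeler-gate'
};
console.log(mapHousename(tags, 'node/2198858524')); // warns once, then prints null
```

The rest of the record (name, centroid, layer) is still imported; only the bad field is dropped, which matches the observation that the import otherwise continues normally.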

You can safely ignore PeliasModelError: invalid regex test log lines until that PR is merged.

vicchi commented 4 years ago

@missinglink (breathes virtual sigh of relief)... thanks for the response. I'll fret no more and will close this issue. Cheers