pelias / pip-service

Pelias point-in-polygon-service
https://pelias.io
MIT License

Configuration to limit the number of layers loaded on start #122

Open CharlesG-Branch opened 4 years ago

CharlesG-Branch commented 4 years ago

Use-cases

Starting up the pip-service can take an extremely long time, even when limiting what you load via importPlace. Sometimes certain layers aren't needed, and it would be nice to disable loading them to improve startup time. From what I've seen, locality and localadmin in particular take much longer than the other layers.

Proposal

Pass the layers you want loaded in here: https://github.com/pelias/pip-service/blob/master/app.js#L95

As far as I can tell, this feature is already supported in wof-admin-lookup:

I'm happy to implement this myself and file a PR. I just need to know the procedure for updating the config, since it's shared across all Pelias projects and this project currently doesn't load the config.
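
To make the proposal concrete, here is a minimal sketch of the filtering step. The helper name `selectLayers`, the idea of an optional list of requested layers, and the layer names in the example are all assumptions for illustration; none of them are taken from pip-service, wof-admin-lookup, or the shared Pelias config.

```javascript
// Hypothetical helper: NOT part of pip-service or pelias-config.
// Returns the subset of available layers that the operator asked for,
// falling back to every layer when nothing is configured.
function selectLayers(availableLayers, requestedLayers) {
  if (!Array.isArray(requestedLayers) || requestedLayers.length === 0) {
    return availableLayers.slice();
  }

  // Fail loudly on typos rather than silently loading fewer layers.
  const unknown = requestedLayers.filter((l) => !availableLayers.includes(l));
  if (unknown.length > 0) {
    throw new Error(`unknown layers requested: ${unknown.join(', ')}`);
  }

  // Keep the original ordering of availableLayers.
  return availableLayers.filter((l) => requestedLayers.includes(l));
}

// Illustrative layer names only.
const available = ['country', 'region', 'county', 'localadmin', 'locality'];
console.log(selectLayers(available, ['county', 'region', 'country']));
```

With a helper like this, leaving the option unset would keep today's behaviour, so the change would be opt-in.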

missinglink commented 4 years ago

Hi @CharlesG-Branch, we have been working on a new system which starts up instantly. Would you be interested in BETA testing that?

If you were to remove locality from the list of layers, this would have a negative effect on quality, since address data would no longer be associated with a locality. Could you please explain more about your specific use-case and why this wouldn't matter?

CharlesG-Branch commented 4 years ago

@missinglink I'd be happy to beta test it & I'm curious how that was accomplished (is there a runtime perf hit?)

In my case I'm deploying this service without the rest of the Pelias stack, as I only need the reverse geocoding component. Only the layers for counties and larger are important to me, so I figured locality would be safe to skip, as it's lower in the hierarchy (https://github.com/whosonfirst/whosonfirst-placetypes). Will not loading it impact the accuracy of things higher in the hierarchy?

missinglink commented 4 years ago

Well then you're going to love this...

curl -s https://data.geocode.earth/wof/dist/spatial/whosonfirst-data-admin-us-latest.spatial.db.bz2 | lbunzip2 > whosonfirst-data-admin-us-latest.spatial.db
docker run --rm -it -v "${PWD}:/data" -p 3000:3000 pelias/spatial server --db=/data/whosonfirst-data-admin-us-latest.spatial.db

There is a demo on port 3000

missinglink commented 4 years ago

Try out these paths locally:

GET /explore/pip#14/37.785240/-122.424624
GET /query/pip?lon=-122.42457937449218&lat=37.78471707419765&role=boundary
GET /query/pip/verbose?lon=-122.42457937449218&lat=37.78471707419765&role=boundary
GET /query/pip/_view/pelias/-122.42457937449218/37.78471707419765

The last of these is an endpoint compatible with this repo's API, although that's where the BETA comes in. I would appreciate your feedback.
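
For anyone scripting against these paths, the request URLs can be assembled as in the sketch below. It assumes the container from the earlier comment is listening on localhost:3000; the helper names `pipQueryUrl` and `pipPeliasViewUrl` are made up for illustration, and only the paths and query parameters come from the list above.

```javascript
// Assumption: the pelias/spatial container is reachable on localhost:3000.
const BASE = 'http://localhost:3000';

// Builds /query/pip or /query/pip/verbose with lon, lat and role parameters.
function pipQueryUrl(lon, lat, { verbose = false, role = 'boundary' } = {}) {
  const url = new URL(verbose ? '/query/pip/verbose' : '/query/pip', BASE);
  url.searchParams.set('lon', String(lon));
  url.searchParams.set('lat', String(lat));
  url.searchParams.set('role', role);
  return url.toString();
}

// Builds the pip-service style path, which takes lon then lat as path segments.
function pipPeliasViewUrl(lon, lat) {
  return new URL(`/query/pip/_view/pelias/${lon}/${lat}`, BASE).toString();
}

console.log(pipQueryUrl(-122.5, 37.75));
console.log(pipQueryUrl(-122.5, 37.75, { verbose: true }));
console.log(pipPeliasViewUrl(-122.5, 37.75));
```

Note that longitude comes before latitude in every one of these paths.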

missinglink commented 4 years ago

The magic here is that the data is loaded in mmap mode, so the Linux filesystem cache provides an in-memory LRU cache for the 'hot pages'. You don't need to configure anything, but the more memory you have, the faster it is. I can explain more if you find it useful.

CharlesG-Branch commented 4 years ago

Wow, the startup time & demo page are incredible. Exposing the localization information is also extremely helpful.

I did get some exceptions for the last two links:

2020-04-09T19:44:59.758Z - info: [geometry] ::ffff:172.17.0.1 - GET /query/pip/_view/pelias/-122.42457937449218/37.78471707419765 HTTP/1.1 500 1018 - 17.145 ms
TypeError: Cannot read property 'split' of null
    at rows.forEach.row (/code/server/routes/pip_verbose.js:29:33)
    at Array.forEach (<anonymous>)
    at Object.module.exports (/code/server/routes/pip_verbose.js:28:8)
    at module.exports (/code/server/routes/pip_pelias.js:14:33)
    at Layer.handle [as handle_request] (/code/node_modules/express/lib/router/layer.js:95:5)
    at next (/code/node_modules/express/lib/router/route.js:137:13)
    at Route.dispatch (/code/node_modules/express/lib/router/route.js:112:3)
    at Layer.handle [as handle_request] (/code/node_modules/express/lib/router/layer.js:95:5)
    at /code/node_modules/express/lib/router/index.js:281:22
    at param (/code/node_modules/express/lib/router/index.js:354:14)
2020-04-09T19:45:11.456Z - info: [geometry] ::ffff:172.17.0.1 - GET /query/pip/verbose?lon=-122.42457937449218&lat=37.78471707419765&role=boundary HTTP/1.1 500 1033 - 23.600 ms
TypeError: Cannot read property 'split' of null
    at rows.forEach.row (/code/server/routes/pip_verbose.js:29:33)
    at Array.forEach (<anonymous>)
    at module.exports (/code/server/routes/pip_verbose.js:28:8)
    at Layer.handle [as handle_request] (/code/node_modules/express/lib/router/layer.js:95:5)
    at next (/code/node_modules/express/lib/router/route.js:137:13)
    at Route.dispatch (/code/node_modules/express/lib/router/route.js:112:3)
    at Layer.handle [as handle_request] (/code/node_modules/express/lib/router/layer.js:95:5)
    at /code/node_modules/express/lib/router/index.js:281:22
    at Function.process_params (/code/node_modules/express/lib/router/index.js:335:12)
    at next (/code/node_modules/express/lib/router/index.js:275:10)
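
Both traces above report the same failure class: `.split` being called on a `null` value inside a `forEach` over rows. The code at pip_verbose.js:29 is not shown in this thread, so the sketch below uses a hypothetical row shape and field name; it only illustrates how such a call can be guarded and is not the actual fix.

```javascript
// Hypothetical row shape: the real columns in pip_verbose.js are not shown here.
const rows = [
  { id: 1, names: 'San Francisco|SF' },
  { id: 2, names: null }, // e.g. a column that came back empty from the database
];

// Guarded split: treat null/undefined as "no values" instead of throwing.
function splitOrEmpty(value, separator) {
  return typeof value === 'string' ? value.split(separator) : [];
}

const parsed = rows.map((row) => ({
  id: row.id,
  names: splitOrEmpty(row.names, '|'),
}));

console.log(parsed);
```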

I'll play around with loading and using the full WOF dataset later today. Limiting the place IDs loaded (currently done with imports.whosonfirst.importPlace) might still be useful, since it would prevent unneeded places from filling up the cache, though the cost of a failed lookup may just be higher. I'll have to check.

missinglink commented 4 years ago

Looks like a bug, thanks. It should be easily fixed. I'm opening up https://github.com/pelias/spatial/issues/47 for further feedback; please add any more beta testing notes over there so I can track them in one place.

missinglink commented 4 years ago

More download options are available on our site: https://geocode.earth/data

missinglink commented 4 years ago

If y'all would like commercial support we'd be happy to supply other data such as OSM and US CENSUS data for your business as seen in our demo https://spatial.demo.geocode.earth/explore/pip

missinglink commented 4 years ago

Bug resolved in https://github.com/pelias/spatial/pull/48

bradjones1 commented 2 years ago

FWIW, I came to this issue after having serious performance issues starting pip-service in development (using the shipped Docker image). Spatial does the trick and starts immediately. I generally find the documentation really confusing on which datasets apply to which products and why, and on the proper way to import them. BUT, with the examples in issues and by reading the code, I was able to make it work. Thanks for making these projects open-source. I would recommend anyone needing PIP to go straight to Spatial.