pelias / pip-service

Pelias point-in-polygon-service
https://pelias.io
MIT License
16 stars 18 forks source link

Migrate from bundles to sqlite databases #105

Closed missinglink closed 5 years ago

missinglink commented 5 years ago

PR https://github.com/pelias/whosonfirst/pull/417 was recently merged which will allow the pip-service to load its data directly from sqlite databases instead of reading from disk.

Before we make this change in production its important that we test the start times of a full planet import and compare the two loading strategies.

If loading from sqlite is satisfactorily fast then we should go ahead and migrate :)

missinglink commented 5 years ago

related: https://github.com/pelias/wof-admin-lookup/pull/249

missinglink commented 5 years ago

I'll take this task

missinglink commented 5 years ago

loading planet via filesystem (old bundle data):

pip_1            | info: [wof-pip-service:master] starting with layers neighbourhood,borough,locality,localadmin,county,macrocounty,macroregion,region,dependency,country,empire,continent,marinearea,ocean
pip_1            | pip-service is now running on port 4200
pip_1            | info: [wof-pip-service:master] empire worker loaded 0 features in 0.028 seconds
pip_1            | info: [wof-pip-service:master] ocean worker loaded 7 features in 0.087 seconds
pip_1            | info: [wof-pip-service:master] borough worker loaded 138 features in 0.278 seconds
pip_1            | info: [wof-pip-service:master] dependency worker loaded 32 features in 0.273 seconds
pip_1            | info: [wof-pip-service:master] macroregion worker loaded 25 features in 0.313 seconds
pip_1            | info: [wof-pip-service:master] macrocounty worker loaded 23 features in 0.378 seconds
pip_1            | info: [wof-pip-service:master] marinearea worker loaded 305 features in 0.301 seconds
pip_1            | info: [wof-pip-service:master] continent worker loaded 8 features in 0.463 seconds
pip_1            | info: [wof-pip-service:master] country worker loaded 199 features in 3.928 seconds
pip_1            | info: [wof-pip-service:master] neighbourhood worker loaded 17726 features in 11.701 seconds
pip_1            | info: [wof-pip-service:master] region worker loaded 4268 features in 13.116 seconds
pip_1            | info: [wof-pip-service:master] county worker loaded 24845 features in 19.29 seconds
pip_1            | info: [wof-pip-service:master] localadmin worker loaded 99206 features in 34.649 seconds
pip_1            | info: [wof-pip-service:master] locality worker loaded 143249 features in 44.814 seconds
pip_1            | info: [wof-pip-service:master] PIP Service Loading Completed!!!

loading planet via filesystem (current data extracted from sqlite databases):

pip_1            | pip-service is now running on port 4200
pip_1            | info: [wof-pip-service:master] ocean worker loaded 7 features in 0.07 seconds
pip_1            | info: [wof-pip-service:master] dependency worker loaded 40 features in 0.227 seconds
pip_1            | info: [wof-pip-service:master] empire worker loaded 11 features in 0.251 seconds
pip_1            | info: [wof-pip-service:master] marinearea worker loaded 305 features in 0.302 seconds
pip_1            | info: [wof-pip-service:master] borough worker loaded 273 features in 0.494 seconds
pip_1            | info: [wof-pip-service:master] continent worker loaded 8 features in 0.539 seconds
pip_1            | info: [wof-pip-service:master] macrocounty worker loaded 477 features in 1.051 seconds
pip_1            | info: [wof-pip-service:master] macroregion worker loaded 94 features in 1.427 seconds
pip_1            | info: [wof-pip-service:master] country worker loaded 210 features in 4.103 seconds
pip_1            | info: [wof-pip-service:master] region worker loaded 4175 features in 14.492 seconds
pip_1            | info: [wof-pip-service:master] neighbourhood worker loaded 29313 features in 16.585 seconds
pip_1            | info: [wof-pip-service:master] county worker loaded 40775 features in 31.12 seconds
pip_1            | info: [wof-pip-service:master] localadmin worker loaded 112503 features in 37.186 seconds
pip_1            | info: [wof-pip-service:master] locality worker loaded 174756 features in 317.766 seconds
pip_1            | info: [wof-pip-service:master] PIP Service Loading Completed!!!

loading planet via sqlite databases (current data):

pip_1            | pip-service is now running on port 4200
pip_1            | info: [wof-pip-service:master] dependency worker loaded 40 features in 4.033 seconds
pip_1            | info: [wof-pip-service:master] marinearea worker loaded 305 features in 4.216 seconds
pip_1            | info: [wof-pip-service:master] borough worker loaded 273 features in 4.524 seconds
pip_1            | info: [wof-pip-service:master] continent worker loaded 8 features in 4.587 seconds
pip_1            | info: [wof-pip-service:master] empire worker loaded 11 features in 4.625 seconds
pip_1            | info: [wof-pip-service:master] ocean worker loaded 7 features in 4.69 seconds
pip_1            | info: [wof-pip-service:master] macrocounty worker loaded 477 features in 5.042 seconds
pip_1            | info: [wof-pip-service:master] macroregion worker loaded 94 features in 5.301 seconds
pip_1            | info: [wof-pip-service:master] country worker loaded 210 features in 7.926 seconds
pip_1            | info: [wof-pip-service:master] neighbourhood worker loaded 29313 features in 12.83 seconds
pip_1            | info: [wof-pip-service:master] region worker loaded 4175 features in 16.265 seconds
pip_1            | info: [wof-pip-service:master] county worker loaded 40775 features in 30.193 seconds
pip_1            | info: [wof-pip-service:master] localadmin worker loaded 112503 features in 32.32 seconds
pip_1            | info: [wof-pip-service:master] locality worker loaded 174756 features in 173.493 seconds
pip_1            | info: [wof-pip-service:master] PIP Service Loading Completed!!!
missinglink commented 5 years ago

here's another benchmark of the filesystem (current data extracted from sqlite databases) this is the same as the second one I posted above except this time the linux filesystem cache has seen these files.

pip_1            | info: [wof-pip-service:master] ocean worker loaded 7 features in 0.061 seconds
pip_1            | info: [wof-pip-service:master] dependency worker loaded 40 features in 0.245 seconds
pip_1            | info: [wof-pip-service:master] empire worker loaded 11 features in 0.337 seconds
pip_1            | info: [wof-pip-service:master] marinearea worker loaded 305 features in 0.341 seconds
pip_1            | info: [wof-pip-service:master] borough worker loaded 273 features in 0.539 seconds
pip_1            | info: [wof-pip-service:master] continent worker loaded 8 features in 0.522 seconds
pip_1            | info: [wof-pip-service:master] macrocounty worker loaded 477 features in 0.999 seconds
pip_1            | info: [wof-pip-service:master] macroregion worker loaded 94 features in 1.429 seconds
pip_1            | info: [wof-pip-service:master] country worker loaded 210 features in 4.161 seconds
pip_1            | info: [wof-pip-service:master] region worker loaded 4175 features in 14.743 seconds
pip_1            | info: [wof-pip-service:master] neighbourhood worker loaded 29313 features in 16.488 seconds
pip_1            | info: [wof-pip-service:master] localadmin worker loaded 112503 features in 38.204 seconds
pip_1            | info: [wof-pip-service:master] county worker loaded 40775 features in 42.219 seconds
pip_1            | info: [wof-pip-service:master] locality worker loaded 174756 features in 265.726 seconds
pip_1            | info: [wof-pip-service:master] PIP Service Loading Completed!!!
missinglink commented 5 years ago

Summary

Looking exclusively at the locality layer we can see that the filesystem method of loading the planet loads 143249 features in 44.814 seconds. This data is pretty old (the bundles haven't been updated for some time).

Using the sqlite method on a current database extract, it loads 174756 features in 173.493 seconds.

This is a huge difference, I wanted to know if it was due to the loading method or the data, so I deleted the bundle data and extracted the sqlite data to disk in the same directory structure as expected by the filesystem method.

On a cold start the filesystem method loading the current data achieves 174756 features in 317.766 seconds and a subsequent run can do 174756 features in 265.726 seconds, this is due to the linux filesystem cache helping out.

I tried running the sqlite method a couple more times to see if the filesystem cache could help and got pretty much the same results as above 174756 features in 179.186 seconds and 174756 features in 169.57 seconds.

It looks like work from the WOF team over the last ~6 months to add more data and improve geometries has negatively impacted the start-time performance, once the distributions are fixed and new bundles are created we can expect a 7x slowdown on start times using the filesystem method.

I'd suggest we switch over to sqlite as the default, we can likely make further speed improvements by adding more filter conditions in the sql and by use of PRAGMA statements which optimize read speed.

cc @orangejulius, @Joxit

Joxit commented 5 years ago

The SQLite supports is definitely a good move for pelias services :smile:

orangejulius commented 5 years ago

I'd say this is done. We have been using the sqlite option in production for geocode.earth for some time. It does load a bit slower, and I notice that the workers are no longer at 100% CPU while loading, so there may be some more optimizations we can make.

However, overall the process is much more stable, and having a single file instead of millions actually dramatically improves stability overall.

pravink commented 4 years ago

@orangejulius @missinglink

My PIP service took 1347.156 seconds to star.

below are the logs 2019-10-25T06:14:11.324Z - ESC[32minfoESC[39m: [wof-pip-service:master] locality worker loaded 174903 features in 1 1347.156 seconds 2019-10-25T06:14:11.641Z - ESC[32minfoESC[39m: [wof-pip-service:master] PIP Service Loading Completed!!!

Even after this I am seeing lot of missing locality data like below- wof-pip-service:filterOutCitylessNeighbourhoods] skipping 85796997: neighbourhood without locality or localadmin

My main concern is /reverse api doesnt return any locality/localadmin/postalcode details in the response. even though i do have admin,venue,postalcode data for whosonfirst [sqlite bundles]

Could you please let me know is there any reason that /v1/search API return locality/localadmin/postalcodes in the response but /v1/reverese doesnt.?

missinglink commented 4 years ago

Hi, @pravink I see that you opened an issue here https://github.com/pelias/pip-service/issues/114 with the same information you provided in your comment here.

Please be considerate of the humans volunteering their time for this project and avoid @ mentioning people, commenting on closed issues and duplicating questions, thanks!

pravink commented 4 years ago

Hi,

I initially thought it was good to comment here as the issue was closely related to this. But later i thought i should open an new tkt.. I ll keep in mind for the valuable time you guys spent on this project is really appreciated.

Thanks for your support and kind help!