rustprooflabs / pgosm-flex

PgOSM Flex provides high quality OpenStreetMap datasets in PostGIS (Postgres) using the osm2pgsql Flex output.
MIT License

Performance with run-all-tags, run-no-tags, and run-road-place #12

Closed: rustprooflabs closed this issue 3 years ago

rustprooflabs commented 3 years ago

Details

I did some initial testing of PgOSM Flex using Geofabrik's North America and Europe extracts to compare against the results here: https://blog.rustprooflabs.com/2019/10/osm2pgsql-scaling. I used "Rig D" from the scaling post (a memory-optimized droplet with 128 GB RAM, 16 CPUs, and a 400 GB SSD) and tested the two larger regions with the current PgOSM Flex (0.0.3) scripts. Previous testing with the legacy output took ~1 hour for North America and ~4 hours for Europe.

Caveats

The "Scaling osm2pgsql" post only looked at the legacy osm2pgsql import and ignored the PgOSM transformations that I also had to run to prepare the data. The osm2pgsql Flex output combines the import and transformation into a single step. Depending on the area loaded, the legacy PgOSM transformations added another 25-50% on top of the osm2pgsql time.
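To compare the flex timings on a fairer footing, the legacy baseline should include the transformation step. A minimal sketch of that arithmetic, assuming the ~1 hour (North America) and ~4 hour (Europe) legacy import times quoted above and the stated 25-50% transformation overhead; the names `LEGACY_IMPORT_HOURS`, `TRANSFORM_OVERHEAD`, and `legacy_total_range` are illustrative only.

```python
# Legacy osm2pgsql import times (hours) from the "Scaling osm2pgsql" post.
LEGACY_IMPORT_HOURS = {"north-america": 1.0, "europe": 4.0}

# Assumption from the text: legacy PgOSM transformations added 25-50% on top.
TRANSFORM_OVERHEAD = (0.25, 0.50)


def legacy_total_range(import_hours: float) -> tuple[float, float]:
    """Return (low, high) estimated hours for legacy import plus transformations."""
    low, high = TRANSFORM_OVERHEAD
    return import_hours * (1 + low), import_hours * (1 + high)


if __name__ == "__main__":
    for region, hours in LEGACY_IMPORT_HOURS.items():
        low, high = legacy_total_range(hours)
        print(f"{region}: {low:.2f}-{high:.2f} h")
```

Under these assumptions the legacy end-to-end baseline is roughly 1.25-1.5 hours for North America and 5-6 hours for Europe, before accounting for the growth in OSM data since the original test.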

PgOSM Flex 0.0.3 defines (at a gut estimate) about 60% of the layers in PgOSM Legacy. As more layers are added to PgOSM Flex, it will presumably take longer.

PgOSM (Legacy) also transformed data that I never used.

An OSM extract from today will always take longer to load than an extract from 14 months ago. OSM data is ever-growing!

Versions Tested

Initial observations

North America with pgosm-flex run-all.lua took a bit over 3 hours, roughly 3x longer than the baseline. Europe with run-no-tags.lua took a bit over 9 hours. (Note: the screenshots show a total elapsed time of roughly 11 hours for Europe; that includes creating additional indexes and running pg_dump.) The following screenshots show that the vast majority of the process is stuck using a single CPU.
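The slowdown ratios implied by these timings can be checked quickly. A minimal sketch, assuming "a bit over 3 hours" and "a bit over 9 hours" are rounded down to 3 and 9, and that the baselines are the raw legacy import times (~1 h and ~4 h) without transformation overhead. The Europe run used run-no-tags.lua rather than run-all.lua, so the two ratios are not strictly comparable. The `slowdown` helper is illustrative.

```python
def slowdown(flex_hours: float, baseline_hours: float) -> float:
    """Ratio of flex load time to the legacy baseline (>1 means slower)."""
    if baseline_hours <= 0:
        raise ValueError("baseline_hours must be positive")
    return flex_hours / baseline_hours


# (flex hours, legacy baseline hours); flex values rounded down from "a bit over".
RUNS = {
    "north-america (run-all.lua)": (3.0, 1.0),
    "europe (run-no-tags.lua)": (9.0, 4.0),
}

if __name__ == "__main__":
    for label, (flex, base) in RUNS.items():
        print(f"{label}: {slowdown(flex, base):.2f}x")
```

This gives about 3x for North America and about 2.25x for Europe against the raw import baselines.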

Screenshots from Processing Europe

From the DigitalOcean monitoring dashboard.

Screenshot: load-osm-eu-flex v0.0.3, CPU

Screenshot: load-osm-eu-flex v0.0.3, IO

Screenshot: load-osm-eu-flex v0.0.3, memory
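A single busy CPU looks small on a 16-CPU droplet's dashboard, and adding cores would not speed up that phase. A minimal sketch of both points, assuming the dashboard reports CPU as a percentage of all 16 vCPUs and using Amdahl's law (not discussed in the original) with a hypothetical serial fraction; the function names are illustrative.

```python
def single_core_share(total_cpus: int) -> float:
    """Percent of total CPU shown when exactly one core is fully busy."""
    return 100.0 / total_cpus


def amdahl_speedup(serial_fraction: float, cpus: int) -> float:
    """Amdahl's law: best-case speedup when only (1 - serial_fraction) parallelizes."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cpus)


if __name__ == "__main__":
    print(f"One busy core of 16: {single_core_share(16):.2f}% total CPU")
    # Hypothetical: if 90% of the runtime were single-threaded...
    print(f"Max speedup on 16 CPUs: {amdahl_speedup(0.9, 16):.2f}x")
```

One fully busy core out of 16 shows as about 6.25% of total CPU, and with a hypothetical 90% serial fraction even 16 CPUs cap the speedup near 1.1x.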

Disk usage is shown as a percentage of the 400 GB total.

Note: The drops at 09:00 and 11:30 were me manually deleting the North America PBF, and then the osm_na schema, to make room. By 13:00 I had a bet with myself that the processing stage would fail by running out of space (usage neared 100% around 14:30), but it squeaked by. Shame on me, though, for not cleaning up my mess from prior testing before starting another round!

Screenshot: load-osm-eu-flex v0.0.3, disk usage
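A pre-flight free-space check before starting a large load is one way to avoid that kind of near miss. A minimal sketch using Python's standard `shutil.disk_usage`; the data directory path and the required headroom are hypothetical placeholders, since this issue does not measure how much space a Europe load needs.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def has_headroom(path: str, required_gib: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gib` free."""
    usage = shutil.disk_usage(path)
    return usage.free / GIB >= required_gib


def percent_used(path: str) -> float:
    """Percent of the filesystem in use (may differ slightly from a dashboard
    because of reserved blocks)."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    data_dir = "/"  # hypothetical: point this at the Postgres data directory
    needed = 200.0  # hypothetical headroom in GiB, not measured in this issue
    print(f"{percent_used(data_dir):.1f}% used")
    if not has_headroom(data_dir, needed):
        print(f"Warning: less than {needed} GiB free; clean up old PBFs/schemas first.")
```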

Next steps

I need a bit more testing, timing, and exploration. I'm not sure yet how much performance can be gained in the Lua scripts, what can be done in osm2pgsql, etc.

rustprooflabs commented 3 years ago

Resolved via #71. There will be more to do there, but that can be handled as the project moves forward.