I did some initial testing of PgOSM using the North America and Europe regions from Geofabrik to compare against the testing here: https://blog.rustprooflabs.com/2019/10/osm2pgsql-scaling
I used "Rig D" from the scaling post (memory-optimized droplet with 128 GB RAM, 16 CPU, 400 GB SSD) and tested the two larger regions with the current PgOSM-flex (0.0.3) scripts. Previous testing with the legacy output showed ~1 hour for N.America and ~4 hours for Europe.
Caveats
The "Scaling osm2pgsql" post only looked at the legacy osm2pgsql import and ignored the PgOSM transformations that I also had to run to prepare the data. Using the osm2pgsql flex output combines the import and transformation into a single step. Depending on the area being loaded, the legacy PgOSM transformations added another 25-50% on top of the osm2pgsql import time.
PgOSM Flex 0.0.3 defines (gut estimate) about 60% of the layers that PgOSM Legacy did. As more layers are added to PgOSM-Flex, the import will (presumably) take longer.
PgOSM (Legacy) transformed data I never used.
An OSM extract from today will always take longer to load than an extract from 14 months ago. OSM data is ever-growing!
Versions Tested
Ubuntu 20.04
osm2pgsql 1.4.0
PostgreSQL 13.1
PostGIS 3.1.0
Initial observations
North America with pgosm-flex run-all.lua took a bit over 3 hours, roughly 3x longer than the ~1 hour legacy baseline. Europe with run-no-tags.lua took a bit over 9 hours, a bit more than 2x the ~4 hour legacy baseline. (Note: the screenshots show a total elapsed time of roughly 11 hours for Europe; that includes creating additional indexes and running pg_dump too.) The following screenshots show that the vast majority of the process is stuck using a single CPU.
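To put the first caveat in numbers, here is a rough back-of-the-envelope sketch in Python. It adds the 25-50% legacy transformation overhead to the legacy import times and compares the result to the flex timings above. The function and variable names are made up for illustration, and the flex times are rounded down to 3 and 9 hours ("a bit over"), so the ratios are approximate lower bounds. It does not account for the other caveats (fewer layers in Flex, newer and larger extracts, and Europe using run-no-tags.lua instead of run-all.lua).

```python
# Legacy osm2pgsql import times from the scaling post (hours, approximate).
legacy_import_hours = {"North America": 1.0, "Europe": 4.0}

# Flex timings from this test (hours). Both were "a bit over" these values.
flex_hours = {"North America": 3.0, "Europe": 9.0}

# Legacy PgOSM transformations added roughly 25-50% on top of the import.
TRANSFORM_OVERHEAD = (0.25, 0.50)


def adjusted_legacy_range(import_hours):
    """Return (low, high) legacy hours including the transformation step."""
    low, high = TRANSFORM_OVERHEAD
    return import_hours * (1 + low), import_hours * (1 + high)


def slowdown_range(region):
    """Return (min, max) ratio of flex time to adjusted legacy time."""
    legacy_low, legacy_high = adjusted_legacy_range(legacy_import_hours[region])
    flex = flex_hours[region]
    # Dividing by the larger legacy time gives the smaller ratio.
    return flex / legacy_high, flex / legacy_low


for region in legacy_import_hours:
    lo, hi = slowdown_range(region)
    print(f"{region}: {lo:.1f}x to {hi:.1f}x slower than legacy import + transform")
```

Under these assumptions the gap shrinks to roughly 2.0-2.4x for North America and 1.5-1.8x for Europe, which is still a significant slowdown but less dramatic than comparing against the import-only numbers.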
Screenshots from Processing Europe
From Digital Ocean monitoring dashboard.
% disk used based on 400GB total.
Note: The drops at 09:00 and 11:30 were me manually deleting the N.America PBF and then the osm_na schema to free up space. By 13:00 I had a bet with myself that the processing stage would fail by running out of space (it neared 100% around 14:30), but it squeaked by. Though, shame on me for not cleaning up my mess from prior testing before starting another round!
Next steps
Need a bit more testing, timing, and exploration. Not sure yet how much performance can be gained in the Lua scripts, what can be done in osm2pgsql, etc.