systemed / tilemaker

Make OpenStreetMap vector tiles without the stack
https://tilemaker.org/

Enhancement: Generate multiple MBTiles files in a single run #762

Closed ssokol closed 1 month ago

ssokol commented 2 months ago

My bug title is pretty weak. Here's what I'm doing and what I'd like to be able to do...

I'm generating one MBTiles file for each US state using north-america-latest.osm.pbf as the source. To do this, I generate 50+ config.json files, each of which is identical other than the bounding box. Then I run a script to execute tilemaker once for each file.

It looks like there's an awful lot of initial processing that gets repeated for each run. Each time I see:

Layer place (z0-14)
Layer boundary (z0-14)
Layer poi (z12-14)
Layer poi_detail (z14-14) -> poi
Layer housenumber (z14-14)
Layer waterway (z8-14)
Layer waterway_detail (z12-14) -> waterway
Layer transportation (z4-14)
Layer transportation_name (z8-14)
Layer building (z13-14)
Layer water (z6-14)
Layer ocean (z0-14) -> water
Layer water_name (z14-14)
Layer water_name_detail (z14-14) -> water_name
Layer aeroway (z11-14)
Layer aerodrome_label (z10-14)
Layer park (z11-14)
Layer landuse (z4-14)
Layer urban_areas (z4-8) -> landuse
Layer landcover (z0-14)
Layer ice_shelf (z0-9) -> landcover
Layer glacier (z2-9) -> landcover
Layer mountain_peak (z11-14)
Bounding box -94.6179, 33.0041, -89.6444, 36.4996
Reading shapefile ocean
Reading shapefile urban_areas
Reading shapefile ice_shelf
Reading shapefile glacier
Generated points: 0, lines: 0, polygons: 32
Reading .pbf /Users/ssokol/Downloads/north-america-latest.osm.pbf
(Scanning for ways used in relations: 99%)           (1099 ms)
Block 250317/250318 (46079 ms)
SortedNodeStore: 184065 groups, 20072439 chunks, 2002529146 nodes, 11080229982 bytes (2.1% wasted)
Block 22056/22058 (497329 ms)
SortedWayStore: 20071 groups, 3384681 chunks, 135103822 ways, 1890701447 nodes, 5174545936 bytes

Followed by the generation of the specific tile file for the state being processed.

Would it be faster, more efficient, etc. to be able to feed tilemaker a configuration, a Lua script, and a third file that contains a JSON array of objects, each with a destination filename and bounding box? Something like:

[
  {"outfile": "/mnt/san1/osmfiles/usa.alabama.mbtiles", "bbox": [-88.473227,30.223334,-84.88908,35.008028]},
  {"outfile": "/mnt/san1/osmfiles/usa.arizona.mbtiles", "bbox": [-114.81651,31.332177,-109.045223,37.00426]},
  Etc..
]

Sound reasonable? Already in there and I just missed it?
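
For what it's worth, a job list like the one above can already drive the existing one-run-per-state workflow without 50+ generated config files. This is only a minimal sketch: `build_commands` and `run_all` are hypothetical helper names, and it assumes the shared config.json does not set its own bounding box, so that tilemaker's `--bbox` option decides the clipping region. It still pays the full PBF loading cost on every run.

```python
import json
import subprocess


def build_commands(jobs, pbf, config, process):
    """Return one tilemaker argument list per job.

    `jobs` is the hypothetical job list from the proposal: a list of
    {"outfile": ..., "bbox": [min_lon, min_lat, max_lon, max_lat]} objects.
    """
    commands = []
    for job in jobs:
        min_lon, min_lat, max_lon, max_lat = job["bbox"]
        commands.append([
            "tilemaker",
            "--input", pbf,
            "--output", job["outfile"],
            "--config", config,
            "--process", process,
            # Assumption: the shared config has no bounding box of its own,
            # so this option controls the clipping region.
            "--bbox", f"{min_lon},{min_lat},{max_lon},{max_lat}",
        ])
    return commands


def run_all(jobs_path, pbf, config, process):
    """Load the job file and run tilemaker once per entry, stopping on failure."""
    with open(jobs_path) as f:
        jobs = json.load(f)
    for argv in build_commands(jobs, pbf, config, process):
        subprocess.run(argv, check=True)
```

Usage would be something like `run_all("states.json", "north-america-latest.osm.pbf", "config.json", "process.lua")`, where `states.json` is a hypothetical filename for the job list.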

Thanks,

Steve

cldellow commented 2 months ago

Nope, you're not missing anything, this isn't part of tilemaker.

If memory serves--and your snippet seems to support this--tilemaker loads the entire PBF into memory, and the bounding box is only used to clip the output. That implies it'd be much faster to pay the loading cost only once.
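
For a rough sense of scale, here is the arithmetic on the numbers in your log. It assumes the two millisecond figures on the "Block" lines are the elapsed times of the node-reading and way-reading phases, and that every run costs about the same; neither is something I've verified.

```python
# Timings copied from the log above.
NODE_PHASE_MS = 46079    # "Block 250317/250318 (46079 ms)"
WAY_PHASE_MS = 497329    # "Block 22056/22058 (497329 ms)"


def repeated_load_hours(node_ms, way_ms, runs):
    """Total hours spent re-reading the PBF across `runs` separate runs."""
    per_run_seconds = (node_ms + way_ms) / 1000
    return per_run_seconds * runs / 3600


# About 543 seconds of loading per run; over 50 runs that is roughly 7.5 hours.
print(round(repeated_load_hours(NODE_PHASE_MS, WAY_PHASE_MS, 50), 2))
```

Under those assumptions, loading once instead of 50 times would save most of that time.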

If I recast your overall bug as "make this use case faster", I see a few options.

1) Patch tilemaker to do your initial proposal

This would be a little fiddly, but I think nothing like a major rewrite.

2) Patch tilemaker to make the PBF loading faster (more like GeoJSON/shapefiles)

Layers that are derived from shapefiles and GeoJSON files only load shapes that are within the clipped region. If PBF files could behave similarly, this could be a more general improvement that helps any user who uses a --bbox that clips out a significant portion of the PBF.

This is easy to do for shapefiles/GeoJSON files, because in those file formats, each shape is self-contained.

It's hard for OSM/PBF because OSM's model is highly normalized: a relation is made up of N ways. A way is made up of N nodes. The nodes, ways and relations are stored in separate parts of the PBF file.
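
To illustrate why that normalization gets in the way, here is a toy model, not tilemaker's actual data structures. The `Way` and `Relation` classes and `select_for_bbox` are made up for this example, and it is simplified: real relations can also contain nodes and other relations, and a way can cross the box without having any node inside it.

```python
from dataclasses import dataclass


@dataclass
class Way:
    node_ids: list


@dataclass
class Relation:
    way_ids: list


def in_bbox(lon, lat, bbox):
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def select_for_bbox(nodes, ways, relations, bbox):
    """Return the (node ids, way ids, relation ids) needed to render `bbox`."""
    # Only nodes carry coordinates, so the box can be tested on nodes alone.
    node_hits = {nid for nid, (lon, lat) in nodes.items() if in_bbox(lon, lat, bbox)}
    # A way is relevant if it references at least one node inside the box.
    way_hits = {wid for wid, w in ways.items() if node_hits.intersection(w.node_ids)}
    # A relation is relevant if it references at least one relevant way.
    rel_hits = {rid for rid, r in relations.items() if way_hits.intersection(r.way_ids)}
    # Back-fill: a relevant relation needs all of its ways, and each of those
    # ways needs all of its nodes, including ones far outside the box.
    needed_ways = way_hits | {wid for rid in rel_hits for wid in relations[rid].way_ids}
    needed_nodes = node_hits | {nid for wid in needed_ways for nid in ways[wid].node_ids}
    return needed_nodes, needed_ways, rel_hits
```

The back-fill step is the awkward part: by the time the reader learns that a relation matters, the nodes it needs have already gone by in an earlier part of the file.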

Still, I suspect some improvement could be made here. I don't have a great sense of how much of an improvement could be had, though -- maybe it's only, like, 20%, which doesn't seem great relative to a scenario where you're running tilemaker 50x.

3) Change your workflow: generate north-america.mbtiles, then use a tool to split that into alaska.mbtiles, ohio.mbtiles, etc

This might be the path of least resistance -- no tilemaker changes, just a workflow change.

https://openmaptiles.org/docs/generate/create-custom-extract/#create-mbtiles-extract looks like a tool that would let you split the mbtiles files.
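
If you'd rather not depend on another tool, the split is also not much code. An MBTiles file is a SQLite database with a `tiles` table (`zoom_level`, `tile_column`, `tile_row`, `tile_data`) whose rows are numbered in TMS order, bottom to top. A minimal sketch, assuming the source exposes that table and a `metadata` table, and that the destination file doesn't exist yet; `extract_mbtiles` is a hypothetical helper, not part of tilemaker:

```python
import math
import sqlite3


def lonlat_to_tile(lon, lat, zoom):
    """Slippy-map (XYZ) column/row of the tile containing a lon/lat point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on or beyond the map edge still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tms_range(bbox, zoom):
    """Return (min_col, max_col, min_row, max_row) in MBTiles/TMS numbering."""
    min_lon, min_lat, max_lon, max_lat = bbox
    x0, y_top = lonlat_to_tile(min_lon, max_lat, zoom)
    x1, y_bottom = lonlat_to_tile(max_lon, min_lat, zoom)
    n = 2 ** zoom
    # XYZ rows count down from the north; TMS rows count up from the south.
    return x0, x1, n - 1 - y_bottom, n - 1 - y_top


def extract_mbtiles(src_path, dst_path, bbox, max_zoom=14):
    """Copy every tile that intersects `bbox` from src_path into dst_path."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    dst.execute("CREATE TABLE metadata (name TEXT, value TEXT)")
    dst.execute(
        "CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER, "
        "tile_row INTEGER, tile_data BLOB)"
    )
    # Carry the metadata over, but replace the bounds with the new box.
    dst.executemany(
        "INSERT INTO metadata VALUES (?, ?)",
        src.execute("SELECT name, value FROM metadata WHERE name != 'bounds'"),
    )
    dst.execute(
        "INSERT INTO metadata VALUES ('bounds', ?)",
        (",".join(str(v) for v in bbox),),
    )
    for zoom in range(max_zoom + 1):
        x0, x1, row0, row1 = tms_range(bbox, zoom)
        rows = src.execute(
            "SELECT zoom_level, tile_column, tile_row, tile_data FROM tiles "
            "WHERE zoom_level = ? AND tile_column BETWEEN ? AND ? "
            "AND tile_row BETWEEN ? AND ?",
            (zoom, x0, x1, row0, row1),
        )
        dst.executemany("INSERT INTO tiles VALUES (?, ?, ?, ?)", rows)
    dst.execute(
        "CREATE UNIQUE INDEX tile_index ON tiles (zoom_level, tile_column, tile_row)"
    )
    dst.commit()
    dst.close()
    src.close()
```

Tiles along a state border end up in both neighbours' files, and they still contain features from the other side of the line, since nothing is re-clipped.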

In practice, maybe you'd want to treat the 48 continental states like this, and then do your existing workflow for Alaska/Hawaii, to avoid doing all the work to create tiles for Canada and Mexico, just to throw them away.

4) Change your workflow: use state-specific .osm.pbf files

For example, Geofabrik publishes these here: https://download.geofabrik.de/north-america/us.html

If you want to be a good citizen and not hammer their download page on a daily basis, they also publish the .poly files that describe the region (e.g., this is Alabama), and you can use osmium-extract to slice up your north-america file yourself into each state.
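
As far as I understand it, osmium-extract can also take a JSON config file that lists many extracts, so the big file doesn't have to be sliced one state at a time. A minimal sketch that writes such a config; the directory names and the `<state>.poly` naming are assumptions about your layout, and the config keys are worth checking against the osmium-extract man page before relying on them:

```python
import json


def osmium_extract_config(states, poly_dir, out_dir):
    """Build an osmium-extract config with one .poly-based extract per state."""
    return {
        # Directory the per-state output files are written to.
        "directory": out_dir,
        "extracts": [
            {
                "output": f"{state}.osm.pbf",
                "polygon": {
                    # Assumed layout: one <state>.poly file per state.
                    "file_name": f"{poly_dir}/{state}.poly",
                    "file_type": "poly",
                },
            }
            for state in states
        ],
    }


config = osmium_extract_config(["alabama", "arizona"], "polys", "extracts")
print(json.dumps(config, indent=2))
```

Saved as, say, `extracts.json`, it would be run as `osmium extract --config extracts.json north-america-latest.osm.pbf`, and each resulting state file can then be fed to tilemaker on its own.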

ssokol commented 1 month ago

All good points. I had initially considered #4 but decided to go with the single input file thinking it would save me something.

I'm only planning on generating new base maps on a monthly basis, so it's not really a big deal no matter which approach I use.