onthegomap / planetiler

Flexible tool to build planet-scale vector tilesets from OpenStreetMap data fast
Apache License 2.0

[FEATURE] Real-time OpenStreetMap updates? #47

Open msbarry opened 2 years ago

msbarry commented 2 years ago

Is your feature request related to a problem? Please describe.
Planetiler was initially designed to work in batch mode, but it might be possible to get incremental minutely updates from OpenStreetMap.

Describe the solution you'd like
Run planetiler once to get a base tileset plus some extra indices, then run a continuous (or cron) process that crawls the latest OSM replication diffs (https://wiki.openstreetmap.org/wiki/Planet.osm/diffs) and applies them to the tileset. The initial run could be slower than batch mode but should still be reasonable (<12 hours), and incremental updates should take <1 minute. Ideally, incremental updates should need much less RAM than a full import.
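The replication diffs linked above follow a predictable layout: each minutely update has a sequence number, and the diff file lives at a path derived from that number (zero-padded to 9 digits and split into three directory levels). A minimal sketch of that mapping, assuming a hypothetical helper class (`ReplicationPath` is not planetiler code):

```java
// Hypothetical helper: maps an OSM replication sequence number to the path of
// its minutely diff under e.g. https://planet.openstreetmap.org/replication/minute/
public final class ReplicationPath {
  public static String pathFor(long sequence) {
    // Sequence numbers are zero-padded to 9 digits and split into 3 directory
    // levels, e.g. 5288069 -> 005/288/069.osc.gz
    String padded = String.format("%09d", sequence);
    return padded.substring(0, 3) + "/" + padded.substring(3, 6) + "/"
        + padded.substring(6) + ".osc.gz";
  }

  public static void main(String[] args) {
    System.out.println(pathFor(5288069)); // prints 005/288/069.osc.gz
  }
}
```

A polling loop would fetch `state.txt` for the latest sequence number, then download and apply every diff between the last-applied sequence and the current one.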

Describe alternatives you've considered
Start from a planet.osm.pbf dump, then loop continuously:

This would result in a 1-3 hour latency, depending on how big of a machine you use.

Open Questions

bdon commented 2 years ago

> Is there an API that can provide info for all of the nodes/ways/relations that need to be re-rendered when a related element changes (i.e. a node moves, so we need to re-render the ways that contain it)?

I've attempted this and it is very noisy for any small node change that influences ways > relations > super-relations. I think a general-purpose mirror of raw OSM data is a good approach, and one already exists for Java in libraries like osm-lib.

msbarry commented 2 years ago

The basic approach here would be:

  1. keep a data store of the vector tile features rendered from each OSM element (currently we throw this away after generating the initial tileset)
  2. on each change, find all affected ways, relations, etc. and wipe features rendered from them, keeping track of the tiles they were in
  3. then insert features rendered from the new versions of the OSM element and elements that depend on it, keeping track of tiles they appear in
  4. then re-render every tile that was touched in step 2 or 3

Steps 2 and 3 would basically need random access to every OSM element before and after the change...
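Steps 1-3 above could be sketched with in-memory maps (all names here are hypothetical, and a real implementation would keep these stores on disk rather than in memory):

```java
import java.util.*;

// Minimal in-memory sketch of the incremental-update bookkeeping described
// above. Element, feature, and tile IDs are all plain longs for simplicity.
public final class IncrementalUpdate {
  // Step 1: features rendered from each OSM element, kept after the initial run.
  private final Map<Long, List<Long>> featuresByElement = new HashMap<>();
  // Which tiles each rendered feature landed in.
  private final Map<Long, Set<Long>> tilesByFeature = new HashMap<>();

  /**
   * Applies a change to one element and returns the set of tile IDs that must
   * be re-rendered (step 4). newFeatures maps each freshly rendered feature ID
   * to the tiles it appears in.
   */
  public Set<Long> applyChange(long elementId, Map<Long, Set<Long>> newFeatures) {
    Set<Long> dirty = new HashSet<>();
    // Step 2: wipe features rendered from the old version, remembering their tiles.
    for (long feature : featuresByElement.getOrDefault(elementId, List.of())) {
      Set<Long> oldTiles = tilesByFeature.remove(feature);
      if (oldTiles != null) {
        dirty.addAll(oldTiles);
      }
    }
    // Step 3: insert features rendered from the new version, tracking their tiles.
    featuresByElement.put(elementId, new ArrayList<>(newFeatures.keySet()));
    newFeatures.forEach((feature, tiles) -> {
      tilesByFeature.put(feature, tiles);
      dirty.addAll(tiles);
    });
    return dirty;
  }
}
```

The returned set is the input to step 4: every tile an old or new feature touched gets re-rendered.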

> I've attempted this and it is very noisy for any small node change that influences ways > relations > super-relations.

I thought that might be the case, but my hope was that tile rendering would be fast enough to compensate. I'm not sure that theory would hold, though.

Thanks for the pointer on osm-lib. I also spoke with the baremaps maintainer - they load the nodes, ways, and relations into a postgres database and can render tiles on the fly from that. That approach may make more sense for applications that need real-time updates. We might try to come up with a common format to describe tileset generation that could be shared between the two projects so you could get static or real-time tiles from the same spec.

msbarry commented 2 years ago

Another option is to only store intermediate vector tile features and render the tiles themselves on the fly when requested. The main downside here is that some very complex tiles could take 5+ seconds to render, for example some of these ones in Jakarta on a 2021 M1 MacBook Pro:

grischard commented 2 years ago

Would it speed things up to use the existing planet.mbtiles as a cache for subsequent runs? Then instead of throwing CPU/IO at the problem, it's cache invalidation you're dealing with, which is much more fun.

wipfli commented 2 years ago

@grischard can you share how you would approach the problem using the existing planet.mbtiles file?

grischard commented 2 years ago

I see that @msbarry's basic approach idea from 14 January is basically that, but better: patch the mbtiles with new tiles in place. I was suggesting creating a new mbtiles but fishing out the tiles that haven't changed from the old one.

msbarry commented 2 years ago

That is an interesting approach, so it would be something like:

?

That would take a bit longer for each update but eliminates some state we'd need to maintain between runs. I'd guess it might get us down into the ~15 minute range? It might be tricky to compute the set of changed tile IDs from just the change files though (i.e. a node moves, but it's part of a way that spans many tiles).
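For the single-node case, mapping a lon/lat to the tile containing it is standard web-mercator ("slippy map") math; the hard part, as noted, is that a change file alone doesn't carry the full geometry of the ways a moved node belongs to, so you can't enumerate all affected tiles without a lookup store. A sketch of the per-point tile math (class name hypothetical):

```java
// Standard web-mercator slippy-map tile math: maps a lon/lat to the x/y tile
// containing it at a given zoom. Enumerating every tile touched by a changed
// way would additionally need the way's full geometry, which is exactly the
// state a diff file alone doesn't provide.
public final class TileMath {
  public static int lonToTileX(double lon, int zoom) {
    return (int) Math.floor((lon + 180.0) / 360.0 * (1 << zoom));
  }

  public static int latToTileY(double lat, int zoom) {
    double latRad = Math.toRadians(lat);
    return (int) Math.floor(
        (1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2 * (1 << zoom));
  }
}
```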

ZeLonewolf commented 2 years ago

I'm wondering what optimizations might be possible if the .mbtiles were exploded into x/y/z pbfs (something that OWG is planning to do anyway). Perhaps the highest zooms could be run more frequently, similar to how it's done in osm-carto.

msbarry commented 2 years ago

@ZeLonewolf Are you saying they want to extract the tiles to files on disk?

We could try doing a planet generation with something like --minzoom=14. I think that would save quite a bit of io/cpu during tile generation.

ZeLonewolf commented 2 years ago

Yes exactly, extracted to disk and served statically. Turns out disk space is cheap.
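One wrinkle when exploding an .mbtiles into z/x/y files: the MBTiles spec stores tile rows in the TMS scheme, while static z/x/y serving (and most map clients) uses the XYZ scheme, so the y coordinate needs flipping on the way out. A minimal sketch of that conversion (class name hypothetical; actually reading the sqlite file would need a JDBC driver and is omitted here):

```java
// MBTiles stores tile_row in the TMS scheme; XYZ serving flips the y axis.
public final class TmsFlip {
  public static int tmsToXyzY(int zoom, int tmsY) {
    // At zoom z there are 2^z rows; TMS counts from the bottom, XYZ from the top.
    return (1 << zoom) - 1 - tmsY;
  }
}
```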

msbarry commented 2 years ago

Ok, I had thought about adding different output formats (pmtiles, files on disk) but didn't think anyone would want to deal with the 280 million files. If that's not the case, we can include that as a native output format in planetiler.