onthegomap / planetiler

Flexible tool to build planet-scale vector tilesets from OpenStreetMap data fast
Apache License 2.0

Support Overture Map format #636

Closed msbarry closed 4 months ago

msbarry commented 1 year ago

The first Overture Map release using the new format is out: https://overturemaps.org/download/overture-july-alpha-release-notes/ and there are quite a few differences from OpenStreetMap data formats that planetiler currently supports:

This presents some challenges (like writing new profiles, adding support for parquet sources) but also some opportunities:

What do people think the ideal workflow would be to use Overture maps data from Planetiler?

msbarry commented 1 year ago

Some initial observations on the dataset:

It contains 1.5B elements (compared to 1.2B for OSM if you exclude nodes with no tags)

The initial size is 215GB vs. 75GB for OSM

Size breakdown by theme and type:

theme                 type                         Size (GB)
theme=admins          type=administrativeBoundary  0.1
theme=admins          type=locality                0.5
theme=buildings       type=building                118.3
theme=places          type=place                   8.6
theme=transportation  type=connector               21.1
theme=transportation  type=segment                 66.9

msbarry commented 1 year ago

I wrote a sample Java driver using these dependencies:

org.apache.parquet:parquet-avro:1.13.1
org.apache.hadoop:hadoop-client:3.3.6

They double the size of the planetiler fat jar file (from 60 to 120MB) 😭

On an r7g.metal instance, it takes ~1m30s to download all of the parquet files to disk, then about 1m30s to read all of the 1.5B geometries out of it.

Even though the IDs are 128-bit longs (and some are temp ID strings) - if you FNV1A-hash them to 64 bits, there are no collisions.
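
For readers unfamiliar with the hash mentioned here, this is a minimal sketch of 64-bit FNV-1a applied to an ID string. The class name `Fnv1a64` and the choice to hash the UTF-8 bytes of the ID are assumptions for illustration, not necessarily what the driver above does; the offset basis and prime are the standard FNV-1a 64-bit constants.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative 64-bit FNV-1a hash; the class name is hypothetical. */
final class Fnv1a64 {
  private static final long OFFSET_BASIS = 0xcbf29ce484222325L;
  private static final long PRIME = 0x100000001b3L;

  /** Hashes raw bytes: xor each byte into the state, then multiply by the prime. */
  static long hash(byte[] data) {
    long hash = OFFSET_BASIS;
    for (byte b : data) {
      hash ^= (b & 0xff);
      hash *= PRIME; // overflow wraps around, which is what FNV relies on
    }
    return hash;
  }

  /** Hashes the UTF-8 bytes of an ID string (an assumption about how IDs are encoded). */
  static long hash(String id) {
    return hash(id.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    System.out.println(Long.toHexString(hash("some-overture-id")));
  }
}
```

The absence of collisions reported above is an empirical observation about that release; a 64-bit hash cannot guarantee it for future data.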

acalcutt commented 1 year ago

It seems the majority of the space is buildings, are there a lot more buildings than OSM or is the size just the format used?

bdon commented 1 year ago

there should be many more buildings than OSM, if this is a superset of previous Daylight MSFT building releases.

The Parquet format is very efficient at encoding column data but the geometry encoding is simply WKB, which should be less compact than the OSM topological way/node model. So I think it's a combination of both.

jenningsanderson commented 1 year ago

Exactly, currently the buildings theme contains a combination of OSM + Microsoft ML + ESRI Community Maps data, for a total of about 786M buildings globally.

msbarry commented 1 year ago

A couple of other options for reading parquet format:

wipfli commented 1 year ago

Probably this is stupid so feel free to ignore my comment, but my first thought was let's flatten the nested properties and make a planet.pbf file which looks like OSM...

wipfli commented 1 year ago

@mactrem gave a nice presentation about possible improvements of the Mapbox Vector Tile format in the last MapLibre Technical Steering Committee meeting. There was some discussion about nested properties. Maybe this is a bit far in the future, but worth thinking about...

If you are interested in discussions around the tile format, feel free to join the #maplibre-tile-format channel in the OSMUS slack.

msbarry commented 1 year ago

I got a little deeper into the low-level parquet format reading this morning. It looks like it should actually work pretty cleanly in planetiler architecture to saturate all cores if I read one row group from a file at a time, then hand it off to a worker to parse and process one element at a time.
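
The hand-off described here can be sketched with plain `java.util.concurrent`. This is only an illustration of the pattern: `RowGroup` is a hypothetical stand-in holding already-decoded rows, not a parquet library type, and a real reader would also bound the queue so the reading thread cannot run far ahead of the workers.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

final class RowGroupPipeline {
  /** Hypothetical stand-in for one parquet row group: a batch of rows from one file. */
  record RowGroup(String file, List<String> rows) {}

  /** Reads row groups one at a time and lets worker threads process their rows. */
  static long processAll(Iterable<RowGroup> rowGroups, int workers, Consumer<String> handler) {
    ExecutorService pool = Executors.newFixedThreadPool(workers);
    AtomicLong processed = new AtomicLong();
    // Reader side: walk the row groups sequentially, handing each to the pool.
    for (RowGroup group : rowGroups) {
      pool.execute(() -> {
        // Worker side: process one element at a time within the row group.
        for (String row : group.rows()) {
          handler.accept(row);
          processed.incrementAndGet();
        }
      });
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for workers", e);
    }
    return processed.get();
  }

  public static void main(String[] args) {
    List<RowGroup> groups = List.of(
      new RowGroup("a.parquet", List.of("row 1", "row 2")),
      new RowGroup("b.parquet", List.of("row 3")));
    System.out.println(processAll(groups, 4, row -> {}));
  }
}
```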

I tried playing with https://github.com/joelittlejohn/jsonschema2pojo to generate typed classes from the json schema definition in https://github.com/OvertureMaps/schema/tree/main/schema but looks like it can't handle allOf/oneOf. I'll play with it a bit more but might just start with a dynamic API like getInt("level") or getDouble("bbox.minx") instead of marshaling it into a typed wrapper for the first pass.
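
A dynamic accessor of that kind could look roughly like the following sketch, which assumes a row has already been decoded into nested `java.util.Map` values. The class `StructPath` and its behavior of returning null for missing or non-numeric fields are assumptions for illustration, not planetiler's API.

```java
import java.util.Map;

final class StructPath {
  /** Walks a dotted path such as "bbox.minx" through nested maps; null if any step is missing. */
  static Object get(Map<String, ?> struct, String path) {
    Object current = struct;
    for (String key : path.split("\\.")) {
      if (!(current instanceof Map<?, ?> map)) {
        return null; // tried to descend into a non-struct value
      }
      current = map.get(key);
    }
    return current;
  }

  /** Returns the value as an Integer, or null if absent or not numeric. */
  static Integer getInt(Map<String, ?> struct, String path) {
    return get(struct, path) instanceof Number n ? n.intValue() : null;
  }

  /** Returns the value as a Double, or null if absent or not numeric. */
  static Double getDouble(Map<String, ?> struct, String path) {
    return get(struct, path) instanceof Number n ? n.doubleValue() : null;
  }

  public static void main(String[] args) {
    Map<String, Object> row = Map.of("level", 2, "bbox", Map.of("minx", -71.07, "maxx", -71.06));
    System.out.println(getInt(row, "level"));
    System.out.println(getDouble(row, "bbox.minx"));
  }
}
```

The trade-off against generated typed classes is that typos in a path only show up at runtime, but new schema fields become usable without regenerating anything.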

bdon commented 1 year ago

> Probably this is stupid so feel free to ignore my comment, but my first thought was let's flatten the nested properties and make a planet.pbf file which looks like OSM...

It should be possible but it should definitely use https://github.com/onthegomap/planetiler/issues/127 - it would be highly inefficient to break out Overture into topological nodes/ways/relations just to re-assemble them again.

msbarry commented 1 year ago

Alright, got a first pass (highly experimental!) overture reader up and running. Planetiler can download and build a planet.pmtiles for the planet on an r7g ec2 instance in about 15 minutes, compared to 20+ for an OSM planet pbf. The output is around 50GB.

Here's a demo: https://msbarry.github.io/planetiler-overture-demo/#14/42.35647/-71.07003

The structured attributes definitely present a challenge mapping to vector tile key/value pairs, mostly on the road segment layer. Take a look at the road "segment" attributes: I just left them as JSON for now, but I'll need to do something to split up road lines and apply tags conditionally for different segments with attributes like flags=[{"value":["isBridge"],"at":[0.040321064,0.448427838]}].

wipfli commented 1 year ago

Congratulations @msbarry, this is amazing!

acalcutt commented 1 year ago

Very cool @msbarry

Looking at it, it doesn't seem like it would be hard to style those values with a case expression, similar to the ugly way I did the icons in my recent trail map, like "icon-image":["case",["in", "Fuel", ["get","description"]],"fuel_15", ["case",["in", "Parking", ["get","description"]],"parking_15", ["case",["in", "View", ["get","description"]],"attraction_15", ["case",["in", "Sales", ["get","description"]],"commercial_15", ["case",["in", "Camping", ["get","description"]],"campsite_15", ["case",["in", "Food", ["get","description"]],"restaurant_15", ["case",["in", "Lodging", ["get","description"]],"hotel_15", ["case",["in", "Restroom", ["get","description"]],"toilets_15", ["case",["in", "Club", ["get","description"]],"warehouse_15", "circle_15"]]]]]]]]],

What does that "at" location mean? Does the line start at that point, or is it longer and telling you at a certain point it changes to a different style?

I wonder how this looks side by side with a map made from OSM sources. It seems like OSM is the source for a lot of this anyway.

msbarry commented 1 year ago

The at field means that the attribute only applies for a certain segment of the line, so at: [0, 0.5] means it applies for the first half. This is one of the biggest mismatches between overture format and planetiler processing/vector tiles in general.
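
To make the `at` semantics concrete, here is a minimal sketch that cuts the part of a polyline lying between two fractions of its length. It assumes planar coordinates and Euclidean length; `PartialLine` and `Point` are hypothetical names, and a real implementation would presumably rely on the geometry library's linear-referencing support rather than hand-rolled math.

```java
import java.util.ArrayList;
import java.util.List;

final class PartialLine {
  record Point(double x, double y) {}

  /** Returns the part of the line between fractions start and end (each 0..1) of its length. */
  static List<Point> slice(List<Point> line, double start, double end) {
    double total = 0;
    for (int i = 1; i < line.size(); i++) {
      total += dist(line.get(i - 1), line.get(i));
    }
    double from = start * total, to = end * total;
    List<Point> result = new ArrayList<>();
    double walked = 0; // length of the line before the current segment
    for (int i = 1; i < line.size(); i++) {
      Point a = line.get(i - 1), b = line.get(i);
      double segment = dist(a, b);
      double segStart = walked, segEnd = walked + segment;
      // Only segments that overlap (from, to) with non-zero length contribute.
      if (segEnd > from && segStart < to) {
        double t0 = Math.max(0, (from - segStart) / segment);
        double t1 = Math.min(1, (to - segStart) / segment);
        if (result.isEmpty()) {
          result.add(lerp(a, b, t0)); // first point of the slice
        }
        result.add(lerp(a, b, t1));
      }
      walked = segEnd;
    }
    return result; // empty when start == end
  }

  private static double dist(Point a, Point b) {
    return Math.hypot(b.x() - a.x(), b.y() - a.y());
  }

  private static Point lerp(Point a, Point b, double t) {
    return new Point(a.x() + (b.x() - a.x()) * t, a.y() + (b.y() - a.y()) * t);
  }

  public static void main(String[] args) {
    List<Point> road = List.of(new Point(0, 0), new Point(10, 0), new Point(10, 10));
    System.out.println(slice(road, 0, 0.5)); // the "first half" case from the comment
  }
}
```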

So to "have support for overture" in planetiler probably means it is able to:

  • download and read parquet sources (realistically people will probably want to mix and match only certain themes from overture with other sources)
  • access structured properties (nested structs and lists) and parse json strings, assuming this isn't a bug: "road field is just a JSON string" (OvertureMaps/data#43)
  • break apart lines based on attributes that only apply to partial segment lengths

I'd say understanding the current overture schema could be out of scope for now, since it will evolve and people should be able to use new attributes without being blocked on a planetiler update.

What do people think?

msbarry commented 1 year ago

OK I got a prototype profile with those working (see code and demo)

I think it's easiest to work with the structured schema with a dynamic API, so you can do things like:

feature.setAttr("categories.main", struct.get("categories").get("main").asString())

or handle all of the different ways that partial-length road data is provided (some embed an "at": [start, end] field in an object, some use {"at": [start, end], "value": list or value}, some use "values", but it's all handled by this code)

For actually handling the partial-length values I came up with an API:

var rangeMap = new RangeMapMap();
// tag the first quarter and the remaining three quarters of the line differently
rangeMap.put(0, 0.25, Map.of("key", "value"));
rangeMap.put(0.25, 1.0, Map.of("key", "other value"));
var lineSplitter = new LineSplitter(lineString);
for (var range : rangeMap.result()) { // merges overlapping tag maps
  // cut out the part of the line this range covers and emit it with the merged tags
  var splitLine = lineSplitter.get(range.start(), range.end());
  features.geometry(sourceFeature.getSourceLayer(), splitLine)
    .putAttrs(range.value());
}

but I could probably simplify it to something like:

features.line(sourceFeature.getSourceLayer())
  .setAttrPartialLength(0, 0.25, "key", "value")
  .setAttrPartialLength(0.25, 1.0, "key", "other value")
  .putAttrsPartialLength(0, 0.5, names)

then have planetiler handle creating multiple line geometries behind the scenes.
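
What "behind the scenes" might involve can be sketched as follows: collect the partial-length attributes, cut [0, 1] at every range boundary, and merge the attributes of all ranges covering each piece. This is an illustration under assumptions, not the actual implementation; `PartialAttrs` and `Range` are hypothetical names, inputs are assumed to lie within [0, 1], and later entries win when keys conflict.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class PartialAttrs {
  record Range(double start, double end, Map<String, Object> attrs) {}

  private final List<Range> inputs = new ArrayList<>();

  /** Registers attributes that apply only between fractions start and end of the line. */
  PartialAttrs put(double start, double end, Map<String, Object> attrs) {
    inputs.add(new Range(start, end, attrs));
    return this;
  }

  /** Splits [0, 1] at every boundary and merges the attrs of all inputs covering each piece. */
  List<Range> result() {
    TreeSet<Double> cuts = new TreeSet<>(List.of(0.0, 1.0));
    for (Range r : inputs) {
      cuts.add(r.start());
      cuts.add(r.end());
    }
    List<Range> merged = new ArrayList<>();
    Double prev = null;
    for (double cut : cuts) {
      if (prev != null) {
        Map<String, Object> attrs = new LinkedHashMap<>();
        for (Range r : inputs) {
          if (r.start() <= prev && r.end() >= cut) {
            attrs.putAll(r.attrs()); // later inputs override earlier ones
          }
        }
        // Pieces without attributes are kept too, since that part of the line still exists.
        merged.add(new Range(prev, cut, attrs));
      }
      prev = cut;
    }
    return merged;
  }

  public static void main(String[] args) {
    var attrs = new PartialAttrs()
      .put(0, 0.25, Map.of("key", "value"))
      .put(0.25, 1.0, Map.of("key", "other value"))
      .put(0, 0.5, Map.of("name", "Main St"));
    attrs.result().forEach(System.out::println);
  }
}
```

Each resulting range would then become one output line geometry carrying the merged attributes.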

acalcutt commented 1 year ago

If I wanted to try the code at https://github.com/onthegomap/planetiler/tree/overture-generic, how would I use this new profile after compiling planetiler? Can it only be used with pmtiles or are mbtiles still possible?

msbarry commented 1 year ago

> If I wanted to try the code at https://github.com/onthegomap/planetiler/tree/overture-generic, how would I use this new profile after compiling planetiler? Can it only be used with pmtiles or are mbtiles still possible?

java -jar planetiler.jar overture

should be sufficient. It will download by default, but you can set --download=false --overture-path=... to point to a location you've already downloaded to. You can also set --split_roads=true to split road segments (the default just leaves the json structs on each full-length road segment), and --connectors=false will disable writing transportation connectors and connector IDs to the output (which double the size). You can write to pmtiles or mbtiles with --output=planet.pmtiles or --output=planet.mbtiles. You can set --bounds=minlon,minlat,maxlon,maxlat to generate a map for a bounding box - this runs pretty fast because it's able to use a column predicate to avoid reading/parsing entire rows outside of the box.

msbarry commented 1 year ago

Also, I'm not sure if we should add full overture support to planetiler while they are still only doing alpha releases if the format might change in the future (for example something besides avro-parquet). So maybe we should split this out into separate independent issues for the generic lower-level capabilities that planetiler needs to work with overture-like data:

Then we could have an example profile that used these to read one of the alpha releases, but it's mostly up to consumers if/how they want to use it. Most likely I think people would want to pick individual themes from an overture release to layer into another map profile.

msbarry commented 5 months ago

Added initial geoparquet support in #888. There are still some rough edges and I'm planning to improve support for structured attributes, automatic downloading, linear-referenced tags, exposing row/column filters, and geoarrow improvements - but it's at least possible to use overture data now.

vcschapp commented 4 months ago

> The at field means that the attribute only applies for a certain segment of the line, so at: [0, 0.5] means it applies for the first half. This is one of the biggest mismatches between overture format and planetiler processing/vector tiles in general.
>
> So to "have support for overture" in planetiler probably means it is able to:
>
>   • download and read parquet sources (realistically people will probably want to mix and match only certain themes from overture with other sources)
>   • access structured properties (nested structs and lists) and parse json strings, assuming this isn't a bug: "road field is just a JSON string" (OvertureMaps/data#43)
>   • break apart lines based on attributes that only apply to partial segment lengths
>
> I'd say understanding the current overture schema could be out of scope for now, since it will evolve and people should be able to use new attributes without being blocked on a planetiler update.
>
> What do people think?

Just a quick note that we're moving forward with fixing the "road field is just a JSON string" issue. (I commented there in a bit more detail).

vcschapp commented 4 months ago

@msbarry I'm curious about the relative performance of tiling GeoParquet versus OSM. I saw further up you mentioned 15 minutes for GeoParquet versus 20+ for OSM. Is the difference that small, or is it partly influenced by data size, e.g. Overture has roughly 4X the buildings?

msbarry commented 4 months ago

It's a little hard to isolate since the volume of data in OSM vs. overture is different, and there's no profile that works with both overture and OSM data yet. I tested with a noop profile that just does a full scan through latest OSM pbf and 5/16 overture release on a 32-core machine I've been using to test and they each took around 10-11 minutes - although Overture has about 3x as many features (3.4B vs. 1.2B for OSM excluding nodes without tags).

OSM pbf has to do 2 full scans, storing node locations on the first pass and using them to construct geometries on the second pass - it also needs 64gb of ram to cache node location lookups in the second pass. Overture provides fully-formed geometries so the reader can get by with only 2-3gb of ram.

That's just for a full scan though - if you specify a bounding box then planetiler skips parquet files where the geoparquet bbox metadata field doesn't intersect, and converts the bbox to a push-down predicate that lets it read all of Boston or Massachusetts in 5-10 seconds. I think people will most likely grab only the themes/types they need for their profile so the amount of data processed should be smaller. And when I add support for column filters that should help too.
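
The coarse file-skipping step can be illustrated with a plain bounding-box intersection test. In this sketch `Chunk` is a hypothetical summary of one file (or row group) and its bbox metadata; the finer row-level filtering via a push-down predicate is left to the parquet reader and is not shown. Boxes crossing the antimeridian are not handled.

```java
import java.util.List;

final class BboxFilter {
  /** Axis-aligned lon/lat bounding box. */
  record Bounds(double minLon, double minLat, double maxLon, double maxLat) {
    boolean intersects(Bounds other) {
      return minLon <= other.maxLon() && maxLon >= other.minLon()
        && minLat <= other.maxLat() && maxLat >= other.minLat();
    }
  }

  /** Hypothetical summary of one parquet file and the bbox recorded in its metadata. */
  record Chunk(String name, Bounds bounds) {}

  /** Keeps only the chunks whose bbox overlaps the query box; the rest are never opened. */
  static List<Chunk> chunksToRead(List<Chunk> chunks, Bounds query) {
    return chunks.stream().filter(chunk -> chunk.bounds().intersects(query)).toList();
  }

  public static void main(String[] args) {
    Bounds boston = new Bounds(-71.2, 42.2, -70.9, 42.4);
    List<Chunk> chunks = List.of(
      new Chunk("americas.parquet", new Bounds(-170, 5, -50, 85)),
      new Chunk("europe.parquet", new Bounds(-25, 34, 45, 72)));
    System.out.println(chunksToRead(chunks, boston));
  }
}
```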

msbarry commented 4 months ago

Here's a rough breakdown of the 10-11 minutes to do a full scan through overture:

The time spent just reading overture data will likely be in the noise compared with time spent processing geometries/emitting tiles (same for osm data though).

four43 commented 4 months ago

@msbarry - first off these updates are incredible - thanks for taking the time to put these together.

Also - those performance metrics seem incredibly fast across the board. For a whole globe of data, those render times are so fast! Do you have a target you're shooting for? Does your use case require such rapid iteration? I'm curious more so than anything.

Thanks again, looking forward to playing with it.

msbarry commented 4 months ago

Thanks! Just to clarify that's not for rendering anything, it's just the fixed cost to read overture data. Rendering a complex profile from it could still take hours depending on how much data/post processing you include.

Since spatial queries appear to be so fast I think an interesting possibility this opens up could be to expose a web server that renders tiles on the fly/hot reloads when the profile definition changes. But when you're done you'd still run in batch mode to generate all the tiles.

msbarry commented 4 months ago

The 0.8.0 release adds geoparquet/overture support. See https://github.com/onthegomap/planetiler-examples for examples and instructions for getting started.

neodescis commented 4 months ago

> The 0.8.0 release adds geoparquet/overture support. See https://github.com/onthegomap/planetiler-examples for examples and instructions for getting started.

This is great! Are you planning on updating this profile as well? Or is the plan to remove it and leave the profile details as an exercise to the reader?

msbarry commented 4 months ago

That's a good question, my understanding is that overture maps foundation plans to provide x-ray view vector tiles for each release on their end, so the "see what overture data looks like" use case will be covered. A more full-featured overture profile would mostly be so that people could actually build maps based on overture data. I'm not sure if people would want that as a brand new schema closely matching overture raw data, or as an adapter that maps the data to an existing schema like shortbread or openmaptiles - or if that would make sense embedded in planetiler, planetiler-examples, or as a standalone repo.

neodescis commented 4 months ago

I didn't know they were planning on providing tiles. Meanwhile, maybe it would make sense to add the remaining data sets (admin and places) to the examples repo?

vcschapp commented 4 months ago

> I didn't know they were planning on providing tiles. Meanwhile, maybe it would make sense to add the remaining data sets (admin and places) to the examples repo?

Just a quick note that the admins theme was replaced with the divisions theme a few months back. I expect the admins theme will be formally dropped in the July data release.

bdon commented 4 months ago

> I didn't know they were planning on providing tiles. Meanwhile, maybe it would make sense to add the remaining data sets (admin and places) to the examples repo?

Here are example profiles for Buildings, Transportation and Base that will be migrated into Overture examples: https://github.com/bdon/overture-tiles/tree/main/scripts/2024-05-16-beta.0

zachtrong commented 1 month ago

Hi, I’m looking for guidance on how to handle both OSM vector tiles and OvertureMaps Parquet files.

Can someone help me with the best approach?

wipfli commented 1 month ago

Which data from Overture would you like to use @zachtrong?

I know that @msbarry has combined OSM data for all layers of OpenMapTiles except for the buildings layer, where he used Overture data instead...

zachtrong commented 1 month ago

> Which data from Overture would you like to use @zachtrong?
>
> I know that @msbarry has combined OSM data for all layers of OpenMapTiles except for the buildings layer, where he used Overture data instead...

Yes, I was curious about merging OSM data with Overturemaps Buildings. Seems like inheriting OpenMapTiles profile with a custom Building handler is the approach here.

msbarry commented 1 month ago

> Seems like inheriting OpenMapTiles profile with a custom Building handler is the approach here.

Yep that's what I did for onthegomap. We could update the openmaptiles profile to support an --overture-buildings flag or something that tells it to use buildings from overture instead of OSM. What I did for onthegomap isn't too generalizable though since I don't use any building tags.

neodescis commented 1 month ago

There's nothing that says Overture and OSM data have to be in the same tile archive. We're using protomaps to generate a pmtiles archive for OSM data, and then we are generating another archive for Overture buildings and places using a separate (and pretty simple) planetiler profile.

acalcutt commented 1 month ago

I am also using overture buildings from the planetiler example profile at https://github.com/onthegomap/planetiler-examples/blob/main/OvertureBuildings.java

I made them similar to the Landcover example at https://gist.github.com/dschep/9a4c875715e62c6b8e7d5697e33780d4

mkdir -p data
overturemaps download -f geoparquet --type=building -o data/buildings.parquet
java -cp planetiler.jar OvertureBuildings.java 

It ends up with tiles like this: https://tiles.wifidb.net/data/overture_buildings/#14.1/42.26058/-71.80832

I added this to my style, which is a little hacky with the height of the buildings. I found 'height' wasn't consistently on all buildings, so for some I used 'numFloors' multiplied by 3 instead, if that existed.


    {
      "id": "building",
      "type": "fill",
      "source": "overture_buildings",
      "source-layer": "building",
      "paint": {
        "fill-antialias": true,
        "fill-color": "rgba(222, 211, 190, 1)",
        "fill-opacity": {
          "base": 1,
          "stops": [
            [
              13,
              0
            ],
            [
              15,
              1
            ]
          ]
        },
        "fill-outline-color": {
          "stops": [
            [
              15,
              "rgba(212, 177, 146, 0)"
            ],
            [
              16,
              "rgba(212, 177, 146, 0.5)"
            ]
          ]
        }
      }
    },
    {
      "id": "building-3d",
      "type": "fill-extrusion",
      "source": "overture_buildings",
      "source-layer": "building",
      "filter": [
        "all",
        [
          "!has",
          "hide_3d"
        ]
      ],
      "paint": {
        "fill-extrusion-color": [
          "case",
          [
            "has",
            "colour"
          ],
          [
            "get",
            "colour"
          ],
          [
            "interpolate",
            ["linear"],
            ["case",["has","height"],["get","height"],["case",["has","numFloors"],["*", ["get", "numFloors"], 3],3]], 0, "lightgray", 200, "royalblue", 400, "lightblue"
          ]
        ],
        "fill-extrusion-height": [
            "interpolate",
            ["linear"],
            ["zoom"],
            14,
            0,
            15,
            ["case",["has","height"],["get","height"],["case",["has","numFloors"],["*", ["get", "numFloors"], 3],3]]
        ],
        "fill-extrusion-base": ["case",
            [">=", ["get", "zoom"], 16],
            ["get", "render_min_height"], 0
        ]
      }
    },
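
The same height fallback used in the style expressions above can be written as a small function, for example if one preferred to compute a single height value while generating tiles instead of in the style. This is only a sketch: `BuildingHeight` is a hypothetical helper, the tags are assumed to arrive as a plain map, and the 3 units per floor and default of 3 simply mirror the style.

```java
import java.util.Map;

final class BuildingHeight {
  static final double HEIGHT_PER_FLOOR = 3.0; // same multiplier as the style expression
  static final double DEFAULT_HEIGHT = 3.0;   // used when neither attribute is present

  /** Prefers "height", then "numFloors" times 3, then a fixed default. */
  static double renderHeight(Map<String, Object> tags) {
    if (tags.get("height") instanceof Number height) {
      return height.doubleValue();
    }
    if (tags.get("numFloors") instanceof Number floors) {
      return floors.doubleValue() * HEIGHT_PER_FLOOR;
    }
    return DEFAULT_HEIGHT;
  }

  public static void main(String[] args) {
    System.out.println(renderHeight(Map.of("height", 12.5)));
    System.out.println(renderHeight(Map.of("numFloors", 4)));
    System.out.println(renderHeight(Map.of()));
  }
}
```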

acalcutt commented 1 month ago

Although I'm not sure what I used is always great, for example, the Eiffel Tower.. https://tiles.wifidb.net/styles/WDB_OSM/#15.6/48.860436/2.295161/-52/58