tangrams / tangram-es

2D and 3D map renderer using OpenGL ES
MIT License

OSM XML Support #877

Open hallahan opened 8 years ago

hallahan commented 8 years ago

Because Mapzen currently supports vector tiles in 3 formats: TopoJSON, GeoJSON and Mapbox vector tiles, it is relatively straightforward to begin adding support for additional data sources. In working on OpenMapKit, I have spent some time thinking about the internals involved with editing and working with OpenStreetMap data. We have been searching long and hard for a new map renderer, and my desire is to do it in a way that OSM data really is a first-class citizen in the mapping library. My thought is that adding OSM XML support is the first step toward creating an OSM editor with Tangram.

I like that Tangram's data pipeline revolves around the notion of a tile. Although JOSM does not think that way, iD actually does. Individual GET requests are made to the OSM Editing API (0.6) for the bbox of a given tile. It turns out we can do the same thing in Tangram ES.
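To make the tile-to-bbox idea concrete, here is a rough sketch of turning a slippy-map tile address into a bounding box and a map request URL. It assumes standard Web Mercator tile addressing and the API 0.6 `map?bbox=left,bottom,right,top` call; the names `BBox`, `tileBounds`, and `osmMapUrl` are made up for illustration and are not part of Tangram ES, and the host name is an assumption about which server you would hit.

```cpp
#include <cmath>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper type: geographic bounds in degrees.
struct BBox { double left, bottom, right, top; };

// Longitude of the western edge of tile column x at zoom z.
double tileLon(int x, int z) {
    return x / std::exp2(z) * 360.0 - 180.0;
}

// Latitude of the northern edge of tile row y at zoom z (Web Mercator).
double tileLat(int y, int z) {
    const double pi = 3.14159265358979323846;
    const double n = pi * (1.0 - 2.0 * y / std::exp2(z));
    return std::atan(std::sinh(n)) * 180.0 / pi;
}

// Bounds of tile (x, y, z); row y + 1 gives the southern edge.
BBox tileBounds(int x, int y, int z) {
    return { tileLon(x, z), tileLat(y + 1, z), tileLon(x + 1, z), tileLat(y, z) };
}

// Builds the request URL for one tile's worth of OSM XML.
std::string osmMapUrl(const BBox& b) {
    std::ostringstream url;
    url << std::fixed << std::setprecision(7)
        << "https://api.openstreetmap.org/api/0.6/map?bbox="
        << b.left << ',' << b.bottom << ',' << b.right << ',' << b.top;
    return url.str();
}
```

In practice this only makes sense at high zoom levels, where a single tile covers a small enough area for an editing API request.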

I'm working on a proof-of-concept that takes the approach of directly requesting OSM XML from the OSM Editing API. This XML is then parsed into an OSM data model, similar to what you will see in JOSM and OpenMapKit. We have an in-memory data set for a given tile with access to the OSM elements themselves (nodes, ways, and relations). In addition, we can create a Tangram::Layer from the dataset, allowing tiles to be rendered in a similar manner to vector tiles. We loop through the standalone nodes for points, open ways for lines, and closed ways for polygons.
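The points/lines/polygons split can be sketched roughly like this. The `Node` and `Way` structs here are stripped-down stand-ins, not the actual model in the branch, and treating every closed way as a polygon is a simplification (a roundabout is a closed way that should stay a line, so a real implementation would also look at tags).

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

enum class GeometryType { Point, Line, Polygon };

// Hypothetical minimal OSM elements; tags and coordinates omitted.
struct Node { std::int64_t id; };
struct Way { std::vector<std::int64_t> nodeRefs; };

// A way is closed when it ends on the node it started from.
// Four refs is the minimum: a triangle with the first node repeated.
bool isClosed(const Way& way) {
    return way.nodeRefs.size() >= 4 &&
           way.nodeRefs.front() == way.nodeRefs.back();
}

// Open ways become lines, closed ways become polygons.
GeometryType geometryFor(const Way& way) {
    return isClosed(way) ? GeometryType::Polygon : GeometryType::Line;
}

// Standalone nodes are the ones no way refers to; these become points.
std::vector<std::int64_t> standaloneNodeIds(const std::vector<Node>& nodes,
                                            const std::vector<Way>& ways) {
    std::unordered_set<std::int64_t> used;
    for (const Way& way : ways) {
        used.insert(way.nodeRefs.begin(), way.nodeRefs.end());
    }
    std::vector<std::int64_t> result;
    for (const Node& node : nodes) {
        if (used.count(node.id) == 0) {
            result.push_back(node.id);
        }
    }
    return result;
}
```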

Right now I'm using rapidxml to parse the XML, mainly because you are using rapidjson for your JSON, and it is a header-only library that is easy to include. It is a DOM parser only, and I'm wondering if you'd prefer that we switch to something more mainstream--like libxml2 or expat? The current lib is probably fine on a tile-by-tile basis, but if we are trying to populate a DB later on with more data, we might want a streaming parser.

This direct OSM XML support makes sense online, but the real goal for me is to make this work from an offline-data store (SQLite). With that in mind, I'd like to make the MemoryDataSet a child to an abstract DataSet class. That way we can later create a SQLiteDataSet class that queries a database for OSM objects--an alternative to hitting an online REST endpoint. We can break that off into a different issue when it's time.
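Something like the following is what I have in mind for the class hierarchy. This is only a sketch: the method names (`findNode`, `nodeCount`, `addNode`) are placeholders rather than the interface in the branch, and only nodes are shown. A later SQLiteDataSet would implement the same two virtual methods with queries instead of a map lookup.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical minimal node; tags omitted.
struct Node {
    std::int64_t id;
    double lat;
    double lon;
};

// Abstract interface: callers do not care where the OSM elements live.
class DataSet {
public:
    virtual ~DataSet() = default;
    // Returns nullptr when the node is not in this data set.
    virtual const Node* findNode(std::int64_t id) const = 0;
    virtual std::size_t nodeCount() const = 0;
};

// In-memory implementation, filled by the XML parser for one tile.
class MemoryDataSet : public DataSet {
public:
    void addNode(const Node& node) { m_nodes[node.id] = node; }

    const Node* findNode(std::int64_t id) const override {
        auto it = m_nodes.find(id);
        return it == m_nodes.end() ? nullptr : &it->second;
    }

    std::size_t nodeCount() const override { return m_nodes.size(); }

private:
    std::unordered_map<std::int64_t, Node> m_nodes;
};
```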

I've got a branch going that renders OSM XML, though some of the tiles don't come through yet. The colored buildings are using the data: { source: osmApi }. In the scene.yaml, all of the OSM XML data is being treated as a single layer, osmXml, from which tags (properties) are being filtered and styled.

screenshot 2016-07-10 20 01 34

https://github.com/hallahan/tangram-es/tree/OSM_XML

The beginnings of an OSM model:

https://github.com/hallahan/tangram-es/tree/OSM_XML/core/src/osm

We'll probably want to start a new branch that will make its way into being a pull request. Any input on how to make that happen is appreciated!

cc/ @bcamper @tallytalwar

matteblair commented 8 years ago

This is super cool :)

I'm looking through your changes now, I think we have a few possible directions we could go with this and I'll have more to say on this tomorrow.

nvkelso commented 8 years ago

Wow! Great work :)

On Jul 27, 2016, at 16:58, Matt Blair notifications@github.com wrote:

This is super cool :)

I'm looking through your changes now, I think we have a few possible directions we could go with this and I'll have more to say on this tomorrow.


hallahan commented 8 years ago

After a bit of debugging, I've come to find that the "missing tiles" are basically the rapidxml parser failing. Looking at the char* that is passed to the parser, the OSM XML looks totally valid. It chokes saying it is missing an expected <. I've used libxml2 in the past, and it seems to be one of the most widely used. It looks like CGImap uses that lib as well.

I'm new to CMake. How hard would it be to add libxml2 as a dependency?

matteblair commented 8 years ago

Hey again! Sorry for the delay on getting back to this. Thoughts with regards to incorporating an OSM XML data source:

That said, we'd still love to support this in some way! Some options:

  1. We can add the OSM XML code to the tangram-es core library and build it conditionally based on a compile-time flag (e.g. #define TANGRAM_BUILD_OSM_XML).
  2. We can maintain a separate project containing just the DataSource and data model files. Since the tangram-es C++ interface allows adding any data source that conforms to the base class, a user would only need to compile both projects and add an OSM_XML data source at run time.

There are tradeoffs to both of these and I'm sure these aren't the only paths we could take, but either seems plausible to me. Thoughts?
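For option 1, the conditional build could look roughly like this. `TANGRAM_BUILD_OSM_XML` is the flag name suggested above; the function `supportedSourceTypes` and the type strings are made up for the example and do not reflect how tangram-es actually enumerates its sources. The flag would be switched on by passing `-DTANGRAM_BUILD_OSM_XML` to the compiler.

```cpp
#include <string>
#include <vector>

// Hypothetical list of data source types compiled into this build.
std::vector<std::string> supportedSourceTypes() {
    std::vector<std::string> types = { "TopoJSON", "GeoJSON", "MVT" };
#ifdef TANGRAM_BUILD_OSM_XML
    // Only present when the OSM XML code is compiled in.
    types.push_back("OSM_XML");
#endif
    return types;
}
```

The upside is a single repo and a single build; the downside is that every optional format adds another flag and another block like this to the core.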

hallahan commented 8 years ago

I agree that it makes sense to make the OSM support be optional. In addition to the OSM editor use case, support for OSM XML will be useful for browsing data that is guaranteed to be the original source. For example, OpenStreetMap.org has a data view where you can see Leaflet rendered vectors of the data on top of the map. OSM elements can be selected, and tags can be seen on a side view. If a future phase of OpenStreetMap.org wanted to adopt Tangram as the renderer, you could use this functionality to provide seamless rendering and introspection on fresh data--something that is currently lacking.

I totally agree parity with tangram-js is a great idea. ES makes the most sense for a mobile editor, but JS makes a lot of sense for casually viewing data without requiring an app. For example, it would be quite useful to have a map style that renders based off of the user attribute of OSM elements. You could then see who has modified what in a mapathon. It could also be useful for Missing Maps leaderboards where the cartography shows the data that top users have edited. Also, the Overpass API is a great source of OSM XML that can be derived from super intricate queries.

Because Tangram is well suited to gain support for any type of spatial data format, I think this could be a huge technical edge over other modern map renderers out there. Taking a step back, how would we do this if we also wanted to add Shapefile support? GPX, KML, Esri Geodatabase, CSV, etc? How would we do this if we want an individual repo for a specific data format?

Some sort of data source plugin architecture?

matteblair commented 8 years ago

Those are some really good ideas for OSM XML rendering that hadn't even occurred to me - well noted!

A plugin-like architecture does seem like a natural path. The DataSource abstract class is a minimal prototype of what a "plugin" could be. Fully specifying a plugin interface and maintaining separate repos for plugins would be the cost, but the benefits seem pretty great: tangram-es can keep a streamlined set of features for apps that just need efficient rendering and developers are free to implement or modify data source plugins for their specific needs.
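As a very rough sketch of what that could look like, a registry could map a source type name to a factory, so a separate repo only has to register itself at run time. None of this is existing tangram-es API: `TileSource` is a stand-in for the real DataSource base class, and `SourceRegistry` and `OsmXmlSource` are hypothetical names.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Stand-in for the abstract data source base class.
class TileSource {
public:
    virtual ~TileSource() = default;
    virtual std::string name() const = 0;
};

// Hypothetical plugin that would live in its own repo.
class OsmXmlSource : public TileSource {
public:
    std::string name() const override { return "OSM_XML"; }
};

// Maps a type name to a factory that builds the matching source.
class SourceRegistry {
public:
    using Factory = std::function<std::unique_ptr<TileSource>()>;

    // Returns false if the type name is already registered.
    bool add(const std::string& type, Factory factory) {
        return m_factories.emplace(type, std::move(factory)).second;
    }

    // Returns nullptr for an unknown type.
    std::unique_ptr<TileSource> create(const std::string& type) const {
        auto it = m_factories.find(type);
        return it == m_factories.end() ? nullptr : it->second();
    }

private:
    std::map<std::string, Factory> m_factories;
};
```

The core would then only know about the registry and the base class, and a scene file's source type string would pick the factory.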

I'll look around for examples of this sort of architecture and see what might work for us.

hallahan commented 8 years ago

I made some headway with OSM XML support today. I ended up finding a better XML parser called pugixml that was easy to include in the project. Not only do the benchmarks look good, but the docs are fantastic, and the error reporting is good.

Speaking of which, I've figured out why sometimes the XML doesn't parse...

The char* buffer isn't always correct. Maybe the EOF delimiter isn't quite in the right place?

For example, here is the contents of task.rawTileData->data() that went into OSM::XmlParser:

https://gist.github.com/hallahan/c6a0a1f14fb7bb900f8232bc462c7a58

The end contents vary. Since I'm devving on my Macbook, seeing that odd <plist... XML suggests it's from the memory of my tangram process on my laptop.

</osm>

        <key>HSTS Host</key>
        <true/>
        <key>Include Subdomains</key>
        <true/>
    </dict>
    <key>za.search.yahoo.com</key>
    <dict>
...

Does anyone have any insight as to why task.rawTileData->data() may be the wrong size?

https://github.com/hallahan/tangram-es/blob/57cab4c6c1f7a195a02370f90332bf6eb6b6584f/core/src/data/osmXmlSource.cpp#L32

hallahan commented 8 years ago

I'm noticing that DownloadTileTask has this public member:

// Raw tile data that will be processed by DataSource.
std::shared_ptr<std::vector<char>> rawTileData;

We're seeing a .data() from the different data sources, and that seems to give a pointer to the underlying vector's array. I wonder... Is the size of this always correct?

http://en.cppreference.com/w/cpp/container/vector/data

Why are we doing this instead of having something like the following?

std::shared_ptr<std::string> rawTileData;

cc/ @hjanetzek

matteblair commented 8 years ago

Nice work! I may be able to shed some light on this issue with buffer length. The reason we store "raw data" in a vector of bytes instead of a string is that this must also support binary formats like MVT, which can contain null characters in the body and therefore can't be treated as strings. Consequently, rawTileData is not guaranteed to be null-terminated, so if you are using it as a string you should also use the length of the vector to limit the number of bytes read.
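To illustrate the difference, here is a small standalone sketch (the helper names are made up, nothing here is tangram-es code). It compares where a C-string reader would stop with what a length-bounded read returns for the same byte vector.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Number of bytes before the first '\0', i.e. where a C-string reader
// would stop. Returns raw.size() when there is no '\0' at all, in which
// case a real C-string reader would run past the end of the buffer.
std::size_t bytesBeforeFirstNull(const std::vector<char>& raw) {
    auto it = std::find(raw.begin(), raw.end(), '\0');
    return static_cast<std::size_t>(it - raw.begin());
}

// Copies exactly raw.size() bytes; embedded '\0' bytes are preserved
// and nothing beyond the vector is ever read.
std::string toBoundedString(const std::vector<char>& raw) {
    return std::string(raw.data(), raw.size());
}
```

With binary data such as `{'a', '\0', 'b'}` the first helper reports 1 byte while the bounded copy keeps all 3; with text that has no terminator, only the bounded copy is safe.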

hallahan commented 8 years ago

Ah, I see, makes sense. Thanks!

Explicitly making a string with the size of the vector included fixes the problem.

xmlParser.parse(std::string(rawTileData->data(), rawTileData->size()));

Though, I bet I can do this without having to construct a string using a different load function in pugixml...

https://github.com/hallahan/tangram-es/commit/38630e9deb1925696eb0748eb2e58aa247d8032e