mapbox / mapbox-gl-js

Interactive, thoroughly customizable maps in the browser, powered by vector tiles and WebGL
https://docs.mapbox.com/mapbox-gl-js/

Support joining external attributes to geometries directly #4261

Closed stevage closed 7 years ago

stevage commented 7 years ago

Currently there are only two ways to join geometry from one tileset with an external dataset:

  1. Merge it offline and upload to Mapbox
  2. Compute all the visualisation properties directly for each row, as per the "Join local JSON data" example.

The first is inefficient, difficult to manage, and removes the possibility of using third-party geometries.

The second is cumbersome, slower, and forgoes all of the data vis capabilities of mapbox-gl-js (i.e., you have to use "category" with a pre-calculated value, instead of interpolating).

Given that this is a very common use case (eg, visualising any kind of census data, per city/county/state/whatever), how about supporting external datasets directly?

It could be done fairly elegantly in the style spec:

"states": {
  "type": "vector",
  "url": "...",
  "attributes": {
    "population": {
      "data": [ { "shortname": "VIC", "count": 5000000 }, { "shortname": "NSW", "count": 6000000 } ...],
      "data-join-field": "shortname",
      "geometry-join-field": "id"
    }, ...
  }
}

Then you'd access the population value as .properties.population_shortname, for instance.
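
To make the intended semantics concrete, here is a minimal sketch in plain JavaScript of the join that the "attributes" block above describes. None of this is mapbox-gl-js API: joinAttributes is a hypothetical helper, features are plain GeoJSON-like objects, and the naming of the joined properties (attribute name, underscore, column name) is an assumption, since the proposal does not pin it down.

```javascript
// Hypothetical helper, not part of mapbox-gl-js: joins external rows onto
// features by matching row[data-join-field] against
// feature.properties[geometry-join-field].
function joinAttributes(features, name, spec) {
  const dataField = spec['data-join-field'];
  const geomField = spec['geometry-join-field'];

  // Index the rows once so each feature lookup is a single Map access.
  const rowsByKey = new Map(spec.data.map((row) => [row[dataField], row]));

  return features.map((feature) => {
    const row = rowsByKey.get(feature.properties[geomField]);
    if (!row) return feature; // no matching row: leave the feature untouched

    const joined = { ...feature.properties };
    for (const [column, value] of Object.entries(row)) {
      // Assumed naming: <attribute name>_<column>, skipping the join key.
      if (column !== dataField) joined[`${name}_${column}`] = value;
    }
    return { ...feature, properties: joined };
  });
}

// Illustrative data only, reusing the values from the proposal.
const states = [
  { type: 'Feature', properties: { id: 'VIC' }, geometry: null },
  { type: 'Feature', properties: { id: 'TAS' }, geometry: null },
];
const population = {
  data: [
    { shortname: 'VIC', count: 5000000 },
    { shortname: 'NSW', count: 6000000 },
  ],
  'data-join-field': 'shortname',
  'geometry-join-field': 'id',
};
const joinedStates = joinAttributes(states, 'population', population);
```

In this sketch the VIC feature gains a population_count property, while TAS, which has no matching row, is returned unchanged.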

anandthakker commented 7 years ago

@stevage thanks for this suggestion! I agree that this would be a really useful feature, and I think the most promising route for implementation is alongside the custom source type project. A key goal of that project is to introduce more flexibility into the Source API to allow for plugin-like extensibility for data sources. I see this as a principal use case for that API.

andrewharvey commented 7 years ago

Previously requested in https://github.com/mapbox/mapbox-gl-js/issues/2671

stevage commented 7 years ago

Ah yep - good discussion there.

@anandthakker Custom source types look promising. It's a bit hard to tell from the various issues what the expected end state is at the moment, but I'll keep an eye on it.

lucaswoj commented 7 years ago

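We are hesitant to provide data join functionality in the style spec because:

  1. there is little performance advantage to joining data within GL JS rather than external to GL JS (during source generation or by building a geojson source at runtime)
  2. users are likely to demand increasingly more powerful data join operations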

For these reasons I am closing this ticket in favor of the custom source types concept. We do not currently have anybody working on its implementation. The closest thing we have to an interface proposal is https://github.com/mapbox/mapbox-gl-js/issues/3186

stevage commented 7 years ago

> there is little performance advantage to joining data within GL JS rather than external to GL JS (during source generation or by building a geojson source at runtime)

Joining during source generation is OK if the same person manages both the geometries and the attributes. But it's common for geometries to be provided by a government body such as the Australian Bureau of Statistics, which updates them only every couple of years, while the attributes come from a wide variety of sources. If the person with the attributes never has to touch any spatial format or deal with vector tiles directly, it greatly increases the range of applications of this kind of visualisation.

As for building a GeoJSON source at runtime, that's prohibitively expensive in many cases. For instance, the GeoJSON file for Australia's SA4 boundaries (that is, the most coarse-grained boundaries, 100,000 - 500,000 people) is 98MB.

Various use cases are simply not possible at present:

So, basically I'm arguing on the basis of flexibility, not performance. :)

> users are likely to demand increasingly more powerful data join operations

Hmm, what kind of "increasingly more powerful data join operations" are you envisaging? Joining a table of attributes to a set of polygons is extremely common, but I can't off the top of my head think what the next step of this slippery slope would be.

lucaswoj commented 7 years ago

> As for building a GeoJSON at runtime, that's prohibitively expensive in many cases.

So is loading a large dataset of joined values!

> but I can't off the top of my head think what the next step of this slippery slope would be.

  1. I want to downcase / strip whitespace from the joined values
  2. I want to downcase / strip whitespace from the existing values
  3. I want to add/multiply/concat/... the joined value to the existing value
  4. I only want to load joined values for tiles in the viewport


Having a way to programmatically create/manipulate data is going to be more generally useful and future-proof than addressing this one use case with a particular feature.

stevage commented 7 years ago

> I want to downcase / strip whitespace from the joined values
> I want to downcase / strip whitespace from the existing values
> I want to add/multiply/concat/... the joined value to the existing value

Those all fit within a category of data munging that can be carried out before the geometry join. I don't see any need to ever support those. I'm assuming that:

> I only want to load joined values for tiles in the viewport

Sounds like an optimisation that the API might want to make at some point. I don't know enough about the internals of GLJS to comment really.

So...so far no scary steps down the slippery slope :)

andrewharvey commented 7 years ago

If your tileset has a lot of features (and potentially a large number of attributes), your browser won't have a single object; rather, you'd have an API to get these attributes from a list of IDs. So the data join would need to support this somehow, perhaps being synchronous so you can make that AJAX call to get the extra attribute values for the featureIds in a tile? If it's per tile, is it up to the client to cache them or does GL JS handle it? These are all things which in my view would need to be considered if joining attributes to a vector source is supported.

mountainMath commented 7 years ago

I am very interested in this scenario. The custom source type project looks like it would perfectly fit my bill. Is that project still ongoing? Are there any example implementations of a custom source type that I could look at and try to adapt for my purposes? Ideally it would be a custom source that consumes geojson tiles, as that would make it relatively easy for me to adapt.

anandthakker commented 7 years ago

@mountainMath the project is on the roadmap, although not (yet!) under active development.

anandthakker commented 7 years ago

Roadmap: https://www.mapbox.com/mapbox-gl-js/roadmap/

pestrov commented 5 years ago

Hi everyone!

I've tried hard to find out the current state of an efficient way to join vector tile geometry with an external data source by a field. Has anything changed, is anything planned to change, or is a "match" expression still the best way we could do that?
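
For readers who have not seen the "match" approach mentioned here, the following is a minimal sketch of how external rows can be turned into a "match" expression. The helper buildMatchExpression, the colour thresholds, and the 'states-fill' layer id in the closing comment are hypothetical. Only the expression shape (input, then label/output pairs, then a fallback) follows the style spec.

```javascript
// Hypothetical helper: builds a "match" expression of the form
// ['match', input, label1, output1, label2, output2, ..., fallback].
// Labels must be unique, and at least one row is needed for a valid expression.
function buildMatchExpression(rows, joinProperty, keyField, valueField, toOutput, fallback) {
  const expression = ['match', ['get', joinProperty]];
  for (const row of rows) {
    // The output is computed here, in application code, once per row.
    expression.push(row[keyField], toOutput(row[valueField]));
  }
  expression.push(fallback);
  return expression;
}

// Illustrative rows and an arbitrary two-class colour ramp.
const rows = [
  { shortname: 'VIC', count: 5000000 },
  { shortname: 'NSW', count: 6000000 },
];
const toColor = (count) => (count >= 5500000 ? '#08519c' : '#9ecae1');

const fillColorMatch = buildMatchExpression(
  rows, 'id', 'shortname', 'count', toColor, 'rgba(0,0,0,0)'
);

// With a real map this might be applied as (layer id is an assumption):
// map.setPaintProperty('states-fill', 'fill-color', fillColorMatch);
```

Note that the colours are pre-computed per row outside the style, which is the limitation raised at the top of this thread, and that the expression grows with the number of rows.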

Thank you!

asheemmamoowala commented 5 years ago

@pestrov The only other update to this is to use Map#setFeatureState in conjunction with feature-state expressions. This approach requires unique feature ids on every feature in the vector tile source-layer that needs to be joined to.
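
As a rough illustration of the feature-state approach, here is a minimal sketch. Map#setFeatureState and the "feature-state" expression are real mapbox-gl-js features. The helper applyRowsAsFeatureState, the source and source-layer names, the featureId column, and the colour stops are all assumptions made for the example. A stand-in object replaces the map so the snippet runs without a browser.

```javascript
// Hypothetical helper: pushes one external row per feature into feature state.
// `map` only needs a setFeatureState(target, state) method, as mapboxgl.Map has.
function applyRowsAsFeatureState(map, source, sourceLayer, rows, idField, valueField, stateKey) {
  for (const row of rows) {
    map.setFeatureState(
      { source, sourceLayer, id: row[idField] }, // id must match the tile feature id
      { [stateKey]: row[valueField] }
    );
  }
}

// Paint expression that interpolates on the joined value at render time.
// coalesce supplies 0 for features whose state has not been set.
const fillColorFromState = [
  'interpolate', ['linear'],
  ['coalesce', ['feature-state', 'population'], 0],
  0, '#f7fbff',
  8000000, '#08306b',
];

// Stand-in for a real map, recording calls instead of rendering.
const calls = [];
const fakeMap = {
  setFeatureState: (target, state) => { calls.push({ target, state }); },
};

applyRowsAsFeatureState(
  fakeMap, 'states', 'boundaries',
  [{ featureId: 1, count: 5000000 }, { featureId: 2, count: 6000000 }],
  'featureId', 'count', 'population'
);
```

With a real map, the helper would be called once the source has been added, and the expression would be set as the layer's fill-color. Unlike the "match" approach, the raw value is joined and the style does the interpolation.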