Closed vibze closed 5 years ago
Interesting feature. Can you describe your use case in detail? How should it aggregate the properties? E.g. suppose we cluster a million points where each has different property keys and values — how should the top cluster properties look like then?
User must provide an aggregation function, so in a case of points belonging to a cluster have different property keys and values or value types everything should be handled in that function. Mapbox-gl just applies it to the dataset.
Let's say I want to build a population heatmap using population data for every human settlement in area.
Feature example:
{
"type": "Feature",
"properties": {
"osm_id": "26544289",
"name": "Алматы",
"type": "city",
"population": 1475400
},
"geometry": {
"type": "Point",
"coordinates": [ 76.9458522, 43.2355947 ]
}
}
Now, I want every cluster to have a sum population of all included points as a property. I configure the source like this:
map.addSource("points", {
"type": "geojson",
"data": "data/points.geojson",
"cluster": true,
"clusterMaxZoom": 14,
"clusterRadius": 20,
"clusterAggregations": {
"population_sum": function(prevValue, feature, i, arr) {
if (!('population' in feature) || isNaN(parseFloat(feature.population))) {
return prevValue; // handle the case of missing keys or wrong type
}
return prevValue + parseFloat(feature.population);
}
}
});
... and every cluster from this source receives the population_sum
property to work with.
I believe we won't be able to accept a function, since the mapbox style specification only allows valid JSON. Would it be enough to have something like this?
clusterAggregate: ["population"]
Then it would just do a sum for all given properties.
It would be enough for certain cases, but definitely not as flexible. Do you think something like this http://bl.ocks.org/gisminister/10001728 could be possible with Mapbox GL API?
Can also have some common mathematical functions
clusterAggregateFunction: "SUM"
or clusterAggregateFunction: "AVG"
?
That would be helpful.
+1 for this feature request. Specifically: 1) Ability to specify one or multiple property columns to aggregate 2) Ability to select the aggregation caclulation type.
Requested aggregate calculations for circle-cluster property data to support would be min, max, average, and sum.
As an example, here is the map visual I want to create. To achieve this today, I aggregate the geojson data on the server to return one point with the aggregated "property" value I want.
I'd like to completely port my heatmap calculations over to the new circle-cluster and circle data-driven style. The ask would be to be able to return the raw geojson point data from my data set, and cluster it using the new circle-cluster / geojson property aggregation feature.
One challenge may be enforcing data types - geojson property data could be strings, floats, ints, etc. Some aggregate calcs may apply to certain data types but not others.
Hi, is there any news on this feature? It would be awesome to have it on GL.
I'd love this functionality too. I'm attempting to display several million points on a map, retrieved by an Elasticsearch geohash aggregate query (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-geohashgrid-aggregation.html). This returns data where each row already includes a sum of the points within a certain area.
As it stands the mapbox gl clustering code would only sum the number of results returned not the already-summed values within each result, which totals several million.
I note this is already an issue on supercluster: https://github.com/mapbox/supercluster/pull/12
I've been playing around with this and have something working with supercluster to pass in an arbitrary function that will let you aggregate properties.
Then I was thinking for mapbox you could pass in something like clusterAggregates: { population: 'sum', minimum_wage: 'min' }
, then mapbox would pass a function to supercluster based on that. In this case the cluster would get a population
property with the sum of all the population
properties, and a minimum_wage
property with the minimum of all minimum_wage
properties in the cluster.
It would be simple enough to have sum
, max
, and min
functions to start, which could expand from there.
I have a feeling there will be a lot of red tape around this though - I pretty much have it coded already but it requires changes to supercluster, mapbox-gl, and mapbox-gl-style-spec.
pass in an arbitrary function that will let you aggregate properties
As stated above we cannot accept arbitrary functions for technical reasons.
I have a feeling there will be a lot of red tape around this though - I pretty much have it coded already but it requires changes to supercluster, mapbox-gl, and mapbox-gl-style-spec.
If you're serious about working on this project (:heart:), please open a ticket in the mapbox/mapbox-gl
repo and we'll walk you through the changes step-by-step! It's really quite straightforward.
I had a stab at this too and am using this fork in production: https://github.com/davecranwell/mapbox-gl-js
It uses a PR made by a kind person on supercluster (https://github.com/mapbox/supercluster/pull/12) and my own fork of mapbox-gl-style-spec (https://github.com/davecranwell/mapbox-gl-style-spec) which adds support for the field added in the PR.
Was unsure whether to submit this PRs since the one on supercluster has been lurking for a while now.
As stated above we cannot accept arbitrary functions for technical reasons.
@lucaswoj My understanding is that we can't pass a function into the mapbox styles because they only take JSON - I'd imagine part of that reasoning is because data passed to a worker needs to be serializable. However, supercluster could take any function and mapbox-gl could be the one to pass in the function (from the worker) based on the JSON you give it. Please correct me if I'm understanding something wrong.
I'd be happy to work on the project - I'll get everything together soon and open a ticket.
@davecranwell The mapbox/supercluster#12 pr is really similar to what I have, but the problem I have with it is you can only work with one property (metric). I think it would be handy to be able to aggregate multiple properties, so for example you can get the sum of one property, and the max of another, for example. I guess in theory you could have a property that is a hash and use that as the metric, but then with just adding clusterMetric
you're stuck with the default reducer, which is a summing function.
However, supercluster could take any function and mapbox-gl could be the one to pass in the function (from the worker) based on the JSON you give it.
Can you provide a concrete example (pretend Javascript) of how this would work?
I think it would be handy to be able to aggregate multiple properties, so for example you can get the sum of one property, and the max of another, for example.
Agreed. Whatever solution we end up merging will most likely work the way you describe.
Sure,
I still need to write tests and make changes to the spec. Also I figure it will need to be tested and code reviewed for style, browser support, error checking, safety/security, speed, etc. Regardless here's what I have so far:
supercluster changes mapbox-gl-js changes
Then in your mapbox style you could provide an option like this:
clusterAggregates: {
property1: 'sum',
property2: 'min',
property3: 'min',
property4: 'max'
}
For now there's no error checking, so if you don't have those properties you'll just get some concatted strings or some NaN's or something else weird probably. It just assumes those properties will be on each point, and carries the accumulated values over to the clusters. So your clusters will end up with the same properties, but with aggregated values. You could still filter the clusters and style them differently if you want by using point_count
like normal.
Any news on this topic?
I'd like to have a behavior for MGL similar to https://github.com/SINTEF-9012/PruneCluster which works great with Leaflet.
@lamuertepeluda per https://www.mapbox.com/mapbox-gl-js/roadmap/ this feature is under active development right now 😄
@lucaswoj that's great news. I started contributing to the clustering engine, "supercluster". See https://github.com/redbmk/supercluster/pull/2 and https://github.com/mapbox/supercluster/pull/12#issuecomment-275718245 I hope they get merged soon.
I'll also try to fork mapbox-gl and add the accumulator parameter to the geojson source. I think it should have good enough performances.
Hey, I've been pretty busy but I'll look into that pull request hopefully tonight. I'd actually been working off the cluster-agg-hash-array branch after some discussion about the style spec in mapbox/DEPRECATED-mapbox-gl#26. I'll merge that into master (in redbmk/supercluster) as well.
@lucaswoj are we still depending on mapbox/mapbox-gl-style-spec#47 (it looks like the discussion for this has moved around a bit, but I think that's the latest). I don't see it on the roadmap.
Alright I updated the master and cluster-agg-hash-array branches of redbmk/supercluster.
We still need to come to some consensus on the syntax and I feel like the style spec needs to be fleshed out a bit. Right now it just accepts any object and has a comment suggesting how to use it, but it seems like there should be a way to specify exactly what options are allowed.
I think the aggregator function should just be a parameter of the geojson source (worker options), see the workerOptions configuration rather than the style object.
Once you do this, you should get the custom properties defined in the aggregator function together with _{pointcount} for each cluster like in the cluster example.
I tried upgrading supercluster to the current version, and this makes _{pointcount} and _{clusterid} available for each cluster. So I think it would be enough to add an aggregator function parameter to the addSource that get passed down to the supercluster constructor as modified by @redbmk and me to get the custom aggregations for each point.
This wouldn't require any changes to the current style specs but just to add some other layers, that like the "cluster-count" layer of the example, only represent the cluster population information. It would be easy to get a representation like in the left picture of PruneCluster categories example, while to get the representation on the right, there should be some additional border specs parameter.
I already managed to do this but I was missing the information about the cluster population, that's why I switched to the mod of supercluster.
That's how I have it coded here. The code is over a month behind from master now, but it's the same idea you mentioned, going through workerOptions.
Ideally we would be able to add an aggregator function via the style spec or via the addSource function. That way you could add it to the original style and/or via addSource.
I believe it's technically possible to add arbitrary functions to a web worker by creating a Blob. If that's true, maybe we could pass in a function as a string that would be parsed in the worker. Maybe the string could be either a url to a function (which could be a blob url or an external script), or a string representation of a function (e.g. "" + function () { return 'hello world' }
) which would be converted to a blob before being passed into the worker. That might also help solve mapbox/mapbox-gl-style-spec#47 by just letting the user pass in any function as a string.
Indeed. I would do that either by using Blob or some library such as http://www.eslinstructor.net/vkthread/ that seems to allow passing functions defined in the main script to the workers
Alright @redbmk I updated the demo of your supercluster fork by adding the Blob approach you mentioned here.
I had a look to the mapbox-gl-js code and I have to figure out how this could blend into the more complex worker instantiation mechanism, but I definitely think it is technically possible to do so.
That might also help solve mapbox/mapbox-gl-style-spec#47 by just letting the user pass in any function as a string.
This is a non-starter for us because we need to have parity with GL Native which does not necessarily have a Javascript interpreter available.
Oh, whoops yeah that's a major oversight on my part!
If we end up going with @mourner's changes in mapbox/supercluster#36 then the style spec could have properties like clusterInit
, clusterMap
and clusterReduce
that get parsed and passed in. Maybe you could do something like clusterInit: { min: Infinity, sum: 0 }
. The map and reduce functions could then use the same syntax we land on for mapbox/mapbox-gl-style-spec#47, which I'd imagine should be able to do simple math and have access to property values.
@mourner the supercluster
level changes have shipped right? All we need to do now is adapt the gl-js api to take advantage of the new functionality? Are you up for taking this on or would you like someone else/me to take this?
@mollymerp for property aggregations, we would need arbitrary expressions support in the style spec, which is quite a big task. I know @kkaefer approached it with some experiments on the native side at some point. It would be great to start looking at this again.
cool – ticket tracking that feature request is at #4077
It is disheartening that this has been stalled so long after the supercluster changes. Even without arbex and all of the style spec changes and the like it would be great to at least have the production release of mapbox-gl-js built with the current release of supercluster 2.3.0 instead of 2.2.0. This would expose cluster_id in the feature properties so that an independently initialized supercluster object can be cross referenced by cluster_id. Right now correlating the two requires a query by bounding box which is neither reliable nor efficient.
map.addSource("data", { type: "geojson", data: "data.geojson", cluster: true });
// We can do whatever we want to our own supercluster, feature parity be damned
var cluster = supercluster({});
cluster.load(map.getSource('data').serialize().data.features);
...
// Get a cluster feature on click or whatever
var features = map.queryRenderedFeatures(e.point, { layers: ["data"] });
// Requires mabbox-gl-js built with supercluster ^2.3.0
var cluster_id = features[0].properties.cluster_id;
// Get every feature within the cluster
var all_features = cluster.getLeaves(cluster_id, Math.floor(map.getZoom()), limit = Infinity);
@johnlaur Want to submit a PR that updates the supercluster version?
Here is an interim solution to property aggregation using Supercluster in GL JS.
https://bl.ocks.org/ryanbaumann/01b2c7fc0ddb7b27f6a72217bd1461ad
Any updates on this?
@benderlidze there is a lot of activity happening at #4777 to allow for arbitrary expressions. Once that is complete, it will be possible to use those expressions to add support for property aggregation.
In addition to that v0.39, which was released a few weeks ago, ships with an updated version of supercluster that supports property clustering.
@kkaefer there's no way to take advantage of that supercluster feature (without forking GL JS) until we expose it through the style spec.
Any news on this feature? Alternative workaround?
Any e.t.a for this feature? It's a long time since this issue (and other similar ones) has been opened, and the problem has been already resolved on the superclusterjs side...
I attempted a new workaround approach (which is a bit complex and I'd like to publish as an example on dedicated github repo as soon as I have time) but mainly works like this:
URL.createObjectURL
)getClusters
message with bbox
and zoom
, it invokes the index.getClusters
returns the new blob URL of the clusterized datacluster: false
) which receives the blob URL as data
option, and on each moveend
(debounced) it posts a getClusters
message with bbox
and zoom
getClusters
message from the worker, it invokes source.setData('clusteredFeaturesUrl')
Each cluster gets the custom stats defined in the worker. It seems to be fluid enough and waste less memory even with ~300 point features.
Of course it is suboptimal and I want to verify some other details, but it basically works and it's maybe better than having a copy of supercluster in the main thread, which ~sucks~ I don't like at all.
@redbmk
@benderlidze there is a lot of activity happening at #4777 to allow for arbitrary expressions. Once that is complete, it will be possible to use those expressions to add support for property aggregation.
Is there an update on this? We would also like to add this feature in our project.
This would be a great feature for something we're working on at Open Listings
Previously we've just been displaying homes on our map, which has been working fine with clustering. But we're currently working on coalescing homes that are all in the same building into a singular geojson feature, so there'll be less markers on our maps in densely populated areas.
With the way clusters are working now, the point_count
won't be correct if building's are clustered in –– it'll just count as 1
, rather than the total number of homes in the building.
I think if we store the building home count in our geojson metadata, then there should be a way to render the cluster value we want if a feature like this gets added!
I've been really busy but I would love to work on this again now that Arbex is ready. Maybe I can make some time this weekend to take a look.
This is something I have been patiently waiting to be implemented so its good to see this topic looked at again.
Im working on a project that needs this feature pretty badly. It is intended to monitor stores that are online/offline/mobile backup. Clusters would need to follow 3 rules:
1) If all stores are online in a cluster it will be green 2) If one store is offline in a cluster it will be red 3) If one store is on mobile backup it will be orange
Right now it looks like this, unclustered points are working as expected:
@senoj you can do that right now with the external supercluster library, see ryanbaumann's post above. I successfully used that technique to render pie charts showing the ratios of different properties in the clusters. It'll definitely be simpler when that's part of the mapbox-api, but if your business needs are urgent, you can do it now :+1:
@foundryspatial-duncan its not urgent, if there is no traction on this over the next month ill look at this. Id hate to put the work in and shortly after have it added to the mapbox-api :)
@senoj @foundryspatial-duncan I also started to implement it with supercluster. This is working quite well. But I would also prefer, when it's integrated in the mapbox-api.
@kkaefer @qwervo @lamuertepeluda @JoJ123 @tvler @senoj
I created a PR for this (#7004). I'm sure the syntax will change slightly after it's reviewed by the maintainers, but if you want to test it out now, that would help find any bugs or issues that may arise. Any feedback is welcome. Thanks!
Also, if anybody is interested in helping out, I still need to do some benchmarking, make sure the documentation is up to date everywhere, etc. Any help would be much appreciated!
Picking this up again, here's a PR proposing a different syntax/implementation for this feature https://github.com/mapbox/mapbox-gl-js/pull/7584 — check it out!
Last call for comments on #7584! It's ready for review and likely to be merged soon.
Hello, I've started exploring this lib and have a feature request: Provide reduce function(s) to clustering configuration, so it's possible to filter clusters by aggregated properties rather than just
point_count
.