mapbox / tippecanoe

Build vector tilesets from large collections of GeoJSON features.
BSD 2-Clause "Simplified" License

Is there a way to obtain a parseable "log" of feature.id's that were clustered? #740

Open josiekre opened 5 years ago

josiekre commented 5 years ago

We are using --cluster-distance=10 in combination with -r1 to cluster millions of points per zoom level. The points come from our transportation microsimulation (https://citycast.io), and we prepare them in an automated post-processing step for the data visualizations in our UI.

When clustering, is there a parseable log or output of a crosswalk between pre-clustered feature.id and post-clustered feature.id?

We join properties onto the clustered points using a client-side data join (see example) so that we can style based on user interaction.

The maps in our UI have millions of features and hundreds of potential combinations of user interactions that result in different properties to style by, so pre-building the geometries into tilesets and joining properties in the client is an important combination.

(cc: @d11n)

e-n-f commented 5 years ago

There isn't currently a way to accumulate the feature IDs of the clustered features, but if you have the ID in an attribute, you could use

--accumulate-attribute=GEOID:comma

(or whatever your ID field is called instead of GEOID) to accumulate the ID attributes into a comma-separated list in the feature.
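To make the `:comma` behavior concrete, here is a minimal Python sketch of what the accumulated attribute ends up holding when several points are folded into one cluster. It only mimics the result described above (a comma-separated list on the surviving feature); `accumulate_comma` is a hypothetical helper, not part of tippecanoe, and putting the surviving feature's own value first is an assumption.

```python
# Hypothetical illustration of the *result* of --accumulate-attribute=GEOID:comma.
# This is not tippecanoe's code; it only shows the shape of the output.

def accumulate_comma(kept, absorbed, attr):
    """Return a copy of `kept` whose `attr` is its own value plus the absorbed
    features' values, joined with commas (kept value first: an assumption)."""
    values = [str(kept[attr])] + [str(f[attr]) for f in absorbed]
    merged = dict(kept)  # leave the input feature's properties untouched
    merged[attr] = ",".join(values)
    return merged
```

For example, a cluster that keeps point 9017353 and absorbs 9018162 and 9017386 would carry `GEOID` = `9017353,9018162,9017386` under this sketch, so splitting on commas recovers every original ID.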

josiekre commented 5 years ago

I think that will work perfectly. Smart! We'll try that. Thanks.

josiekre commented 5 years ago

With clustering, the same feature['id'] can appear more than once within a zoom level, each time with a different point_count, even with --buffer=0. Since I'm joining point_count on in the client, feature['id'] needs to be unique.

If I use --no-duplication, will MapboxGL handle this okay? Put another way, what does the README mean in its description of this option by "Clients of the tileset must check adjacent tiles (possibly some distance away) to ensure they have all features"?
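A quick way to see the repeated IDs is to group decoded features by zoom level and feature id. This Python sketch assumes the tiles have already been decoded into dicts with hypothetical keys `zoom`, `x`, `y`, `id`, and `properties`; `point_count` is the attribute name mentioned above, and features without it report `None`.

```python
from collections import defaultdict

def repeated_ids_per_zoom(features):
    """Map (zoom, feature id) -> [(x, y, point_count), ...] for every id that
    occurs more than once within the same zoom level.

    `features` is an iterable of already-decoded features shaped like
    {"zoom": z, "x": x, "y": y, "id": fid, "properties": {...}} (an assumed layout).
    """
    seen = defaultdict(list)
    for f in features:
        key = (f["zoom"], f["id"])
        seen[key].append((f["x"], f["y"], f["properties"].get("point_count")))
    # Keep only ids that collide within a zoom level.
    return {key: hits for key, hits in seen.items() if len(hits) > 1}
```

An id that shows up in two tiles with different `point_count` values is exactly the case that breaks a client-side join keyed on `feature['id']`.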

josiekre commented 5 years ago

It looks like --no-duplication does not actually solve the problem of feature.id uniqueness.

There are 681,732 point features in points_o.geojson.gz. I'm building a tileset from these points with:

tippecanoe \
    --output=points_o.mbtiles \
    --minimum-zoom=10 \
    --accumulate-attribute=point_id:comma \
    --no-tile-size-limit \
    --drop-rate=1 \
    --cluster-distance=10 \
    --buffer=0 \
    --no-duplication \
    --named-layer=points:points_o.geojson.gz

I then build a crosswalk of the decoded tileset by unpacking the accumulated ID attribute:

    zoom     x     y cluster_id point_id
   <int> <int> <int>      <int>    <int>
 1    10   279   415    9017353  9017353
 2    10   279   415    9017353  9018162
 3    10   279   415    9017353  9017386
 4    10   279   415    9017353  9015457
 5    10   279   415    9017353  9014519
 6    10   279   415    9017353  9015972
 7    10   279   415    9017353  9017743
 8    10   279   415    9017353  9017810
 9    10   279   415    9017353  9016886
10    10   279   415    9017353  9017025
# ... with 3,408,683 more rows
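The unpacking step can be sketched as follows, under stated assumptions: features decoded into dicts with hypothetical keys `zoom`, `x`, `y`, `id`, and `properties`; the accumulated attribute named `point_id` as in the command above; and `cluster_id` taken to be the feature's own `id`. Each comma-separated value becomes one crosswalk row.

```python
def build_crosswalk(features, attr="point_id"):
    """Yield (zoom, x, y, cluster_id, point_id) rows, one per accumulated ID.

    Assumes each decoded feature looks like
    {"zoom": z, "x": x, "y": y, "id": fid, "properties": {attr: "a,b,c"}}.
    An unclustered point may carry a single (possibly numeric) value.
    """
    for f in features:
        value = f["properties"].get(attr)
        if value is None:
            continue  # feature without the accumulated attribute
        for point_id in str(value).split(","):
            if point_id:  # ignore empty pieces from stray commas
                yield (f["zoom"], f["x"], f["y"], f["id"], int(point_id))
```

Generating rows lazily keeps memory flat when a zoom level unpacks into hundreds of thousands of rows.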

Counting the unpacked point IDs per zoom level gives n, the number of original points represented at each zoom:

   zoom      n
  <int>  <int>
1    10 681733
2    11 681732
3    12 681740
4    13 681743
5    14 681745

I expected that with --no-duplication and --buffer=0 each zoom level would contain exactly the original 681,732 point features, but every zoom level except 11 has more. Some points still repeat at tile boundaries. Am I missing a setting to make this work?
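One way to check the exactly-once expectation is to count crosswalk rows per zoom and list the point IDs that occur more than once. In this sketch, rows are shaped like the crosswalk above and `audit_zooms` is a hypothetical helper; a fully correct tileset would show `n == distinct == expected` and no duplicated IDs at every zoom. Tracking `distinct` separately matters because missing points could otherwise hide duplicates in the total.

```python
from collections import Counter, defaultdict

def audit_zooms(rows, expected):
    """Summarize (zoom, x, y, cluster_id, point_id) rows per zoom level."""
    per_zoom = defaultdict(Counter)
    for zoom, _x, _y, _cluster_id, point_id in rows:
        per_zoom[zoom][point_id] += 1

    report = {}
    for zoom in sorted(per_zoom):
        counts = per_zoom[zoom]
        n = sum(counts.values())
        report[zoom] = {
            "n": n,                      # crosswalk rows at this zoom
            "distinct": len(counts),     # unique point IDs at this zoom
            "excess": n - expected,      # rows beyond the input feature count
            "duplicated": sorted(p for p, c in counts.items() if c > 1),
        }
    return report
```

With the numbers above, `expected` would be 681,732, and the `duplicated` list at each zoom points directly at the boundary points to inspect.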

e-n-f commented 5 years ago

Good catch. The --no-duplication check is intended to include each feature exactly once, but the math must be off somewhere. I'll try to figure out what is going wrong.
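For reference, the invariant under discussion is that each point maps to exactly one tile per zoom. A minimal sketch uses the standard Web Mercator (slippy map) tile formula and treats each tile as a half-open range, so a point on a shared edge goes to only one side. This illustrates the idea only; it is not tippecanoe's implementation, whose internal assignment may differ.

```python
import math

def tile_for_point(lon, lat, zoom):
    """Return the single (x, y) tile containing (lon, lat) at `zoom`.

    Tiles are half-open ranges [min, max): a point exactly on a shared edge
    lands in the tile to its east (x) or south (y), never in both.
    Floating-point rounding can still move points that sit within ~1e-12 of an edge.
    """
    n = 2 ** zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    # asinh(tan(lat)) == ln(tan(lat) + sec(lat)), the Mercator y coordinate.
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp lon == 180 and latitudes beyond the Mercator limit into valid tiles.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

Any tiling rule that gives every point one and only one `(x, y)` per zoom would satisfy the "exactly once" expectation; duplicates appear when a point is tested against overlapping or closed ranges on both sides of an edge.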

e-n-f commented 5 years ago

I am having trouble reproducing this. Is there a sample of your data available so I can see exactly what is happening for you?