Open jbednar opened 7 years ago
This operation is important in viz and would also be helpful as a more general utility for labeling features for ML training. One idea is a hierarchical spatial index, like geohashes at multiple scales that map to identifiers of geo-region polylines/polygons. I think that idea is an R-Tree index. Here's a Python R-Tree package I'll look over soon.
It would be helpful if such an operation also supported a spatial mask, such that each shape in the choropleth can be overlaid with a mask of smaller regions (parks, lakes, etc. in the Census case) from which random samples are rejected. The Cooper Center Racial Dot Map shows sample code for doing such resampling, and also some demonstrations of the difference between masked and unmasked results.
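To make the masking idea concrete, here is a minimal pure-Python sketch of rejection sampling with a mask. It is not the Cooper Center code and not an existing GeoViews or Datashader API. The names `point_in_polygon` and `sample_masked` are hypothetical, polygons are assumed to be simple lists of `(x, y)` vertices, and a real implementation would more likely use a geometry library.

```python
import random

def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def sample_masked(region, masks, n, rng=None, max_tries=100_000):
    """Draw n points inside `region` that fall in none of the `masks` polygons.

    Hypothetical helper: candidates are drawn uniformly from the region's
    bounding box and rejected if outside the region or inside any mask.
    """
    rng = rng or random.Random()
    xs = [p[0] for p in region]
    ys = [p[1] for p in region]
    points, tries = [], 0
    while len(points) < n:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("too many rejections; is the region fully masked?")
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if not point_in_polygon(x, y, region):
            continue  # outside the choropleth shape
        if any(point_in_polygon(x, y, m) for m in masks):
            continue  # inside a park, lake, etc.
        points.append((x, y))
    return points

# Made-up example: a 10x10 block with a 2x2 "lake" in the middle.
block = [(0, 0), (10, 0), (10, 10), (0, 10)]
lake = [(4, 4), (6, 4), (6, 6), (4, 6)]
pts = sample_masked(block, [lake], 200, rng=random.Random(0))
```

Rejection sampling gets slow when the masks cover most of a shape, which is why the sketch caps the number of tries rather than looping forever.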
A hierarchical spatial index is something we are planning for Datashader, but I'm not sure how it would help with a resampling or synthesizing operation like this.
When GeoViews is used with datashader, it can make sense to convert a choropleth into an approximation of the underlying population-based data, as was done for the census dataset here:
In that example, the census data was only available at the block level, but by randomly choosing a location within the block for each datapoint implied by that block's population count, the plot becomes very concrete, conveniently conveying both population counts across the surface and category information in a way that is easily interpretable when zooming in.
So, it would be nice if there were a convenient way to spatially resample shapefile-based data like:
to create a synthetic population plot like the above. For this example, there would need to be a way to choose between two categories (vote for and against) or three categories (for, against, did not vote).
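As a rough illustration of what such a utility might do, here is a minimal pure-Python sketch that turns per-region category counts into synthetic points. Everything in it is an assumption for illustration: `synthesize_points` is a hypothetical name, regions are assumed to be simple polygons with non-zero area given as `(x, y)` vertex lists, and the counts are made up. It does not read shapefiles and is not an existing GeoViews API.

```python
import random

def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def synthesize_points(regions, rng=None):
    """Return (x, y, category) tuples, one per counted individual.

    `regions` is a list of (polygon, {category: count}) pairs. Each point is
    drawn uniformly from the polygon's bounding box and kept only if it
    falls inside the polygon.
    """
    rng = rng or random.Random()
    out = []
    for polygon, counts in regions:
        xs = [p[0] for p in polygon]
        ys = [p[1] for p in polygon]
        for category, count in counts.items():
            placed = 0
            while placed < count:
                x = rng.uniform(min(xs), max(xs))
                y = rng.uniform(min(ys), max(ys))
                if point_in_polygon(x, y, polygon):
                    out.append((x, y, category))
                    placed += 1
    return out

# Made-up example: one triangular region with three vote categories.
triangle = [(0, 0), (8, 0), (0, 8)]
votes = {"for": 30, "against": 20, "did_not_vote": 10}
points = synthesize_points([(triangle, votes)], rng=random.Random(1))
```

The resulting `(x, y, category)` records are the kind of point data that could then be aggregated by category, with the two-category and three-category cases differing only in the keys of the counts dictionary.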