holoviz / geoviews

Simple, concise geographical visualization in Python
http://geoviews.org
BSD 3-Clause "New" or "Revised" License

Easily resampling choropleths #66

Open jbednar opened 7 years ago

jbednar commented 7 years ago

When GeoViews is used with Datashader, it can make sense to convert a choropleth into an approximation of the underlying population-based data, as was done for the census dataset here:

[Image: the census dataset rendered with Datashader as a synthetic population dot plot]

In that example, the census data was only available at the block level, but by generating one randomly located point inside each block for every person counted there, the plot becomes very concrete, conveying both population density across the surface and category information in a way that is easy to interpret when zooming in.
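A minimal sketch of this kind of resampling, assuming each block is a simple polygon given as a list of (x, y) vertices plus a population count. It uses rejection sampling: draw uniform points in the block's bounding box and keep only those that fall inside the polygon (a ray-casting test). `point_in_polygon` and `sample_population` are hypothetical helpers, not GeoViews or Datashader API.

```python
import random

def point_in_polygon(x, y, poly):
    """Ray-casting test: toggle on each polygon edge crossed by a ray going right from (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the ray's y coordinate can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def sample_population(poly, count, rng):
    """Return `count` points drawn uniformly inside `poly` via rejection sampling."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    points = []
    while len(points) < count:
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        if point_in_polygon(x, y, poly):
            points.append((x, y))
    return points

# Hypothetical blocks: (polygon vertices, population count)
blocks = [
    ([(0, 0), (2, 0), (2, 1), (0, 1)], 5),  # rectangle
    ([(3, 0), (5, 0), (4, 2)], 3),          # triangle
]
rng = random.Random(0)
points = [pt for poly, count in blocks for pt in sample_population(poly, count, rng)]
print(len(points))  # 8: one point per person
```

The resulting points could then be aggregated by Datashader like any other point dataset. Rejection sampling is the simplest choice but wastes draws on thin or concave shapes; triangulating each polygon and sampling triangles in proportion to their area is a common alternative.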

So, it would be nice if there were a convenient way to spatially resample shapefile-based data like:

[Image: choropleth of shapefile-based voting data]

to create a synthetic population plot like the one above. For this example, there would need to be a way to choose between two categories (vote for and against) or three categories (for, against, did not vote).
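One way to support both the two- and three-category views is to expand each region's per-category counts into one label per synthetic person, keeping only the categories the caller selects. The sketch below assumes hypothetical category names, a hypothetical `labels_for_region` helper, and made-up counts; the labels would be paired with points drawn inside the same region by any per-region sampler.

```python
# Hypothetical per-region vote counts
regions = {
    "district_a": {"for": 120, "against": 80, "did not vote": 50},
    "district_b": {"for": 30, "against": 90, "did not vote": 20},
}

TWO_CATEGORIES = ("for", "against")
THREE_CATEGORIES = ("for", "against", "did not vote")

def labels_for_region(counts, categories):
    """Expand per-category counts into one label per synthetic person.

    Categories not listed in `categories` (e.g. "did not vote" in the
    two-category view) are dropped, so those people get no point at all.
    """
    return [cat for cat in categories for _ in range(counts.get(cat, 0))]

two = {name: labels_for_region(c, TWO_CATEGORIES) for name, c in regions.items()}
three = {name: labels_for_region(c, THREE_CATEGORIES) for name, c in regions.items()}
print(len(two["district_a"]), len(three["district_a"]))  # 200 250
```

Because each point is drawn independently and uniformly within its region, the order of the labels does not matter when zipping them with the points. The labels could be stored, for example, as a pandas categorical column and aggregated per category with Datashader's `count_cat` reduction.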

PeterDSteinberg commented 7 years ago

This operation is important in viz and would also be helpful as a more general utility for labeling features for ML training. One idea is a hierarchical spatial index, like geohashes at multiple scales that map to identifiers of geo-region polylines/polygons. I think what I'm describing is essentially an R-tree index. Here's a Python R-tree package I'll look over soon.
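The multi-scale index idea can be sketched with quadtree keys, which behave like geohashes: at each level the extent is split into 2^level × 2^level cells, and a cell's key at a coarser level is a prefix of its key at a finer level. In this minimal sketch, regions are registered under every cell their bounding box overlaps at one chosen level, and a lookup tests only those candidates with an exact point-in-polygon check. It is a simple uniform-grid stand-in, not an R-tree; all helper names, the extent, and the example regions are hypothetical.

```python
from collections import defaultdict

def cell_index(v, vmin, vmax, level):
    """Integer cell coordinate of v along one axis at the given level."""
    n = 2 ** level
    i = int((v - vmin) / (vmax - vmin) * n)
    return min(max(i, 0), n - 1)  # clamp values on the upper boundary

def quadkey(ix, iy, level):
    """Interleave the bits of (ix, iy) into a base-4 key; prefixes name parent cells."""
    return "".join(
        str(((ix >> b) & 1) + 2 * ((iy >> b) & 1)) for b in range(level - 1, -1, -1)
    )

def point_in_polygon(x, y, poly):
    """Ray-casting containment test for a simple polygon."""
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

class QuadkeyIndex:
    def __init__(self, extent, level):
        self.extent = extent  # (xmin, ymin, xmax, ymax)
        self.level = level
        self.cells = defaultdict(list)  # quadkey -> candidate region ids
        self.polys = {}

    def _cell(self, x, y):
        xmin, ymin, xmax, ymax = self.extent
        return (cell_index(x, xmin, xmax, self.level),
                cell_index(y, ymin, ymax, self.level))

    def insert(self, region_id, poly):
        """Register a region under every cell its bounding box overlaps."""
        self.polys[region_id] = poly
        xs = [p[0] for p in poly]
        ys = [p[1] for p in poly]
        ix0, iy0 = self._cell(min(xs), min(ys))
        ix1, iy1 = self._cell(max(xs), max(ys))
        for ix in range(ix0, ix1 + 1):
            for iy in range(iy0, iy1 + 1):
                self.cells[quadkey(ix, iy, self.level)].append(region_id)

    def lookup(self, x, y):
        """Return the id of the first candidate region containing (x, y), or None."""
        key = quadkey(*self._cell(x, y), self.level)
        for region_id in self.cells.get(key, []):
            if point_in_polygon(x, y, self.polys[region_id]):
                return region_id
        return None

index = QuadkeyIndex(extent=(0, 0, 8, 8), level=3)
index.insert("west", [(0, 0), (4, 0), (4, 8), (0, 8)])
index.insert("east", [(4, 0), (8, 0), (8, 8), (4, 8)])
print(index.lookup(1.5, 2.0))  # west
print(index.lookup(6.0, 7.5))  # east
print(quadkey(5, 3, 3), quadkey(2, 1, 2))  # 123 12: the level-2 parent is a prefix
```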

jbednar commented 7 years ago

It would be helpful if such an operation also supported a spatial mask, such that each shape in the choropleth can be overlaid with a mask of smaller regions (parks, lakes, etc. in the Census case) from which random samples are rejected. The Cooper Center Racial Dot Map shows sample code for doing such resampling, and also some demonstrations of the difference between masked and unmasked results.
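A minimal sketch of masked resampling, assuming the block and each mask region (parks, lakes and similar) are simple polygons given as vertex lists. Candidate points are drawn uniformly in the block's bounding box and rejected if they fall outside the block or inside any mask polygon. The helper names and the `max_tries` guard are hypothetical, and this is not the Cooper Center's code.

```python
import random

def point_in_polygon(x, y, poly):
    """Ray-casting containment test for a simple polygon."""
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside

def sample_masked(poly, count, masks, rng, max_tries=100_000):
    """Draw `count` uniform points inside `poly` but outside every polygon in `masks`."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    points = []
    tries = 0
    while len(points) < count:
        tries += 1
        if tries > max_tries:
            # The mask may cover (almost) the whole block; fail instead of looping forever
            raise ValueError("could not place all points; check mask coverage")
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        if not point_in_polygon(x, y, poly):
            continue  # outside the block itself
        if any(point_in_polygon(x, y, m) for m in masks):
            continue  # inside a masked area such as a lake: reject
        points.append((x, y))
    return points

block = [(0, 0), (10, 0), (10, 10), (0, 10)]
lake = [(0, 0), (5, 0), (5, 10), (0, 10)]  # hypothetical mask covering the west half
rng = random.Random(42)
pts = sample_masked(block, 100, [lake], rng)
print(min(x for x, _ in pts) >= 5)  # True: no point lands in the lake
```

Rejection keeps the logic simple, but it slows down when masks cover most of a block; subtracting the mask geometry from the block first (for example with Shapely's `difference`) and sampling the remainder would avoid the wasted draws.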

jbednar commented 7 years ago

A hierarchical spatial index is something we are planning for Datashader, but how would it help with a resampling or synthesizing operation like this?