pysal / spopt

Spatial Optimization
https://pysal.org/spopt/
BSD 3-Clause "New" or "Revised" License

Regionalization with minimum and maximum thresholds of an attribute value (or maximizing homogenization among regions) #274

Open orlandombaa opened 2 years ago

orlandombaa commented 2 years ago

Hello everyone

I am trying to solve a regionalization problem. I want to create spatially contiguous regions that are as homogeneous as possible in one internal variable (let's say population), or for which I can set a minimum and maximum value per region without specifying the number of regions. So far I have tried the MaxPHeuristic algorithm, which accepts a threshold value. This is a good approach, but the resulting regions are not very homogeneous in my variable (with my data, some regions have around 40% more population than others).

Is it possible to specify the number of clusters with MaxPHeuristic and also increase the homogeneity of the regions? Or should I choose another algorithm?

jGaboardi commented 2 years ago

@knaaptime Do you have some advice for this?

orlandombaa commented 2 years ago

This problem is what ArcGIS Pro calls Spatially Constrained Multivariate Clustering:

https://pro.arcgis.com/en/pro-app/2.8/tool-reference/spatial-statistics/how-spatially-constrained-multivariate-clustering-works.htm

knaaptime commented 2 years ago

With the max-p algorithm, the p parameter (the number of regions/clusters) is endogenous. Instead of setting the number of clusters a priori, the analyst sets a minimum value for a threshold variable, and the algorithm works to maximize the number of regions subject to that threshold constraint. It then tries to maximize homogeneity within the resulting regions, as long as doing so doesn't reduce p.

max-p is greedy with respect to p, which means it will keep increasing the number of regions (even if it has to sacrifice internal homogeneity) as long as the minimum value for the threshold variable is met. So in your case, it will keep creating new regions as long as each one meets your population threshold, even if the resulting regions don't have homogeneous population levels.
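
To make the trade-off concrete, here is a toy sketch in plain Python. It is not spopt's MaxPHeuristic implementation: it assumes the areas form a simple 1-D chain, and the function name `greedy_chain_regions` and the population figures are made up for illustration. It closes a region as soon as the population threshold is met, which favors a high region count and pays no attention to how balanced the totals are.

```python
def greedy_chain_regions(populations, threshold):
    """Toy illustration only (not spopt's algorithm).

    Walk along a chain of areas and close a region as soon as its
    accumulated population reaches the threshold.
    """
    regions, current, total = [], [], 0
    for idx, pop in enumerate(populations):
        current.append(idx)
        total += pop
        if total >= threshold:
            regions.append(current)
            current, total = [], 0
    # Leftover areas cannot meet the threshold on their own,
    # so attach them to the last region that was closed.
    if current:
        if regions:
            regions[-1].extend(current)
        else:
            regions.append(current)
    return regions


# Hypothetical population counts for eight neighboring areas.
populations = [50, 60, 120, 30, 40, 45, 200, 10]
regions = greedy_chain_regions(populations, threshold=100)
totals = [sum(populations[i] for i in region) for region in regions]

print(regions)  # [[0, 1], [2], [3, 4, 5], [6, 7]]
print(totals)   # [110, 120, 115, 210]
```

Every region satisfies the threshold of 100, yet the largest one holds almost twice the population of the smallest. Nothing in the threshold constraint pushes the totals toward each other.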

If you want to set the number of regions exogenously, you might try a different method like SKATER or hierarchical clustering with a spatial constraint. The ArcGIS method you linked to uses SKATER.

orlandombaa commented 2 years ago

Thank you @knaaptime, your comment really helped me.

orlandombaa commented 2 years ago

Hi @knaaptime, I have another question. I have been looking at the examples and documentation for the SKATER algorithm and for hierarchical clustering. In the first case, the threshold I can give the algorithm is expressed as the number of spatial objects per region. I have also tested the same algorithm in pygeoda, where the threshold can be expressed in terms of a variable. Is there a variant of SKATER in PySAL where I can specify both the number of clusters and a threshold tied to a variable?

Best regards, Orlando

knaaptime commented 1 year ago

Great question. At the moment, our SKATER implementation has a floor argument that sets the minimum number of observations assigned to each cluster, but it looks like we don't have an analog to the threshold_name argument available in max-p, which lets you use a variable as the floor condition.
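
Until such an argument exists, one possible workaround is to check the variable totals after clustering. The sketch below is plain Python and assumes only that the cluster labels are available as a sequence, for example from a fitted model's `labels_` attribute. The helper names `region_totals` and `regions_below_floor` and the sample numbers are hypothetical and not part of spopt.

```python
from collections import defaultdict


def region_totals(labels, values):
    """Sum a variable (e.g. population) within each cluster label."""
    totals = defaultdict(int)
    for label, value in zip(labels, values):
        totals[label] += value
    return dict(totals)


def regions_below_floor(labels, values, floor):
    """Return the labels whose summed variable falls below the floor."""
    totals = region_totals(labels, values)
    return sorted(label for label, total in totals.items() if total < floor)


# Hypothetical labels (one per spatial unit) and population counts.
labels = [0, 0, 1, 1, 1, 2, 2]
population = [400, 700, 300, 200, 250, 900, 300]

totals = region_totals(labels, population)
violations = regions_below_floor(labels, population, floor=1000)

print(totals)      # {0: 1100, 1: 750, 2: 1200}
print(violations)  # [1]
```

This only reports which clusters miss the floor. It does not repair them, so a failing result would still mean rerunning with fewer clusters or merging regions by hand.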

We should add that

cc @xf37

jGaboardi commented 1 year ago

The corresponding keywords could be: