Open orlandombaa opened 2 years ago
@knaaptime Do you have some advice for this?
This problem is what ArcGIS Pro calls Spatially Constrained Multivariate Clustering.
With the max-p algorithm, the `p` parameter (the number of regions/clusters) is endogenous. Instead of setting the number of clusters a priori, the analyst sets a minimum value for a threshold variable, and the algorithm works to maximize the number of regions subject to that threshold constraint. It then tries to maximize homogeneity inside the resulting regions, as long as doing so doesn't reduce `p`.
max-p is greedy with respect to `p`, which means it will keep increasing the number of regions (even if it has to sacrifice internal homogeneity) as long as the minimum value for the threshold variable is met. So in your case, it will keep creating new regions as long as each one meets your population threshold, even if the resulting regions don't have homogeneous population levels.
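To make the behavior described above concrete, here is a minimal pure-Python sketch of threshold-driven region growing on a small grid. It is a toy illustration only, not the `MaxPHeuristic` implementation: the function names (`grid_neighbors`, `grow_regions`), the grid, and the population values are all made up for this example, and it only mimics the construction idea (grow a region until it meets the threshold, then start a new one) without any later homogeneity-improvement step.

```python
from collections import deque


def grid_neighbors(i, n_rows, n_cols):
    """Rook neighbors of cell i on a row-major n_rows x n_cols grid."""
    r, c = divmod(i, n_cols)
    out = []
    if r > 0:
        out.append(i - n_cols)
    if r < n_rows - 1:
        out.append(i + n_cols)
    if c > 0:
        out.append(i - 1)
    if c < n_cols - 1:
        out.append(i + 1)
    return out


def grow_regions(pop, n_rows, n_cols, threshold):
    """Greedily build as many contiguous regions as possible (hypothetical helper).

    Each region stops growing as soon as its population reaches `threshold`.
    Cells that cannot form a valid region become enclaves and are merged
    into an adjacent region afterwards.
    """
    UNSEEN, ENCLAVE = -1, -2
    labels = [UNSEEN] * len(pop)
    totals = []  # totals[r] is the population of region r
    for seed in range(len(pop)):
        if labels[seed] != UNSEEN:
            continue
        region = len(totals)
        members = [seed]
        labels[seed] = region
        total = pop[seed]
        frontier = deque(grid_neighbors(seed, n_rows, n_cols))
        while total < threshold and frontier:
            j = frontier.popleft()
            if labels[j] != UNSEEN:
                continue
            labels[j] = region
            members.append(j)
            total += pop[j]
            frontier.extend(grid_neighbors(j, n_rows, n_cols))
        if total >= threshold:
            totals.append(total)
        else:
            # Threshold not reachable from this seed: release cells as enclaves.
            for m in members:
                labels[m] = ENCLAVE
    if not totals:
        raise ValueError("no contiguous region reaches the threshold")
    # Attach each enclave cell to the adjacent region with the smallest total.
    pending = [i for i, lab in enumerate(labels) if lab == ENCLAVE]
    while pending:
        still_pending = []
        for i in pending:
            adjacent = [labels[j] for j in grid_neighbors(i, n_rows, n_cols)
                        if labels[j] >= 0]
            if adjacent:
                best = min(adjacent, key=lambda r: totals[r])
                labels[i] = best
                totals[best] += pop[i]
            else:
                still_pending.append(i)
        pending = still_pending
    return labels, totals


# Made-up population counts on a 4 x 4 grid.
pop = [5, 1, 1, 5,
       1, 1, 1, 1,
       1, 1, 1, 1,
       5, 1, 1, 9]
labels, totals = grow_regions(pop, 4, 4, threshold=6)
p = len(totals)
print("p =", p, "region totals =", totals)
print("largest / smallest region:", max(totals) / min(totals))
```

Every region satisfies the floor, but nothing in the procedure pushes the region totals toward each other, which is the kind of imbalance reported in this thread.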
If you want to set the number of regions exogenously, you might try a different method like SKATER or hierarchical clustering with a spatial constraint. The ArcGIS method you linked to uses SKATER.
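As a rough illustration of the tree-based idea behind fixing the number of regions up front, the sketch below builds a minimum spanning tree over the contiguity graph (edge cost = attribute difference) and then drops the costliest tree edges until `k` connected pieces remain. This is a deliberate simplification and not the SKATER algorithm itself: SKATER chooses its cuts by how much they reduce within-cluster dissimilarity, whereas this toy version just cuts the heaviest edges. The function name `mst_partition` and the data are hypothetical, and the contiguity graph is assumed to be connected.

```python
def mst_partition(values, edges, k):
    """Split a connected contiguity graph into k contiguous clusters.

    values: one attribute value per observation
    edges:  (i, j) pairs of contiguous observations
    k:      number of clusters, set by the analyst
    """
    n = len(values)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Kruskal's algorithm: tree edges are collected in ascending cost order.
    weighted = sorted((abs(values[i] - values[j]), i, j) for i, j in edges)
    tree = []
    for w, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((w, i, j))

    # Keeping the n - k cheapest tree edges leaves exactly k components.
    kept = tree[: n - k]
    parent = list(range(n))
    for _, i, j in kept:
        parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Six observations along a line, with a clear break between the 3rd and 4th.
values = [1, 1, 2, 10, 11, 10]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
cluster_labels = mst_partition(values, edges, k=2)
print(cluster_labels)
```

Because every cluster is a subtree of the spanning tree, each cluster is contiguous by construction, and the number of clusters is exactly the `k` requested.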
Thank you @knaaptime, your comment really helped me.
Hi @knaaptime, I have another question. I have been going through the examples and documentation for the SKATER algorithm and for hierarchical clustering. In the first case, the threshold I can give the algorithm is expressed as the number of spatial objects per region. I have been testing the same algorithm in pygeoda, where the threshold can be given in terms of a variable. Is there any variant of SKATER in PySAL where I can set both the number of clusters and a threshold tied to a variable?
Best regards, Orlando
Great question. At the moment, our SKATER implementation has a `floor` argument, which can be used to set the minimum number of observations assigned to each cluster, but it looks like we don't have an analog to the `threshold_name` argument available in max-p, which lets you use a variable as a floor condition.
We should add that
cc @xf37
The corresponding keywords could be:
`floor` -> `ceiling`
`quorum` -> `threshold`
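Until a variable-based floor exists for SKATER, one possible workaround is to check the condition after the fact on whatever labels a clustering run returns. The helpers below are a small sketch for that purpose; `region_totals`, `meets_floor`, and `imbalance` are hypothetical names, not part of any PySAL package, and the labels and population values are made up.

```python
from collections import defaultdict


def region_totals(labels, values):
    """Sum a variable (e.g. population) within each region label."""
    totals = defaultdict(float)
    for lab, v in zip(labels, values):
        totals[lab] += v
    return dict(totals)


def meets_floor(labels, values, threshold):
    """True if every region's total is at least `threshold`."""
    return all(t >= threshold for t in region_totals(labels, values).values())


def imbalance(labels, values):
    """Relative excess of the largest region total over the smallest."""
    totals = region_totals(labels, values).values()
    return max(totals) / min(totals) - 1.0


# Made-up example: five areas assigned to three regions.
region_labels = [0, 0, 1, 1, 2]
population = [30, 20, 40, 30, 50]

print(region_totals(region_labels, population))
print(meets_floor(region_labels, population, threshold=50))
print(imbalance(region_labels, population))
```

A solution that fails `meets_floor` could be rerun with fewer clusters, and `imbalance` gives a single number for comparing how evenly different methods distribute the variable across regions.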
Hello everyone
I am trying to solve a regionalization problem. I want to create spatially contiguous regions that are as homogeneous as possible in one internal variable (say, population), or for which I can set a minimum and maximum value per region without specifying the number of regions. So far I have tried the MaxPHeuristic algorithm, where I can set a threshold value. This is a good approach, but the homogeneity of my internal variable is not very good: with my data, some regions end up with around 40% more population than others.
Is it possible to fix the number of clusters with MaxPHeuristic and increase the homogeneity of the regions, or should I choose another algorithm?