Closed renanxcortes closed 4 years ago
ps.: if you run locally the notebook, probably you'll not get the same results since just establishing a numpy seed in the cell does not guarantee that all the steps of the algorithm will be the same. But I believe that it is very likely that the bug will be reproduced in a different way.
So, I created a more complete notebook that tries to reproduce this bug for all local search approaches of the max-p and these results are summarized in this comment. This notebook is present in https://github.com/renanxcortes/region/blob/maxp-debug/region/notebooks/maxp-BasicTabu-bugs-exploration-Complete.ipynb (the motivation to create a different notebook was that the first one has some comments between cells related to a specific run of the kernel). Basically, there are two types of bugs: threshold constraints and spatial constraints.
I spotted the presence of these bugs in the following cases:
Basic Tabu (class AZPBasicTabu
): threshold and spatial.
Reactive Tabu (class AZPReactiveTabu
): threshold AND SPATIAL.
UPDATE (will update the previous comment)!
The AZPReactiveTabu
actually can generate islands as a result. I managed:
Therefore, I have a weak sense that one of the issues might be raising in the lines that establish:
sub_adj = sub_adj_matrix(adj, np.where(labels == old_region)[0], wo_nodes=area)
These are common for both BasicTabu
and AZPReactiveTabu
. This tries to assure contiguity between the local search iteration process.
I've been trying to debug, but still not really sure how to do and, therefore, will use these comments to share some thoughts for anyone and also for my future self. Perhaps the issue of contiguity might be present in the AZPTabu
class.
I think I'm getting closer. The issue might in the function assert_feasible
present in https://github.com/pysal/region/blob/3dcf777b6a9d528145013023257ba405608261e7/region/util.py#L715 I think the condition established in https://github.com/pysal/region/blob/3dcf777b6a9d528145013023257ba405608261e7/region/util.py#L742 is not asserting properly the contiguity.
After adding
from region.csgraph_utils import sub_adj_matrix, is_connected
The for loop in the assert_feasible
function needs to be tweak to:
for region_label in set(solution):
aux = sub_adj_matrix(adj, np.where(solution == region_label)[0])
# check right contiguity
if not is_connected(aux):
raise ValueError("Region {} is not spatially "
"contiguous.".format(region_label))
Then, this needs to be added in the local searches that have issues on it. For example, for the AZPBasicTabu
, the while loop of the line https://github.com/pysal/region/blob/3dcf777b6a9d528145013023257ba405608261e7/region/p_regions/azp.py#L792 needs to be changed to:
while True and assert_feasible(labels, adj, n_regions=None):
After changing that the contiguity problem will be solved. However, I'm having a hard time after solving this issue to find a counter-example where the contiguity is respect and threshold is not (in other words, it looks like these changes fix both bugs, which is weird and probably not the case. I think what is happening is that is achieving faster the solution, since an additional constrained was added, and not giving "time" to find a non-respected threshold solution).
Regarding the threshold constraint, it was checked before the local search and the local search was "broken". Now the local search is not broken and, therefore, both constraints are respected. I basically added some print statements to check the order here https://github.com/renanxcortes/region/commit/a46c5f97dc17975942f890e84ef4974876e891c3 and was running working notebooks to debug this.
So, I think both issues were solved with the tweak of the last comment. I will do a PR later for that.
After adding
from region.csgraph_utils import sub_adj_matrix, is_connected
The for loop in the
assert_feasible
function needs to be tweak to:for region_label in set(solution): aux = sub_adj_matrix(adj, np.where(solution == region_label)[0]) # check right contiguity if not is_connected(aux): raise ValueError("Region {} is not spatially " "contiguous.".format(region_label))
Then, this needs to be added in the local searches that have issues on it. For example, for the
AZPBasicTabu
, the while loop of the lineneeds to be changed to:
while True and assert_feasible(labels, adj, n_regions=None):
After changing that the contiguity problem will be solved. However, I'm having a hard time after solving this issue to find a counter-example where the contiguity is respect and threshold is not (in other words, it looks like these changes fix both bugs, which is weird and probably not the case. I think what is happening is that is achieving faster the solution, since an additional constrained was added, and not giving "time" to find a non-respected threshold solution).
The change proposed actually doesn't affect the while loop since it doesn't return a boolean and, therefore, this is not the define solution. Although assert_feasible
originally was, indeed, wrong (because it would never raise the second ValueError
it proposes to check), fixing it still persist the threshold issue (I'm making some tests locally).
The issue might be inside AZP class in the function fit_from_scipy_sparse_matrix
since it can generate non-contiguous regions (added the corrected assert_feasible
function on it and checked that) and also regions that do not respect the threshold.
Fixed in https://github.com/pysal/region/pull/33.
Hello,
So, I found some unusual pattern in the classes generated by the AZPBasicTabu local search class. Please take a look at this notebook (https://github.com/renanxcortes/region/blob/maxp-debug/region/notebooks/maxp-BasicTabu-bugs.ipynb). If you experience a problem seeing this notebook, you can fork/download it and reproduce locally (the lattice shapefile used is in the same repo of the branch).
As far as understood, the clusters should respect some kind of spatial constrained that would be established by the spatial weights matrices. So, in these examples, I used a 10x10 regular lattice for both a Queen matrix and Rook matrix and run the default max-p and also establishing the AZPBasicTabu local search. You can see that in the default the spatial constraints are respected, however, for the AZPBasicTabu it looks like the generating classes have strange spatial allocation.
For the Queen matrix, you can see an 'island' of the Yellow class and for the Rook matrix, you can see some 'Queen-like' pattern of the groups, which I'm assuming shouldn't be desired.
In addition, following the issue described in https://github.com/pysal/region/issues/31 this notebook presents an example where the total generated for each class can be below the threshold established. You can see the values 8322 (Class 3) and 9166 (Class 5) for the Queen matrix case where it is below the established 10000.
Therefore, I think that this AZPBasicTabu local search has two bugs on it. I'll keep exploring other options to check if this happens in other cases.