pysal / region

A library for Spatially-Explicit Regionalization
BSD 3-Clause "New" or "Revised" License
14 stars 16 forks source link

maxp heuristic BasicTabu also doesn't respect spatial constraints? #32

Closed renanxcortes closed 4 years ago

renanxcortes commented 4 years ago

Hello,

So, I found some unusual pattern in the classes generated by the AZPBasicTabu local search class. Please take a look at this notebook (https://github.com/renanxcortes/region/blob/maxp-debug/region/notebooks/maxp-BasicTabu-bugs.ipynb). If you experience a problem seeing this notebook, you can fork/download it and reproduce locally (the lattice shapefile used is in the same repo of the branch).

As far as understood, the clusters should respect some kind of spatial constrained that would be established by the spatial weights matrices. So, in these examples, I used a 10x10 regular lattice for both a Queen matrix and Rook matrix and run the default max-p and also establishing the AZPBasicTabu local search. You can see that in the default the spatial constraints are respected, however, for the AZPBasicTabu it looks like the generating classes have strange spatial allocation.

For the Queen matrix, you can see an 'island' of the Yellow class and for the Rook matrix, you can see some 'Queen-like' pattern of the groups, which I'm assuming shouldn't be desired.

In addition, following the issue described in https://github.com/pysal/region/issues/31 this notebook presents an example where the total generated for each class can be below the threshold established. You can see the values 8322 (Class 3) and 9166 (Class 5) for the Queen matrix case where it is below the established 10000.

Therefore, I think that this AZPBasicTabu local search has two bugs on it. I'll keep exploring other options to check if this happens in other cases.

renanxcortes commented 4 years ago

ps.: if you run locally the notebook, probably you'll not get the same results since just establishing a numpy seed in the cell does not guarantee that all the steps of the algorithm will be the same. But I believe that it is very likely that the bug will be reproduced in a different way.

renanxcortes commented 4 years ago

So, I created a more complete notebook that tries to reproduce this bug for all local search approaches of the max-p and these results are summarized in this comment. This notebook is present in https://github.com/renanxcortes/region/blob/maxp-debug/region/notebooks/maxp-BasicTabu-bugs-exploration-Complete.ipynb (the motivation to create a different notebook was that the first one has some comments between cells related to a specific run of the kernel). Basically, there are two types of bugs: threshold constraints and spatial constraints.

I spotted the presence of these bugs in the following cases:

Basic Tabu (class AZPBasicTabu): threshold and spatial.

Reactive Tabu (class AZPReactiveTabu): threshold AND SPATIAL.

renanxcortes commented 4 years ago

UPDATE (will update the previous comment)!

The AZPReactiveTabu actually can generate islands as a result. I managed:

image

renanxcortes commented 4 years ago

Therefore, I have a weak sense that one of the issues might be raising in the lines that establish:

sub_adj = sub_adj_matrix(adj, np.where(labels == old_region)[0], wo_nodes=area)

These are common for both BasicTabu and AZPReactiveTabu. This tries to assure contiguity between the local search iteration process.

renanxcortes commented 4 years ago

I've been trying to debug, but still not really sure how to do and, therefore, will use these comments to share some thoughts for anyone and also for my future self. Perhaps the issue of contiguity might be present in the AZPTabu class.

renanxcortes commented 4 years ago

I think I'm getting closer. The issue might in the function assert_feasible present in https://github.com/pysal/region/blob/3dcf777b6a9d528145013023257ba405608261e7/region/util.py#L715 I think the condition established in https://github.com/pysal/region/blob/3dcf777b6a9d528145013023257ba405608261e7/region/util.py#L742 is not asserting properly the contiguity.

renanxcortes commented 4 years ago

After adding

from region.csgraph_utils import sub_adj_matrix, is_connected

The for loop in the assert_feasible function needs to be tweak to:

    for region_label in set(solution):
        aux = sub_adj_matrix(adj, np.where(solution == region_label)[0])

        # check right contiguity
        if not is_connected(aux):
            raise ValueError("Region {} is not spatially "
                             "contiguous.".format(region_label))

Then, this needs to be added in the local searches that have issues on it. For example, for the AZPBasicTabu, the while loop of the line https://github.com/pysal/region/blob/3dcf777b6a9d528145013023257ba405608261e7/region/p_regions/azp.py#L792 needs to be changed to:

while True and assert_feasible(labels, adj, n_regions=None):

After changing that the contiguity problem will be solved. However, I'm having a hard time after solving this issue to find a counter-example where the contiguity is respect and threshold is not (in other words, it looks like these changes fix both bugs, which is weird and probably not the case. I think what is happening is that is achieving faster the solution, since an additional constrained was added, and not giving "time" to find a non-respected threshold solution).

renanxcortes commented 4 years ago

Regarding the threshold constraint, it was checked before the local search and the local search was "broken". Now the local search is not broken and, therefore, both constraints are respected. I basically added some print statements to check the order here https://github.com/renanxcortes/region/commit/a46c5f97dc17975942f890e84ef4974876e891c3 and was running working notebooks to debug this.

So, I think both issues were solved with the tweak of the last comment. I will do a PR later for that.

renanxcortes commented 4 years ago

After adding

from region.csgraph_utils import sub_adj_matrix, is_connected

The for loop in the assert_feasible function needs to be tweak to:

    for region_label in set(solution):
        aux = sub_adj_matrix(adj, np.where(solution == region_label)[0])

        # check right contiguity
        if not is_connected(aux):
            raise ValueError("Region {} is not spatially "
                             "contiguous.".format(region_label))

Then, this needs to be added in the local searches that have issues on it. For example, for the AZPBasicTabu, the while loop of the line

https://github.com/pysal/region/blob/3dcf777b6a9d528145013023257ba405608261e7/region/p_regions/azp.py#L792

needs to be changed to:

while True and assert_feasible(labels, adj, n_regions=None):

After changing that the contiguity problem will be solved. However, I'm having a hard time after solving this issue to find a counter-example where the contiguity is respect and threshold is not (in other words, it looks like these changes fix both bugs, which is weird and probably not the case. I think what is happening is that is achieving faster the solution, since an additional constrained was added, and not giving "time" to find a non-respected threshold solution).

The change proposed actually doesn't affect the while loop since it doesn't return a boolean and, therefore, this is not the define solution. Although assert_feasible originally was, indeed, wrong (because it would never raise the second ValueError it proposes to check), fixing it still persist the threshold issue (I'm making some tests locally).

The issue might be inside AZP class in the function fit_from_scipy_sparse_matrix since it can generate non-contiguous regions (added the corrected assert_feasible function on it and checked that) and also regions that do not respect the threshold.

renanxcortes commented 4 years ago

Fixed in https://github.com/pysal/region/pull/33.