mggg / GerryChain

Use MCMC to analyze districting plans and gerrymanders
https://mggg.github.io/GerryChain/
Other
132 stars 74 forks source link

Region-aware ReCom #362

Closed gabeschoenbach closed 7 months ago

gabeschoenbach commented 3 years ago

This PR was largely based on the ideas and work of @amybecker and @drdeford, on how to update the ReCom proposal function to be aware of county boundaries. @phorva01 and I extended these ideas to allow ReCom to be aware of multiple regions at once — say, counties and municipalities.

At the highest level, we add an optional parameter called region_weights to the recom() function in tree.py. If included, it should be a list of tuples, like [("COUNTYFP20", 1), ("TOWN", 1)]. This would tell ReCom to try to create districts whose boundaries fall along county and town boundary lines, to the extent possible while constrained by population balance concerns.

This works in two ways:

  1. In tree.py's random_spanning_tree(), we randomly assign edge weights to our graph in order to draw a minimum spanning tree and look for an edge to cut the graph into two population-balanced parts. If region_weights is not None, we will add extra weight to edges that cross our regions, therefore encouraging our spanning tree to be drawn mostly inside our regions, with as few edges crossing region boundaries as possible. Regardless of where the balance edge is, this will help keep regions whole. See below for an example of (a) a minimum spanning tree drawn randomly, and (b) a minimum spanning tree drawn with counties in mind mst_edges_with_counties aware_edges_with_counties
  2. In tree.py's find_balanced_edge_cuts_memoization(), we search for an edge in the minimum spanning tree to cut the graph into two equal-sized parts. Ideally, this edge is one of the few in the tree that does cross from one region to another, since this would neatly split the districts along a region boundary. We first look to see if there exists such an edge, but if not we simply look for any balance edge. If we care about multiple regions, we check to see if there is an edge that crosses as many of our regions as possible.

To sum up, the optional region_weights parameter in the recom() proposal function allows the user to specify an arbitrary number of regions for GerryChain to attempt to keep intact. The weights corresponding to each region will be used in the minimum spanning tree step. If one is more interested in keeping counties intact than towns, they could make the weight for counties larger than that for towns. This would mean ReCom would first look for balanced edges that span different counties in step 2. However, our testing has shown that setting weights of 1 for every region does quite well in optimizing intactness for all region-types simultaneously.

To test out this code, try working through the Jupyter Notebook here. For slides on this development, click here.

InnovativeInventor commented 2 years ago

@gabeschoenbach @phorva01 I just fixed the merge conflict. @pjrule @apizzimenti do you think your concerns have been addressed? What's the status on this PR?

gabeschoenbach commented 2 years ago

I updated the region_weights to a dictionary now, which I agree is a much friendlier input type! The last thing I'm struggling with his how to correctly prioritize balance edges in the cutting step. I feel good about our power-of-two idea whenever there are no tied penalty weights, but as noted above it doesn't work in cases where (say) we have two regions each with penalties of 1. What our code does is first look for balance edges that cut across both region types, but then it prioritizes balance edges that only cut across the first listed region type. Ideally we'd want it to look for balance edges that cut across either region type and choose uniformly from that set. But I'm not sure the nicest way to get that sort of functionality! Any thoughts?

codecov-commenter commented 2 years ago

Codecov Report

Merging #362 (5b8c9b8) into main (18cde80) will decrease coverage by 5.23%. The diff coverage is 24.36%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #362      +/-   ##
==========================================
- Coverage   87.83%   82.60%   -5.24%     
==========================================
  Files          37       37              
  Lines        1735     1851     +116     
==========================================
+ Hits         1524     1529       +5     
- Misses        211      322     +111     
Impacted Files Coverage Δ
gerrychain/tree.py 54.22% <19.09%> (-21.66%) :arrow_down:
gerrychain/proposals/tree_proposals.py 26.22% <85.71%> (-1.05%) :arrow_down:
gerrychain/proposals/__init__.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 18cde80...5b8c9b8. Read the comment docs.

gabeschoenbach commented 2 years ago

@jenni-niels I just added a commit that keeps track of the edge penalties and only adds it to the cuts list (from which one balance edge will be selected at random) if it has the highest penalty of all the balance edges in the tree. I think this should work the same way as you were describing your .NET implementation works, but please take a look and let me know!

peterrrock2 commented 7 months ago

Thank you for all of your work on this! We decided to add the option to do region aware as a core part of the functionality of the recom function in GerryChain in v0.3.0. The implementation was slightly different than what is outlined here, but if you would like me to add you to the release notes for credit on the work that you have done, please let me know!

gabeschoenbach commented 7 months ago

Thank you for all of your work on this! We decided to add the option to do region aware as a core part of the functionality of the recom function in GerryChain in v0.3.0. The implementation was slightly different than what is outlined here, but if you would like me to add you to the release notes for credit on the work that you have done, please let me know!

Great to hear, glad the functionality has been incorporated! Yes, I'd love a shoutout in the release notes :)

peterrrock2 commented 7 months ago

Thank you for all of your work on this! We decided to add the option to do region aware as a core part of the functionality of the recom function in GerryChain in v0.3.0. The implementation was slightly different than what is outlined here, but if you would like me to add you to the release notes for credit on the work that you have done, please let me know!

Great to hear, glad the functionality has been incorporated! Yes, I'd love a shoutout in the release notes :)

Done! You are now in the release notes for version 0.3.0!