Closed gabeschoenbach closed 7 months ago
@gabeschoenbach @phorva01 I just fixed the merge conflict. @pjrule @apizzimenti do you think your concerns have been addressed? What's the status on this PR?
I updated the region_weights
to a dictionary now, which I agree is a much friendlier input type! The last thing I'm struggling with his how to correctly prioritize balance edges in the cutting step. I feel good about our power-of-two idea whenever there are no tied penalty weights, but as noted above it doesn't work in cases where (say) we have two regions each with penalties of 1. What our code does is first look for balance edges that cut across both region types, but then it prioritizes balance edges that only cut across the first listed region type. Ideally we'd want it to look for balance edges that cut across either region type and choose uniformly from that set. But I'm not sure the nicest way to get that sort of functionality! Any thoughts?
Merging #362 (5b8c9b8) into main (18cde80) will decrease coverage by
5.23%
. The diff coverage is24.36%
.
@@ Coverage Diff @@
## main #362 +/- ##
==========================================
- Coverage 87.83% 82.60% -5.24%
==========================================
Files 37 37
Lines 1735 1851 +116
==========================================
+ Hits 1524 1529 +5
- Misses 211 322 +111
Impacted Files | Coverage Δ | |
---|---|---|
gerrychain/tree.py | 54.22% <19.09%> (-21.66%) |
:arrow_down: |
gerrychain/proposals/tree_proposals.py | 26.22% <85.71%> (-1.05%) |
:arrow_down: |
gerrychain/proposals/__init__.py | 100.00% <100.00%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 18cde80...5b8c9b8. Read the comment docs.
@jenni-niels I just added a commit that keeps track of the edge penalties and only adds it to the cuts
list (from which one balance edge will be selected at random) if it has the highest penalty of all the balance edges in the tree. I think this should work the same way as you were describing your .NET implementation works, but please take a look and let me know!
Thank you for all of your work on this! We decided to add the option to do region aware as a core part of the functionality of the recom
function in GerryChain in v0.3.0. The implementation was slightly different than what is outlined here, but if you would like me to add you to the release notes for credit on the work that you have done, please let me know!
Thank you for all of your work on this! We decided to add the option to do region aware as a core part of the functionality of the
recom
function in GerryChain in v0.3.0. The implementation was slightly different than what is outlined here, but if you would like me to add you to the release notes for credit on the work that you have done, please let me know!
Great to hear, glad the functionality has been incorporated! Yes, I'd love a shoutout in the release notes :)
Thank you for all of your work on this! We decided to add the option to do region aware as a core part of the functionality of the
recom
function in GerryChain in v0.3.0. The implementation was slightly different than what is outlined here, but if you would like me to add you to the release notes for credit on the work that you have done, please let me know!Great to hear, glad the functionality has been incorporated! Yes, I'd love a shoutout in the release notes :)
Done! You are now in the release notes for version 0.3.0!
This PR was largely based on the ideas and work of @amybecker and @drdeford, on how to update the ReCom proposal function to be aware of county boundaries. @phorva01 and I extended these ideas to allow ReCom to be aware of multiple regions at once — say, counties and municipalities.
At the highest level, we add an optional parameter called
region_weights
to therecom()
function intree.py
. If included, it should be a list of tuples, like[("COUNTYFP20", 1), ("TOWN", 1)]
. This would tell ReCom to try to create districts whose boundaries fall along county and town boundary lines, to the extent possible while constrained by population balance concerns.This works in two ways:
tree.py
'srandom_spanning_tree()
, we randomly assign edge weights to our graph in order to draw a minimum spanning tree and look for an edge to cut the graph into two population-balanced parts. Ifregion_weights
is notNone
, we will add extra weight to edges that cross our regions, therefore encouraging our spanning tree to be drawn mostly inside our regions, with as few edges crossing region boundaries as possible. Regardless of where the balance edge is, this will help keep regions whole. See below for an example of (a) a minimum spanning tree drawn randomly, and (b) a minimum spanning tree drawn with counties in mindtree.py
'sfind_balanced_edge_cuts_memoization()
, we search for an edge in the minimum spanning tree to cut the graph into two equal-sized parts. Ideally, this edge is one of the few in the tree that does cross from one region to another, since this would neatly split the districts along a region boundary. We first look to see if there exists such an edge, but if not we simply look for any balance edge. If we care about multiple regions, we check to see if there is an edge that crosses as many of our regions as possible.To sum up, the optional
region_weights
parameter in therecom()
proposal function allows the user to specify an arbitrary number of regions for GerryChain to attempt to keep intact. The weights corresponding to each region will be used in the minimum spanning tree step. If one is more interested in keeping counties intact than towns, they could make the weight for counties larger than that for towns. This would mean ReCom would first look for balanced edges that span different counties in step 2. However, our testing has shown that setting weights of 1 for every region does quite well in optimizing intactness for all region-types simultaneously.To test out this code, try working through the Jupyter Notebook here. For slides on this development, click here.