mggg / GerryChain

Use MCMC to analyze districting plans and gerrymanders
https://mggg.github.io/GerryChain/
Other
133 stars 74 forks source link

Unexpected district duplication in ReCom chains #426

Open proebsting opened 1 week ago

proebsting commented 1 week ago

We are seeing two unexpected behaviors from ReCom/GerryChain:

  1. We're getting an identical plan from a chain twice in a row with some frequency. Our mental model is that two successive plans should share N-2 identical districts along with two new districts (from the pairwise recombination/split).
  2. We're also seeing some district configurations get "reincarnated" with some frequency, by which we mean a district configuration in the current plan wasn't in the immediate predecessor plan in the chain but that district configuration was in one of the plans on the chain before the predecessor. Our mental model here is that generated districts are random and astronomical in number so the same one reappearing should be extraordinarily rare.
    Put another way, we have evidence that some districts are being created by multiple independent steps of the chain. Even more concerning, we are seeing some districts being created many independent times over the course of a single chain. (We had one district appear from 42 different recombination/split steps in a 10,000-long chain.)

I have attached a bundle that can reproduce both phenomena. The quickest is with the 20x30 uniform grid. bundle.zip

The reproduction with a 40x40 uniform grid with 4 districts is particularly noteworthy given that each step is splitting 800 nodes in half, which ought to have an astronomical number of unique possibilities.

(Note, this happens with both the Kruskal and Wilson spanning tree algorithms.)

peterrrock2 commented 23 hours ago

Hey @proebsting, thanks for the replication code: it was very helpful in figuring out what is going on here. The good news (for both of us, I believe) is that the district "reincarnation" that you are seeing does not appear to be a bug in the code: it's actually a subtle result of the structure of the partitions. I think that this is best illustrated by example.

Let's handle the point (1) first. When I ran the code on a 20x30 grid with a population tolerance of 0, at some point, I get the following partition

example_of_repetition

And then, in the very next step, I get the exact same partition. But this is weird since ReCom should be merging the districts and drawing a random spanning tree, right? Well, the issue here is that the districts that where selected for recombination were districts 1 and 2, which are only adjacent by two cut edges. In the event that we are trying to draw a random spanning tree, we have to pick one of the two edges in the bottleneck connecting the districts. We then try to find a place to cut the spanning tree so that the populations on both sides of the cut are equal, and since the space of spanning trees which admit such a cut is very small given the structure of the graph, we end up picking one of the two edges that produces the same partition.

Now on to point (2) the "reincarnation" phenomenon. This is a natural extension of the structural issue that appeared in part (1). However, for the "reincarnation" to appear, we need to pick bottlenecked districts twice and then either have one of the districts change mildly as the result of some other ReCom or have both districts change as part of a ReCombination between the two before being switched back at a later step. As an example, please refer to districts 4 and 9 in the following gif showing the phenomenon on a 50x20 grid:

reincarnation_highlight

And here are some surrounding steps so you can see the chain in action:

reincarnation

We can see here that district 9 in step 1365 is the same as district 4 in step 1348 and that district 4 in step 1365 is the same as district 9 in step 1360. However, the reason why we see the repetition of these district structures comes from the fact that we are trying to call ReCom on the pair of bottlenecked districts 4 and 9. In the transition between step 1364 and step 1365, we do succeed in making a random spanning tree, but the edge that we use to cut the tree actually produces the same partition and we happen to have flipped the labels of the districts in the interim.

Hopefully this answers your question! I'll leave this issue open for a little bit so that you have a chance to respond but will close it next week.