Open proebsting opened 1 week ago
Hey @proebsting, thanks for the replication code: it was very helpful in figuring out what is going on here. The good news (for both of us, I believe) is that the district "reincarnation" that you are seeing does not appear to be a bug in the code: it's actually a subtle result of the structure of the partitions. I think that this is best illustrated by example.
Let's handle the point (1) first. When I ran the code on a 20x30 grid with a population tolerance of 0, at some point, I get the following partition
And then, in the very next step, I get the exact same partition. But this is weird since ReCom should be merging the districts and drawing a random spanning tree, right? Well, the issue here is that the districts that where selected for recombination were districts 1 and 2, which are only adjacent by two cut edges. In the event that we are trying to draw a random spanning tree, we have to pick one of the two edges in the bottleneck connecting the districts. We then try to find a place to cut the spanning tree so that the populations on both sides of the cut are equal, and since the space of spanning trees which admit such a cut is very small given the structure of the graph, we end up picking one of the two edges that produces the same partition.
Now on to point (2) the "reincarnation" phenomenon. This is a natural extension of the structural issue that appeared in part (1). However, for the "reincarnation" to appear, we need to pick bottlenecked districts twice and then either have one of the districts change mildly as the result of some other ReCom or have both districts change as part of a ReCombination between the two before being switched back at a later step. As an example, please refer to districts 4 and 9 in the following gif showing the phenomenon on a 50x20 grid:
And here are some surrounding steps so you can see the chain in action:
We can see here that district 9 in step 1365 is the same as district 4 in step 1348 and that district 4 in step 1365 is the same as district 9 in step 1360. However, the reason why we see the repetition of these district structures comes from the fact that we are trying to call ReCom on the pair of bottlenecked districts 4 and 9. In the transition between step 1364 and step 1365, we do succeed in making a random spanning tree, but the edge that we use to cut the tree actually produces the same partition and we happen to have flipped the labels of the districts in the interim.
Hopefully this answers your question! I'll leave this issue open for a little bit so that you have a chance to respond but will close it next week.
We are seeing two unexpected behaviors from ReCom/GerryChain:
Put another way, we have evidence that some districts are being created by multiple independent steps of the chain. Even more concerning, we are seeing some districts being created many independent times over the course of a single chain. (We had one district appear from 42 different recombination/split steps in a 10,000-long chain.)
I have attached a bundle that can reproduce both phenomena. The quickest is with the 20x30 uniform grid. bundle.zip
The reproduction with a 40x40 uniform grid with 4 districts is particularly noteworthy given that each step is splitting 800 nodes in half, which ought to have an astronomical number of unique possibilities.
(Note, this happens with both the Kruskal and Wilson spanning tree algorithms.)