mggg / GerryChain

Use MCMC to analyze districting plans and gerrymanders
https://mggg.github.io/GerryChain/

bipartition_tree and bipartition_tree_random have identical behavior #407

Closed mkarrmann closed 9 months ago

mkarrmann commented 2 years ago

Once https://github.com/mggg/GerryChain/pull/406 is merged, the only difference between bipartition_tree and bipartition_tree_random will be that bipartition_tree_random supports a repeat_until_valid parameter. However, the documentation indicates that the intended difference is that bipartition_tree_random returns a random valid cut, while bipartition_tree returns the first valid cut.

I assume the motivation for this distinction is that bipartition_tree_random can be used when one wants a more truly random graph partition, while bipartition_tree can be used if one is willing to trade off some randomness for a performance improvement (assuming that the balance_edge_fn supports early escape once a valid cut is found, which it seems none of the provided ones do).
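To make the "first valid cut" versus "random valid cut" distinction concrete, here is a minimal sketch on a toy tree. It is not GerryChain's implementation: TREE, POP, subtree_pops, balanced_cuts, and the first_only flag are all hypothetical names invented for illustration, and the early escape shown here is only what such a function could look like.

```python
import random

# Hypothetical toy spanning tree (parent -> children) with node populations.
TREE = {0: [1, 2], 1: [3, 4], 2: [5], 3: [], 4: [], 5: []}
POP = {node: 10 for node in TREE}


def subtree_pops(tree, pop, root=0):
    """Map each node to the total population of the subtree rooted at it."""
    totals = {}

    def visit(node):
        totals[node] = pop[node] + sum(visit(child) for child in tree[node])
        return totals[node]

    visit(root)
    return totals


def balanced_cuts(tree, pop, target, epsilon, root=0, first_only=False):
    """List edges whose removal leaves both sides within epsilon of target.

    With first_only=True the scan stops at the first valid edge (early escape).
    """
    totals = subtree_pops(tree, pop, root)
    total = totals[root]
    cuts = []
    for parent, children in tree.items():
        for child in children:
            below, above = totals[child], total - totals[child]
            if all(abs(side - target) <= epsilon * target for side in (below, above)):
                cuts.append((parent, child))
                if first_only:
                    return cuts
    return cuts


# "Random valid cut": collect every valid edge, then sample uniformly.
all_cuts = balanced_cuts(TREE, POP, target=30, epsilon=0.35)
random_cut = random.Random(0).choice(all_cuts)

# "First valid cut": stop scanning as soon as one valid edge is found.
first_cut = balanced_cuts(TREE, POP, target=30, epsilon=0.35, first_only=True)[0]
```

In this toy tree two edges are valid, but the first-valid variant always returns the same one because the scan order is fixed, whereas the random variant can return either.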

I propose:

Two considerations:

  1. By changing the expected interface for balanced_edge_fns, this technically breaks backwards compatibility, e.g. if someone was running a ReCom chain with a partition function that used bipartition_tree with a custom balanced_edge_fn, then their chain would break. I'm not sure whether that introduces any specific concerns. Needless to say, this is a very rare use case, so I assume it isn't a major concern unless GerryChain has strict policies around versioning semantics.
  2. Using the first valid cut as opposed to a random valid cut introduces an interesting bias, and it's not completely obvious to me that this bias isn't trivial. Additionally, the documentation doesn't make this intended behavior clear, so many users may not realize they are introducing this bias, especially considering that bipartition_tree is used by default by ReCom. My preferred solution would be to make bipartition_tree_random the default used by ReCom, and to update the docstrings to make the intended distinction between bipartition_tree_random and bipartition_tree clearer. Do others agree?
peterrrock2 commented 9 months ago

Thank you for bringing this to our attention! There is now a difference between these two functions in v0.3.1 based on the new region-aware capabilities of GerryChain.

mkarrmann commented 9 months ago

@peterrrock2 Nice! I'll check that out!