Once https://github.com/mggg/GerryChain/pull/406 is merged, then the only difference between bipartition_tree and bipartition_tree_random will be that bipartition_tree_random supports a repeat_until_valid parameter. However, the documentation indicates the intended difference is the bipartition_tree_random returns a random valid cut, while bipartition_tree returns the first valid cut.
I assume the motivation for this distinction is that bipartition_tree_random can be used want a more truly random graph partition, while bipartition_tree can be used if one is willing to tradeoff some randomness for a performance improvement (assuming that the balance_edge_fn support early escape once a valid cut is found, which it seems none of the provided ones do).
I propose:
Adding a parameter to find_balanced_edge_cuts_contraction and find_balanced_edge_cuts_memoization to optionally early escape as soon as a valid cut is found. Default value of this parameter will be False to support backwards compatibility.
Call the balanced_edge_fn with early_escape set to true in bipartition_tree and false in bipartition_tree_random.
Two considerations:
By changing the expected interface for balanced_edge_fns, this technically breaks backwards compatibility, e.g. if someone was running a recom chain with a partition function that used bipartition_tree but with a custom balanced_edge_fn, then their chain would break. I'm not sure if that introduces any specific concerns. Needless this to say, this is a very rare use case, so I assume isn't a major concern unless Gerrychain was strict policies around versioning semantics.
Using the first valid cut as opposed to a random valid cut introduces an interesting bias, and it's not completely obvious to me that this bias isn't trivial. Additionally, the documentation doesn't make this intended behavior clear, so many may not realize they are introducing this bias, especially considering bipartition_tree is used by default by Recom. My preferred solution would be to make bipartition_tree_random the default used by Recom, and update the docstrings to make the intended distinction between bipartition_tree_random and bipartition_tree more clear. Do others agree?
Thank you for bringing this to our attention! There is now a difference between these two functions in v0.3.1 based on the new region-aware capabilities of GerryChain.
Once https://github.com/mggg/GerryChain/pull/406 is merged, then the only difference between
bipartition_tree
andbipartition_tree_random
will be thatbipartition_tree_random
supports arepeat_until_valid
parameter. However, the documentation indicates the intended difference is thebipartition_tree_random
returns a random valid cut, whilebipartition_tree
returns the first valid cut.I assume the motivation for this distinction is that
bipartition_tree_random
can be used want a more truly random graph partition, whilebipartition_tree
can be used if one is willing to tradeoff some randomness for a performance improvement (assuming that thebalance_edge_fn
support early escape once a valid cut is found, which it seems none of the provided ones do).I propose:
Adding a parameter to
find_balanced_edge_cuts_contraction
andfind_balanced_edge_cuts_memoization
to optionally early escape as soon as a valid cut is found. Default value of this parameter will be False to support backwards compatibility.Call the
balanced_edge_fn
withearly_escape
set to true inbipartition_tree
and false inbipartition_tree_random
.Two considerations:
balanced_edge_fn
s, this technically breaks backwards compatibility, e.g. if someone was running a recom chain with a partition function that usedbipartition_tree
but with a custombalanced_edge_fn
, then their chain would break. I'm not sure if that introduces any specific concerns. Needless this to say, this is a very rare use case, so I assume isn't a major concern unless Gerrychain was strict policies around versioning semantics.bipartition_tree
is used by default by Recom. My preferred solution would be to makebipartition_tree_random
the default used by Recom, and update the docstrings to make the intended distinction betweenbipartition_tree_random
andbipartition_tree
more clear. Do others agree?