Open jeromekelleher opened 1 year ago
What should we call it? stop_on_coalescence
or something?
stop_at_MRCA ?
Perhaps simulate_above_MRCA would be less ambiguous.
Would need to make sure to catch people giving no termination criterion, though.
How about stop_at_local_mrca=True
as the default? Keeping the overall stopping condition as all intervals have reached a local MRCA seems fine. The only case when we wouldn't want to do this is when we have a fixed pedigree, and I think it would be unusual for us to have full coalescence before leaving the pedigree in most situations?
But, perhaps we need to think about local and global stopping conditions explicitly?
stop_at_local_mrca is quite clear. I'd rather be explicit about global stopping conditions. Say if someone runs a test simulation on a short span of the genome, the global stopping may be unexpected. We've had cases where we want to simulate ancestry to a specific time in the past in a DTWF model (for example to look at contributions from a given ancestor).
OK, sounds good to me. Let's call it stop_at_local_mrca
so, and think about the global stopping condition separately (I'll open an issue)
min_overlap = 2
if not self.stop_at_local_mrca:
min_overlap = 0
# Update the number of extant segments.
if self.S[left] == min_overlap:
self.S[left] = 0
right = self.S.succ_key(left)
else:
right = left
while right < r_max and self.S[right] != min_overlap:
self.S[right] -= 1
right = self.S.succ_key(right)
alpha = self.alloc_segment(
left=left,
right=right,
node=u,
population=population_index,
label=label,
)
@GertjanBisschop and I had a look at this, and we think something like this snippet in the core merge_x_ancestors
functions will do the trick.
Actually ,maybe this is more generally useful. What if we define a parameter min_local_lineages
which tells us when to stop tracking the ancestry of segments along the genome? By default this would be 1 (stop when the region has coalesced). We can set it to 0 to say "never stop simulating local regions" (you can't hit zero), or if we set it to 2 (say) this would stop simulating each region when there are only two lineages left.
I'm not sure how useful the last idea is, but worth thinking about?
For reference, the current code looks like this:
if self.S[left] == 2:
self.S[left] = 0
right = self.S.succ_key(left)
else:
right = left
while right < r_max and self.S[right] != 2:
self.S[right] -= 1
right = self.S.succ_key(right)
alpha = self.alloc_segment(
left=left,
right=right,
node=u,
population=population_index,
label=label,
)
A natural addition to being able to keep a record of all genomes that we simulate through (i.e. keeping unary #2128 ) is to keep a record of the nodes that we go through after a local root has been reached.
This is simple enough to do, we just stop snipping out sections of the ancestry that have fully coalesced and change the stopping condition of the simulation.
cc @sgravel