Closed trevorcousins closed 5 months ago
Hi @trevorcousins,
the tree sequence format does not cache any information on the roots of the marginal trees. So one cannot simply iterate over all roots and ask for their node time (= tree height). That does mean indeed that sometimes you build the next tree while iterating just to find that the root has not changed. This should not be too big of an issue I think when the number of samples is not huge.
root_times = np.zeros(ts.num_trees)
for i, tree in enumerate(ts.trees()):
root_times[i] = tree.time(tree.root)
Thanks for the response Gertjan, though that's not really what I am asking. I'm wondering what's the quickest way to generate lots of marginal trees? Is it optimal to set e.g. L large and r low, or r large and L low? Or does it not matter?
Aha, got it, sorry completely misunderstood your question.
I think the quadratic scaling with LrNe in terms of runtime of the algorithm should not be affected by the relative contribution of L or r I think, that is when setting discrete_genome=False
to make sure that each recombination event can hit a unique site.
Couldn't you just run a whole bunch of simulations conditioning on there being at least one recombination and look at the height change going from the first tree to the next. This way you could parallelize things. You probably want to consider your choice of random seeds though when running a lot of replicates (see docs).
If I am interested in recording a single locus' coalescent time distribution, it is easy to do lots of independent tree replicates, e.g:
However, I am interested in the distribution of tree heights at locus i+1 given locus i. In other words conditional on the current tree height, what is the distribution of the next tree height? Previously I recorded this by doing something like
then iterate through
for tree in sim.trees()
and record the height for each. It feels like there should be a faster way to do this, as the information must be built into what happens to a lineage after a recombination event. Even in my current method, it's not obvious what is the optimal L/r tradeoff to get the most transitions per unit time.Any tips much appreciated!
Thanks,
Trevor