Open ShyamieG opened 2 months ago
Sorry, I just realized that there are in fact two mutations that violate the rule. It seems they ended up on edges that no longer exist after I truncate my tree. I can simply remove them and it works.
Here is the function I wrote:
def truncate_tree(tree_seq, max_age):
ts_tables = tree_seq.dump_tables()
ts_tables.delete_older(max_age) # deletes mutations and edges
# delete nodes that are too old
too_old_nodes = set(np.where(tree_seq.nodes_time > max_age)[0])
ts_tables.nodes.clear()
for node in tree_seq.nodes():
if node.id not in too_old_nodes:
ts_tables.nodes.add_row(flags = node.flags, \
time = node.time, \
population = node.population, \
individual = node.individual, \
metadata = node.metadata)
trunc_ts = ts_tables.tree_sequence()
# delete mutations whose edge no longer exists
ts_tables = trunc_ts.dump_tables()
ts_tables.mutations.clear()
for mutation in trunc_ts.mutations():
if mutation.edge != -1:
ts_tables.mutations.add_row(site = mutation.site, \
node = mutation.node, \
derived_state = mutation.derived_state, \
parent = mutation.parent, \
metadata = mutation.metadata)
new_tree_seq = ts_tables.tree_sequence()
return new_tree_seq
Hmm, should this be closed? Having a mutation positioned on an edge that no longer exists sounds like the truncation operation has left the user with an invalid tree sequence, no? The mutation should either be removed, or re-positioned onto an edge that does still exist, no? I'm gonna reopen this for now, just to make sure this gets seen – although perhaps I'm wrong and there is no bug here.
Gee, thanks, @ShyamieG - this is an issue. I think that either
msprime.sim_ancestry(initial_state=...)
, in that it needs to check for mutations above the roots and either deal with them properly (either put them on the right branches or error?), orMy inclination is (1), and in fact it'd be a pretty easy thing to fix up (ie to put the mutations where they should go?) but I haven't thought about it hard. I'm going to transfer this to msprime.
Tangentially, we have the code to do the needed operation over in https://github.com/tskit-dev/tskit/pull/2938:
static int
tsk_treeseq_slide_mutation_nodes_up(
const tsk_treeseq_t *self, tsk_mutation_table_t *mutations)
{
int ret = 0;
double t;
tsk_id_t c, p, next_mut;
const tsk_table_collection_t *tables = self->tables;
const tsk_size_t num_nodes = tables->nodes.num_rows;
double *sites_position = tables->sites.position;
double *nodes_time = tables->nodes.time;
tsk_tree_t tree;
ret = tsk_tree_init(&tree, self, TSK_NO_SAMPLE_COUNTS);
if (ret != 0) {
goto out;
}
next_mut = 0;
for (ret = tsk_tree_first(&tree); ret == TSK_TREE_OK; ret = tsk_tree_next(&tree)) {
while (next_mut < (tsk_id_t) mutations->num_rows
&& sites_position[mutations->site[next_mut]] < tree.interval.right) {
t = mutations->time[next_mut];
if (tsk_is_unknown_time(t)) {
ret = TSK_ERR_DISALLOWED_UNKNOWN_MUTATION_TIME;
goto out;
}
c = mutations->node[next_mut];
tsk_bug_assert(c < (tsk_id_t) num_nodes);
p = tree.parent[c];
while (p != TSK_NULL && nodes_time[p] <= t) {
c = p;
p = tree.parent[c];
}
tsk_bug_assert(nodes_time[c] <= t);
mutations->node[next_mut] = c;
next_mut++;
}
}
if (ret != 0) {
goto out;
}
out:
tsk_tree_free(&tree);
return ret;
}
and python code:
def _slide_mutation_nodes_up(ts, mutations):
# adjusts mutations' nodes to place each mutation on the correct edge given
# their time; requires mutation times be nonmissing and the mutation times
# be >= their nodes' times.
assert np.all(~tskit.is_unknown_time(mutations.time)), "times must be known"
new_nodes = mutations.node.copy()
mut = 0
for tree in ts.trees():
_, right = tree.interval
while (
mut < mutations.num_rows and ts.sites_position[mutations.site[mut]] < right
):
t = mutations.time[mut]
c = mutations.node[mut]
p = tree.parent(c)
assert ts.nodes_time[c] <= t
while p != -1 and ts.nodes_time[p] <= t:
c = p
p = tree.parent(c)
assert ts.nodes_time[c] <= t
if p != -1:
assert t < ts.nodes_time[p]
new_nodes[mut] = c
mut += 1
# in C the node column can be edited in place
new_mutations = mutations.copy()
new_mutations.clear()
for mut, n in zip(mutations, new_nodes):
new_mutations.append(mut.replace(node=n))
return new_mutations
Someone else has hit this, FWIW: https://github.com/tskit-dev/pyslim/issues/352 - tldr; the input trees were simplified (they shouldn't have been, or should have had keep_input_roots=True
).
Hi all,
I'm running into a "TSK_ERR_MUTATION_TIME_OLDER_THAN_PARENT_NODE" error when trying to recapitate a truncated tree sequence. I've attached some code and a tree sequence to walk through the issue.
The original tree sequence can be recapitated without a problem, but I run into the error when I try to truncate my tree sequence first using the 'delete_older' method:
the above code gave me the following error: Traceback (most recent call last): File "", line 1, in
File "/Users/shyamag/miniconda3/envs/slim-env/lib/python3.9/site-packages/msprime/ancestry.py", line 1226, in sim_ancestry
return _wrap_replicates(
File "/Users/shyamag/miniconda3/envs/slim-env/lib/python3.9/site-packages/msprime/ancestry.py", line 660, in _wrap_replicates
deque = collections.deque(iterator, maxlen=1)
File "/Users/shyamag/miniconda3/envs/slim-env/lib/python3.9/site-packages/msprime/ancestry.py", line 1518, in run_replicates
ts = tables.tree_sequence()
File "/Users/shyamag/miniconda3/envs/slim-env/lib/python3.9/site-packages/tskit/tables.py", line 3341, in tree_sequence
return tskit.TreeSequence.load_tables(self)
File "/Users/shyamag/miniconda3/envs/slim-env/lib/python3.9/site-packages/tskit/trees.py", line 4110, in load_tables
ts.load_tables(tables._ll_tables, build_indexes=build_indexes)
_tskit.LibraryError: A mutation's time must be < the parent node of the edge on which it occurs, or be marked as 'unknown'. (TSK_ERR_MUTATION_TIME_OLDER_THAN_PARENT_NODE)
I found this strange because I'm not simulating any new mutations during recapitation. I also checked for cases where a mutation has an older time than the parent node of its associated edge and found none:
I went ahead and marked all my mutation times as unknown, as suggested by the error message. I don't really need mutation times for my application, so that is a viable workaround for me. However, I feel like this should not be happening and might indicate a bug. Any ideas what might be going on?
Thanks!