nikbaya opened this issue 3 years ago
@nikbaya and I spent a good while scratching our heads over this one today: it's a bit surprising that we reverse the order of individuals in a "time slice", isn't it?
@nikbaya, I don't think we need to invoke msprime here; we can simply call tables.sort() and delete the calls to build_index and sim_ancestry.
I wrote that sorting code back in March, and from a quick look I can't see why this is happening; I'll need to dig in. I'm not sure skipping the sort when it's unneeded is desirable, since it would require a table scan before sorting.
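Skipping the sort when it is unneeded would mean first checking, in one pass over every parent reference, that each parent is listed before its child. A minimal sketch, assuming a plain list-of-lists stand-in for the individual table's parents column with -1 for a missing parent; the function name is hypothetical and this is not tskit API:

```python
def parents_listed_first(parents):
    """Return True if every individual's parents appear before it.

    Hypothetical helper: parents[i] holds the parent IDs of individual i,
    with -1 meaning "no parent recorded". This is a single pass over all
    parent references, i.e. the extra table scan a skip-if-sorted check costs.
    """
    for child, child_parents in enumerate(parents):
        for p in child_parents:
            # A parent with an ID >= its child's ID comes too late in the table.
            if p != -1 and p >= child:
                return False
    return True


# Two parents followed by their child: already in a valid order.
print(parents_listed_first([[], [], [0, 1]]))  # True
# The child listed first: its parents come after it, so a sort is needed.
print(parents_listed_first([[1, 2], [], []]))  # False
```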
Any chance this is Linux vs macOS again, where qsort is unstable on the latter?
We tried to make that sort stable...
If you run the example above and then sort again, it doesn't swap them back, so I don't think it's unstable.
A stable sort means that items that compare equal keep their original relative order, so input that is already sorted by the sort key doesn't get reordered.
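Python's built-in sorted() is guaranteed to be stable, which makes it a convenient way to see what "stable" does and doesn't promise. A small sketch with made-up (name, generation) records: ties keep their input order, so input already sorted by the key comes back unchanged, while a stricter key with fewer ties can still reorder that same input.

```python
# Made-up (name, generation) records, already sorted by generation.
records = [("d", 0), ("a", 0), ("c", 1), ("b", 1)]

# Sort by generation only: ties ("d"/"a" and "c"/"b") keep their input order.
by_generation = sorted(records, key=lambda r: r[1])
print(by_generation)  # [('d', 0), ('a', 0), ('c', 1), ('b', 1)]

# Add a tie-break on name: the sort is still stable, but the stricter key
# leaves no ties, so the same input now gets reordered.
by_generation_then_name = sorted(records, key=lambda r: (r[1], r[0]))
print(by_generation_then_name)  # [('a', 0), ('d', 0), ('b', 1), ('c', 1)]
```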
Do you have a link to the discussion where we talked about stability, @petrelharp? It's fine if this is a "wontfix", but we should probably document it.
Here it is: https://github.com/tskit-dev/tskit/pull/1199 (ignore all the stuff about canonical sorting and subset). From that thread,
the sort here is decreasing by "minimum number of hops to a childless descendant, and then smallest position in the table among the closest childless descendant".
Hm, except now I think that should be "maximum" and "furthest", not "minimum" and "closest"? I'm not sure because of the "decreasing" bit.
However, that still doesn't explain why this example gets reordered. It must also have "position in the list of parents of the closest/furthest childless descendant with the smallest ID" as a final sort key.
So this is stable, but stable with respect to the stricter ordering defined above. This is fine (IMO), and indeed required if we want a canonical sort. We could argue, though, that reversing the order of the final key (order in the list of parents) would be more natural, since it would make this example less surprising?
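The parent swap can be reproduced with a Kahn-style traversal that starts at childless individuals, walks up through each individual's parents list, and reverses the visit order at the end. This is a hypothetical Python sketch consistent with the sort keys described in this thread, not tskit's C implementation; the function name and the reverse_parents flag (standing in for the proposed reversal of the final key) are illustrative assumptions.

```python
from collections import deque


def topological_order(parents, reverse_parents=False):
    """Return old individual IDs in a parents-before-children order.

    Hypothetical sketch: parents[i] lists the parent IDs of individual i
    (-1 = no parent). Walk from childless individuals up towards their
    ancestors, Kahn-style, then reverse the visit order.
    """
    n = len(parents)
    # Count how many child references point at each individual.
    num_children = [0] * n
    for child_parents in parents:
        for p in child_parents:
            if p != -1:
                num_children[p] += 1
    # Seed with childless individuals, highest ID first, so that unrelated
    # individuals keep their relative order once the walk is reversed.
    queue = deque(i for i in reversed(range(n)) if num_children[i] == 0)
    visited = []
    while queue:
        ind = queue.popleft()
        visited.append(ind)
        # Final tie-break: position in this individual's parents list.
        ps = reversed(parents[ind]) if reverse_parents else parents[ind]
        for p in ps:
            if p != -1:
                num_children[p] -= 1
                # Enqueue a parent only once all of its children are visited.
                if num_children[p] == 0:
                    queue.append(p)
    if len(visited) != n:
        raise ValueError("parent references contain a cycle")
    return visited[::-1]


# Two parents (IDs 0 and 1) followed by their child (ID 2).
table = [[], [], [0, 1]]
print(topological_order(table))  # [1, 0, 2]: the parents swap
print(topological_order(table, reverse_parents=True))  # [0, 1, 2]: unchanged
```

In this sketch, iterating over each parents list in reverse leaves the two-parents-plus-child example in its original order while still placing every parent before its children.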
When tsk_individual_table_topological_sort is called, it may rearrange an individual table that is already in topologically sorted order. For example, if a table holds two parents followed by their child, the parents may switch IDs (see below for issue replication). The order of the parents does not matter for the validity of the table, but it would be desirable not to sort individuals when there is no need.

Replication: