Open petrelharp opened 3 years ago
Would we do this using the same topological sort algorithm as individuals @petrelharp? If so, I'd vote for taking the same strategy as individuals: doing the O(n) topological sort as for TableCollection.sort() and the more expensive version for canonicalise()
. I'm sure we could abstract the topological sorting code out if we wanted to, but it may not be worth it.
The next question is, when do we need this? I think we can probably put this off until the next C release (i.e., not the impending one), since we have non-parent sorting code for mutations out in the wild already?
I'd also like to defer this until the next release as this one is already long in the tooth.
Yep, no hurry on this.
About the algorithm: if the number of mutations per site is k, the canonicalise implementation is O(n log k), so I don't think the topological sort has any practical advantages (it might even be slower in typical cases?).
About the algorithm: if the number of mutations per site is k, the canonicalise implementation is O(n log k), so I don't think the topological sort has any practical advantages (it might even be slower in typical cases?).
Yes, that's true, but consistency is nice? I find the topological sort algorithm very pretty, and I really doubt it could be slower, as calling qsort() lots of times is much more expensive than a bunch of linear passes through the arrays.
Note: The test at https://github.com/tskit-dev/tskit/blob/main/python/tests/test_tables.py#L4195 should be reinstated when this is fixed.
Adding this to the 1.0 release for now. I guess it's not a critical thing API wise, though?
@petrelharp we're triaging stuff for the C 1.0 API. Since this doesn't affect the API as such, I think we can probably drop it from this milestone (and move to C Upcoming). Unless you'd like to pick it up?
Sounds good.
Now #1199 does this for individuals (yay!) but we still don't have a method, besides
canonicalise
, that will do this. Seems like it should be a standard part ofsort
. We could:canonicalise
, orI think the main impact on performance would be memory use? And, it's probably not a big deal?
I think this is the same thing as #1227, but perhaps what's happening there is different (and thus a bug).