Closed bhaller closed 3 years ago
When building a pedigree-based relatedness matrix, outwith tskit, we want parents before progeny so that we can process pedigree records forward in time and accumulate relatedness coefficients appropriately. But one can do a topological sort of the pedigree graph (a DAG) to achieve this if needed.
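As a rough illustration of the topological sort mentioned above, here is a minimal sketch using Kahn's algorithm in plain Python. It does not use tskit; the `parents` list-of-tuples layout, the `topological_order` name, and the use of `-1` for an unknown parent (mirroring tskit's NULL value) are assumptions made for the example.

```python
from collections import deque

NULL = -1  # assumed marker for "parent unknown", mirroring tskit's NULL


def topological_order(parents):
    """Return individual IDs ordered so every known parent precedes its children.

    parents[i] is a tuple of the parent IDs of individual i (hypothetical layout).
    """
    n = len(parents)
    children = [[] for _ in range(n)]
    pending = [0] * n  # number of known parents not yet emitted
    for child, ps in enumerate(parents):
        for p in ps:
            if p != NULL:
                children[p].append(child)
                pending[child] += 1

    # Start from individuals with no known parents (founders).
    queue = deque(i for i in range(n) if pending[i] == 0)
    order = []
    while queue:
        ind = queue.popleft()
        order.append(ind)
        for c in children[ind]:
            pending[c] -= 1
            if pending[c] == 0:
                queue.append(c)

    if len(order) != n:
        raise ValueError("pedigree contains a cycle")
    return order


# Individual 0 is the child of 2 and 3; individual 1 is the child of 0 and 3.
pedigree = [(2, 3), (0, 3), (NULL, NULL), (NULL, NULL)]
order = topological_order(pedigree)
print(order)
```

Processing records in the returned order means each individual's parents have already been handled by the time it is reached, which is all a forward-in-time relatedness accumulation needs.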
Well, it'd be a shame to put @benjeffery's nice code that does the sorting to waste, so we could still provide the functionality (e.g. tables.sort(individual_parent_sort=False)).
But I think if it's getting in the way we should definitely remove it.
Hey campers. FYI, it would be good to know the decision on this one way or the other; some work on the SLiM side is waiting to know which way this will go. Implementation is no rush, but a thumbs-up/thumbs-down on the proposal would be helpful. :-> Thanks!
On balance I'm in favour of removing it. Here's my proposal:
1. Remove the individual sorting from TableCollection.sort(), and create a new method IndividualTable.topological_sort(). My experience with the topological sort was that it's too weak in practice for some algorithms, and what you really want is a sort by time. This requires a link to the node table and some policy about what the time of an individual is. So perhaps one day we'll implement TableCollection.sort_individuals(), which does this time-sort, but for now we don't need it. I think we can accept that most tree sequences won't need any sortedness requirements on the individuals, and we'll just live with the consequences later if it turns out there are operations that do need it.
2. Add IndividualTable.is_topologically_sorted() (or a better name?)

So we basically let go of the idea that individual sortedness of any kind is part of the tree sequence requirements, and pull the current functionality out into methods of the individual table.
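To make the "sort by time" idea concrete, here is a minimal sketch in plain Python. It is not the proposed TableCollection.sort_individuals() and does not use tskit. The `(time, individual)` node layout, the function names, and the policy that an individual's time is the maximum time over its nodes are all assumptions chosen for illustration; time is taken to be measured backwards from the present, as in tskit, so larger values are older.

```python
NULL = -1  # assumed marker for "node has no individual", mirroring tskit's NULL


def individual_times(num_individuals, nodes):
    """Assign each individual a time from its nodes.

    nodes is a list of (time, individual) pairs (hypothetical layout).
    Assumed policy: an individual's time is the maximum time of its nodes.
    """
    times = [None] * num_individuals
    for time, ind in nodes:
        if ind != NULL and (times[ind] is None or time > times[ind]):
            times[ind] = time
    return times


def time_sorted_order(num_individuals, nodes):
    """Return individual IDs ordered oldest first; ties keep their ID order."""
    times = individual_times(num_individuals, nodes)
    if any(t is None for t in times):
        raise ValueError("an individual with no nodes has no defined time")
    # sorted() is stable, so individuals with equal times stay in ID order.
    return sorted(range(num_individuals), key=lambda i: -times[i])


# Three diploid individuals: 0 is alive now, 1 is one generation back, 2 is two back.
nodes = [(0.0, 0), (0.0, 0), (1.0, 1), (1.0, 1), (2.0, 2), (2.0, 2)]
order = time_sorted_order(3, nodes)
print(order)
```

An order like this puts parents before children only when every parent is strictly older than its children, and it needs some rule for individuals that have no nodes, which is the kind of policy question raised above.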
Any objections @benjeffery @molpopgen?
Note that @molpopgen wrote on Slack: "We don't use this ordering requirement. When parent tracking is happening, it is buffered internally and the export to tskit is dealt with at the very end." So he seems fine with it.
Thanks for the discussion all, it seems that removing the requirement is the right thing to do. Putting the sorting in a separate function, as @jeromekelleher and @petrelharp suggest, would be good; there's no point throwing away that code.
Hey campers. FYI, it would be good to know the decision on this one way or the other
This was posted at 8pm Friday my time! Give us a chance! 😂
Hey campers. FYI, it would be good to know the decision on this one way or the other
This was posted at 8pm Friday my time! Give us a chance! 😂
Ah, yes, this "weekend" thing. I've heard of that. :->
No worries, just trying to keep the ball rolling. :-> Thanks!
The sorting requirement should be removed.
I don't check github notifications on weekends...
Sounds like we have a decision!
Cool, I can make these changes, seeing as it was my code in the first place.
I'm assuming we still want to use this ordering for tsk_table_collection_canonicalise?
Yes, I think so. Although it's probably too weak an ordering to actually be canonical, it's a good start and there's no good motivation to change the definition of tsk_table_collection_canonicalise here.
Can we close this now @benjeffery?
Sorry, so used to the github auto-close!
In #1138 it was decided to require an ordering for the individuals table: children must come after their individual parents. This has turned out to be a problem for SLiM, because it has its own ordering requirements for the individuals table, which conflict with this policy. This was not noticed until now because SLiM wasn't using the parents column; now that it is putting pedigree-based info in the parents column, the two ordering policies conflict. Further discussion of that can be found at https://github.com/MesserLab/SLiM/pull/231.
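For readers unfamiliar with the requirement, the following minimal sketch shows what "children must come after their parents" means for a parents column. It is plain Python, not tskit's own validation code; the `first_ordering_violation` name, the list-of-tuples layout, and `-1` as the "no parent" marker are assumptions for the example.

```python
NULL = -1  # assumed marker for "parent unknown", mirroring tskit's NULL


def first_ordering_violation(parents):
    """Return (child, parent) for the first row whose parent does not come
    earlier in the table, or None if every known parent precedes its child.
    """
    for child, ps in enumerate(parents):
        for p in ps:
            if p != NULL and p >= child:
                return (child, p)
    return None


# Founders first, then their child: satisfies the requirement.
parents_first = [(NULL, NULL), (NULL, NULL), (0, 1)]
# Child listed before its parents: violates it.
child_first = [(1, 2), (NULL, NULL), (NULL, NULL)]

print(first_ordering_violation(parents_first))
print(first_ordering_violation(child_first))
```

A table written in any order that places a child's row ahead of a parent's row, as in the second case, would fail this check once the parents column is filled in.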
Discussion on Slack indicates that nobody is actually using this ordering requirement for anything, and that the best solution might be to simply remove tskit's ordering requirement. Quoting @jeromekelleher throughout:
So. Any objections to simply removing this sorting requirement? @petrelharp @benjeffery not sure who else to tag...