tskit-dev / tutorials

A set of tutorials for msprime and tskit.
Creative Commons Attribution 4.0 International

Tutorial: simplify #52

Open jeromekelleher opened 3 years ago

jeromekelleher commented 3 years ago

Simplify is a Swiss Army knife operation: we should give a tutorial that explains how it works at a high level and goes through some of the things you can do with it.

hyanwong commented 3 years ago

There are some (very) basic hand-drawn plots in https://pyslim.readthedocs.io/en/latest/introduction.html#what-does-slim-record-in-the-tree-sequence (for forwards simulation) and https://pyslim.readthedocs.io/en/latest/tutorial.html#sec-tutorial-simplification (more generally) that might be useful as a basis to work from? They might be too forwards-sim specific, though.

hyanwong commented 3 years ago

There is now a stub to fill out in simplification.md

hyanwong commented 3 years ago

@gtsambos kindly volunteered to have a stab at this, when she has time.

hyanwong commented 2 years ago

Here is some Jupyterbook text that was deleted from the main docs and could serve as a basis for an introductory paragraph or two:

Simplifying a tree sequence is an operation commonly used to remove
redundant information and retain only the minimal tree sequence necessary
to describe the genealogical history of the `samples` provided. In fact, all
that the {meth}`TreeSequence.simplify` method does is call the equivalent
table transformation method, {meth}`TableCollection.simplify`, on the
underlying tables and load the result into a new tree sequence.

Removing information via {meth}`TableCollection.simplify` is done by
discarding rows from the underlying tables. Nevertheless, simplification is
guaranteed to preserve relative ordering of any retained rows in the Site
and Mutation tables.

The {meth}`TableCollection.simplify` method can be applied to a collection of
tables that does not have the `mutations.parent` entries filled in, as long
as all other validity requirements are satisfied.

Are you still on for having a go at writing this tutorial @gtsambos ?

gtsambos commented 2 years ago

Hi @hyanwong, sorry for the late reply here! I'd still be keen to write this tutorial, but I'm a bit strung up with things until the end of the year -- would it be okay to leave this until January or February? The simplify tutorial will have some stuff in common with the link_ancestors and possibly also the ibd_segments tutorial, so if possible, I think it would make sense to write them together.

hyanwong commented 2 years ago

Thanks @gtsambos: great that you're still happy to have a go at this. There's no massive urgency here, but it would be very good to fill out the tutorials site a bit more, as more and more people start using tree sequences.