Making find_IBD more efficient by pruning ancestral segments

tskit-dev / tskit

Population-scale genomics

MIT License


find_ibd relies on creating 'master' lists of the segments underneath each ancestral node. However, these lists can take up a lot of memory, especially for nodes that are high up in the tree.

However, it isn't necessary to keep all of these lists. Once we have processed all of the edges where some node u is a child, we will not need to refer to the list of segments underneath u any more, and can prune these segments from memory.

This could be done by keeping, for each ancestral node, a count of the edges in which that node is a child and which have not yet been processed, and calling a 'prune' function as soon as that count is decremented to 0 for any node.
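A minimal sketch of the counting idea, in plain Python rather than the actual find_ibd code. Everything here is an assumption made for illustration: the function name `propagate_segments`, the `(left, right, parent, child)` tuple layout for edges, and the `(left, right, sample)` layout for segments are hypothetical. It also assumes the edges are ordered so that every edge with u as parent is processed before any edge with u as child.

```python
from collections import Counter, defaultdict


def propagate_segments(edges, samples, sequence_length):
    """Push sample segments up the tree, pruning each child's list
    once its last edge has been processed (illustrative only)."""
    # For every node, count the edges in which it appears as a child.
    remaining = Counter(child for _, _, _, child in edges)

    # Each sample starts with one segment spanning the whole sequence.
    segments = defaultdict(list)
    for s in samples:
        segments[s].append((0.0, float(sequence_length), s))

    pruned = []  # order in which nodes were pruned, for inspection
    for left, right, parent, child in edges:
        # Copy the part of each child segment that overlaps this edge.
        for seg_left, seg_right, sample in segments.get(child, ()):
            lo, hi = max(left, seg_left), min(right, seg_right)
            if lo < hi:
                segments[parent].append((lo, hi, sample))

        # One fewer outstanding edge for this child; prune at zero.
        remaining[child] -= 1
        if remaining[child] == 0:
            segments.pop(child, None)
            pruned.append(child)

    return dict(segments), pruned
```

For a three-sample tree in which node 3 is the parent of samples 0 and 1, and node 4 is the parent of sample 2 and node 3, only the root's list should remain once all edges have been processed:

```python
edges = [(0.0, 10.0, 3, 0), (0.0, 10.0, 3, 1),
         (0.0, 10.0, 4, 2), (0.0, 10.0, 4, 3)]
kept, pruned = propagate_segments(edges, samples=[0, 1, 2], sequence_length=10)
```

Whether the root's list should also be released, and how this interacts with the point at which IBD segments are actually emitted, is left open here.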


Making find_IBD more efficient by pruning ancestral segments #1636