hyanwong / GeneticInheritanceGraphLibrary

MIT License

Backward simulations: a possible breakthrough? #93

Open hyanwong opened 4 months ago

hyanwong commented 4 months ago

I just realised something possibly profound. To a first-order approximation, let's assume that genomic rearrangements are genealogically "neutral". More specifically, assume that given any two genomes in a GIG, we can always choose a recombination point that produces a valid recombinant genome, and that this genome has no selective disadvantage.

If this is the case, then the "full graph" structure (i.e. the non-sample-resolved graph) is unaffected by the positions of recombination / rearrangement. In fact, I think the graph should be the same as Griffiths' "big ARG".

In other words, if we can simulate one of these graphs (i.e. the big ARG) backwards in time, we can also do the same for a GIG. We can then choose the locations for recombination and rearrangement forward-in-time.

It's not a perfect solution, because the big ARG is so huge and tedious to simulate. But it shows that backward simulation of a GIG is possible in principle.
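To make the idea concrete, here is a minimal sketch of simulating only the topology of a big ARG backwards in time, with no breakpoints chosen at all. It assumes the standard parameterisation (time in coalescent units, each pair of lineages coalescing at rate 1, each lineage recombining at rate `rho / 2`). The function name `simulate_big_arg` and the event tuple layout are made up for illustration and are not part of this library.

```python
import random


def simulate_big_arg(n_samples, rho, seed=None):
    """Simulate the topology of a "big ARG" backwards in time.

    Illustrative sketch only. No recombination positions are chosen:
    a recombination event just gives a lineage two parents.
    Returns a list of (time, kind, children, parents) tuples.
    """
    rng = random.Random(seed)
    lineages = list(range(n_samples))  # IDs of currently extant lineages
    next_id = n_samples
    t = 0.0
    events = []
    while len(lineages) > 1:
        k = len(lineages)
        coal_rate = k * (k - 1) / 2  # every pair coalesces at rate 1
        rec_rate = k * rho / 2       # every lineage recombines at rate rho/2
        total = coal_rate + rec_rate
        t += rng.expovariate(total)  # waiting time to the next event
        if rng.random() < coal_rate / total:
            # Coalescence: two lineages merge into one parent
            a, b = rng.sample(lineages, 2)
            lineages.remove(a)
            lineages.remove(b)
            lineages.append(next_id)
            events.append((t, "coalescence", (a, b), (next_id,)))
            next_id += 1
        else:
            # Recombination: one lineage gets two parents; the breakpoint
            # is deliberately left unspecified at this stage
            child = rng.choice(lineages)
            lineages.remove(child)
            lineages.extend([next_id, next_id + 1])
            events.append((t, "recombination", (child,), (next_id, next_id + 1)))
            next_id += 2
    return events


events = simulate_big_arg(5, rho=1.0, seed=1)
```

Because the coalescence rate grows quadratically in the number of lineages while the recombination rate grows only linearly, the loop reaches a single lineage with probability 1, but the number of events grows very quickly with `rho`, which is the "huge and tedious" part. Breakpoints (or rearrangements) would then be assigned to the recombination events in a separate forward-in-time pass.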

hyanwong commented 4 months ago

The simulate-SVs-in-backward-time problem then becomes a question of which parts of the big ARG we can throw away while maintaining the ability to overlay recombinations between genomes with different coordinate systems.

Our problem is that we require the ancestral structure to calculate the MRCA regions between pairs of genomes, in order to figure out where recombination can occur. This limits the amount of pruning of non-ancestral material that we are able to do as we simulate up the tree. It might be a route to producing a stopping criterion, though.

The Hudson approach shows that it is possible to throw away most of the big ARG under simplistic assumptions about a universal coordinate system and randomly chosen breakpoint positions.
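For comparison, a minimal sketch of the bookkeeping that makes Hudson-style pruning possible. It assumes a single shared coordinate system and half-open `(start, end)` intervals of ancestral material, which is exactly the assumption that does not hold once genomes have different coordinate systems. The helper name `split_ancestry` is hypothetical.

```python
def split_ancestry(segments, breakpoint):
    """Split a lineage's ancestral material at a breakpoint.

    `segments` is a sorted list of half-open (start, end) intervals in a
    single shared coordinate system. Returns the material inherited from
    the left parent and from the right parent.
    """
    left, right = [], []
    for start, end in segments:
        if end <= breakpoint:
            left.append((start, end))    # entirely left of the breakpoint
        elif start >= breakpoint:
            right.append((start, end))   # entirely right of the breakpoint
        else:
            # The interval straddles the breakpoint, so cut it in two
            left.append((start, breakpoint))
            right.append((breakpoint, end))
    return left, right


# A breakpoint beyond all ancestral material leaves one parent empty,
# so that parent never needs to be followed further back in time.
left, right = split_ancestry([(0, 10), (20, 30)], 40)
```

When one of the returned lists is empty, that parent carries no ancestral material and can be discarded immediately. With per-genome coordinate systems there is no single `breakpoint` value that means the same thing in both parents, which is why this shortcut is not directly available for a GIG.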