hyanwong / giglib


Inference from real data: useful references #119

Open hyanwong opened 5 months ago

hyanwong commented 5 months ago

Inferring a GIG from real data is probably going to be the most difficult part of the entire GIG project. This issue is to collect ideas and references.

For a start, I've just come across the paper/software below which references various approaches for constructing simple trees from k-mers. It strikes me that we might have to use a k-mer approach for GIG inference, as this is the only way we will be robust to different coordinate systems, so I wonder if there is anything we can use from these ideas. A web search for alignment-free phylogeny will probably go a long way here:

https://pubmed.ncbi.nlm.nih.gov/38547397/
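
To make the k-mer idea concrete, here is a minimal sketch of an alignment-free distance between sequences, assuming plain Jaccard distance on k-mer sets. This is not the method from the paper above and not part of giglib; the function names (`kmer_set`, `jaccard_distance`, `distance_matrix`) and the default `k=4` are made up for illustration. The point is only that the comparison never refers to positions, so it does not depend on the samples sharing a coordinate system.

```python
from itertools import combinations


def kmer_set(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b, k=4):
    """Jaccard distance between the k-mer sets of two sequences.

    0.0 means identical k-mer content, 1.0 means no shared k-mers.
    Positions are never compared, only k-mer membership.
    """
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        # Both sequences are shorter than k: treat as indistinguishable
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def distance_matrix(seqs, k=4):
    """Pairwise distances for a dict of {name: sequence}.

    Returns a dict keyed by (name, name) pairs, symmetric, zero diagonal.
    """
    names = sorted(seqs)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        d = jaccard_distance(seqs[x], seqs[y], k)
        dist[(x, y)] = dist[(y, x)] = d
    return dist
```

A matrix like this could in principle be fed to any distance-based tree builder (e.g. neighbour joining), though a tree is of course much simpler than a GIG, and real data would need far larger k and some form of sketching rather than full k-mer sets.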

PanMAN also gives a nice example of running an algorithm to produce an ancestry with structural variation, for ancestries with limited recombination such as that of SARS-CoV-2:

https://www.biorxiv.org/content/10.1101/2024.07.02.601807v1.full.pdf+html

hyanwong commented 4 months ago

Raw data is available for 1000G at https://www.biorxiv.org/content/10.1101/2024.04.18.590093v1

hyanwong commented 4 months ago

Richard Durbin also has a new preprint out about TE insertions in real data: https://www.biorxiv.org/content/10.1101/2024.04.05.588311v1.full