tskit-dev / tsbrowse

Utilities for evaluating inferred tree sequences
MIT License

Examples of "bad" plots #98

Open hyanwong opened 11 months ago

hyanwong commented 11 months ago

There are some cases where we think that tsqc indicates "bad stuff" (e.g. where there are long edges with no sites, etc.). But a new user might not know what to look for. We should collect a gallery of these examples, with a bit of discussion on each as to why we think it is a pattern to be wary of. This could, I suppose, be in some sort of tutorial.
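As a rough illustration of the "long edges with no sites" pattern, here is a minimal sketch that flags such edges from plain coordinate lists. The function name `edges_without_sites` and the `min_span` threshold are hypothetical and not part of tsqc; what counts as "long" is an assumption that depends on the dataset.

```python
from bisect import bisect_left


def edges_without_sites(edges, site_positions, min_span):
    """Return the (left, right) edges that span at least `min_span`
    and contain no site position in the half-open interval [left, right).

    `min_span` is a hypothetical threshold, chosen by the user.
    """
    positions = sorted(site_positions)
    flagged = []
    for left, right in edges:
        # Count sites with left <= position < right using binary search.
        num_sites = bisect_left(positions, right) - bisect_left(positions, left)
        if right - left >= min_span and num_sites == 0:
            flagged.append((left, right))
    return flagged
```

With a tskit tree sequence `ts`, the inputs could plausibly be taken as `zip(ts.tables.edges.left, ts.tables.edges.right)` and `ts.tables.sites.position`.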

barneyhill commented 11 months ago

UKB 450K Exomes + Array (MAF>0.001). TTN (chr2) gene (tsinfer+tsdate):

[Images: Overview, Mutations, Edges, Trees, Nodes]

barneyhill commented 11 months ago

UKB 450K Exomes + Array (MAF>0.001). Entire chr21 (tsinfer+tsdate):

[Images: Overview, Mutations (two images), Edges, Trees, Nodes]

Notes

Jerome recommended not continuing with this tree sequence because of these QC plots. Loading times for the plots are substantial (minutes per plot), but I think that's expected given the size of the ts.

jeromekelleher commented 11 months ago

Very interesting, thanks @barneyhill.

The TTN plots don't look terrible to me. I think it would be worth looking at the actual tree for a few random subsets (of say 10 samples) at a particular location, to get a feel for what the deep structure looks like.
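A minimal sketch of that kind of spot check, assuming `ts` is a `tskit.TreeSequence`: pick a few random subsets of samples, simplify down to each subset, and draw the local tree at one position. The helper names, the defaults, and the seed are hypothetical choices made for illustration.

```python
import random


def pick_random_subsets(sample_ids, subset_size=10, num_subsets=3, seed=1):
    """Return `num_subsets` sorted random subsets of `sample_ids`."""
    rng = random.Random(seed)
    ids = [int(u) for u in sample_ids]
    return [sorted(rng.sample(ids, subset_size)) for _ in range(num_subsets)]


def draw_subset_trees(ts, position, subset_size=10, num_subsets=3, seed=1):
    """Return one text drawing per random subset of the tree at `position`.

    `ts` is assumed to be a tskit.TreeSequence.
    """
    drawings = []
    for subset in pick_random_subsets(ts.samples(), subset_size, num_subsets, seed):
        # simplify() keeps only the ancestry of the chosen samples; in the
        # result they are renumbered 0..subset_size-1, so node IDs in the
        # drawing do not match those in the original tree sequence.
        small_ts = ts.simplify(samples=subset)
        tree = small_ts.at(position)
        drawings.append(tree.draw_text())
    return drawings
```

Usage would look something like `for d in draw_subset_trees(tskit.load("chr2_ttn.trees"), position=1000.0): print(d)`, where the file name and the position are placeholders, not values from this thread.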