ryanlayer / samplot

Plot structural variant signals from many BAMs and CRAMs
MIT License

Question about plot interpretation; large putative inversion #198

Open jphruska opened 5 months ago

jphruska commented 5 months ago

Hello --

Thank you for developing and maintaining a powerful resource for the genomics community.

I've genotyped SVs for a species of bird, and noticed there was an aberrantly large inversion that I suspected was a false positive. To check, I produced a samplot image for individuals of the three genotypes (0/0, 0/1, 1/1). The genotypes of each individual are indicated.

I have a few questions regarding interpretation. First, for inversions, is there a diagnostic for discriminating between the three genotypes (such as is done with differences in read coverage for deletions)? For example, would the number of discordant paired-end reads spanning the inversion be suggestive of different genotypic states?

Secondly, it appears the main signal of an inversion here is a single pair of discordant reads? Am I interpreting this correctly? If so, there appears to be a single pair of reads that is consistently mapping to the same locations on the reference genome, regardless of the called genotype? There also appears to be a second pair of discordant reads for COL_52524, but that doesn't seem to be in support of an inversion?

Any suggestions on how to best interpret these results would be greatly appreciated.

Thanks,
Jack

[Image: 4_14676614_57494906]

jbelyeu commented 4 months ago

The signal you're seeing here, blue discordant pairs indicating an inversion, is a lot of pairs with about the same placement. In samplot there's no great way to tell these apart, but you can get an idea that there are several just because the blue is pretty dark. There are also faint dotted lines indicating chimeric alignments (in addition to the discordant pairs). So it's not a single discordant pair, but it's not super easy to tell how many there are beyond "several". This is related to the question of genotype: genotyping inversions isn't super easy and samplot doesn't really try to do it. If you extract the split alignments and pairs that span this breakpoint you could come up with a count that might be useful for estimating genotype, but it's not as simple as the rules of thumb that work for copy number variation.
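As a rough illustration of the counting idea above, here is a minimal Python sketch. It is not part of samplot. The `Read` record is a hypothetical stand-in whose field names mirror the attributes of pysam's `AlignedSegment`; the `window` size, the helper names, and the toy reads are assumptions made up for illustration; and the two coordinates are taken from the plot name in this thread on the assumption that they are the inversion breakpoints. It treats a pair as inversion-supporting when both mates map to the same strand and sit near opposite breakpoints, and it counts any read carrying an `SA` tag near a breakpoint as a candidate split read without checking where the supplementary alignment lands.

```python
from typing import Iterable, NamedTuple, Optional


class Read(NamedTuple):
    """Hypothetical toy record; field names mirror pysam's AlignedSegment."""
    query_name: str
    reference_start: int
    next_reference_start: int
    is_reverse: bool
    mate_is_reverse: bool
    sa_tag: Optional[str] = None  # raw SA tag value, None if the read is not split


def near(pos: int, breakpoint: int, window: int) -> bool:
    """True if pos lies within `window` bases of the breakpoint."""
    return abs(pos - breakpoint) <= window


def count_inversion_support(reads: Iterable[Read], bp1: int, bp2: int,
                            window: int = 500) -> dict:
    """Count distinct read names that look like inversion evidence."""
    pair_names = set()
    split_names = set()
    for r in reads:
        # One mate near each breakpoint, in either order.
        spans = (
            (near(r.reference_start, bp1, window)
             and near(r.next_reference_start, bp2, window))
            or (near(r.reference_start, bp2, window)
                and near(r.next_reference_start, bp1, window))
        )
        # Inversion-type pairs have both mates on the same strand.
        if spans and r.is_reverse == r.mate_is_reverse:
            pair_names.add(r.query_name)
        # Simplification: any SA-tagged read near a breakpoint is counted.
        at_breakpoint = (near(r.reference_start, bp1, window)
                         or near(r.reference_start, bp2, window))
        if r.sa_tag and at_breakpoint:
            split_names.add(r.query_name)
    return {"discordant_pairs": len(pair_names), "split_reads": len(split_names)}


# Assumed breakpoints, read off the plot name 4_14676614_57494906.
BP1, BP2 = 14676614, 57494906

# Synthetic reads, invented purely to exercise the function.
toy_reads = [
    Read("p1", 14676500, 57494800, False, False),
    Read("p1", 57494800, 14676500, False, False),  # mate of p1, counted once
    Read("p2", 14676700, 57495000, True, True),
    Read("n1", 14676400, 14676800, False, True),   # ordinary concordant pair
    Read("s1", 14676600, 14676900, False, True,
         sa_tag="4,57494900,-,60S90M,60,0;"),
]

support = count_inversion_support(toy_reads, BP1, BP2)
print(support)
```

With real data the same fields could be read from each sample's BAM or CRAM around the two breakpoints, and the per-sample counts compared across the 0/0, 0/1 and 1/1 calls. Counts like these are only a starting point, since they depend on coverage, insert size and mappability.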

jphruska commented 3 months ago

That makes sense, thanks. Good to know the signal appears to be strong, and perhaps indicative of a real inversion. Curiously, I ran a PCA of the SNPs located within it and didn't recover the expected signal -- individuals segregated by geography, not by zygosity. Will be interesting to dig into this further. Thanks again for your help.

warthmann commented 2 months ago

Hello, yes, I am having the same challenge: multiple reads supporting an event aren't easily distinguished. I usually then inspect and count them in IGV, but I was wondering whether you have considered stacking them in some way so that they are not displayed on top of each other. When they are plotted on top of each other, not only is their number impossible to tell, but split reads can also be hidden.
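To make the stacking idea concrete, here is a minimal Python sketch of greedy row assignment, where each read or pair is drawn on the lowest row on which it does not overlap anything already placed. This is only an illustration of one possible approach, not how samplot or IGV implements layout. The function name `assign_rows` and the example intervals are hypothetical.

```python
from typing import List, Tuple


def assign_rows(intervals: List[Tuple[int, int]]) -> List[int]:
    """Give each (start, end) interval the lowest row where it overlaps nothing."""
    # Process intervals left to right; sorted() is stable, so ties keep input order.
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    rows = [0] * len(intervals)
    row_ends: List[int] = []  # row_ends[k] is the end of the last interval on row k
    for i in order:
        start, end = intervals[i]
        for k, last_end in enumerate(row_ends):
            if last_end < start:  # row k is free at this position
                row_ends[k] = end
                rows[i] = k
                break
        else:
            # Every existing row is occupied here, so open a new one.
            row_ends.append(end)
            rows[i] = len(row_ends) - 1
    return rows


# Three pairs with nearly identical placement plus one elsewhere (made-up values).
example = [(100, 200), (100, 200), (105, 210), (300, 400)]
rows = assign_rows(example)
print(rows)
```

With a layout like this, the number of rows used at a breakpoint gives a direct visual count of the overlapping pairs, and a split read would get its own row instead of being drawn underneath a discordant pair.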