ryanlayer / samplot

Plot structural variant signals from many BAMs and CRAMs
MIT License

Question about plot interpretation; large putative inversion #198

Open jphruska opened 1 month ago

jphruska commented 1 month ago

Hello --

Thank you for developing and maintaining a powerful resource for the genomics community.

I've genotyped SVs in a bird species and noticed an aberrantly large inversion that I suspect is a false positive. To check, I produced a samplot image for individuals of each of the three genotypes (0/0, 0/1, 1/1). Each individual's genotype is indicated in the plot.

I have a few questions about interpretation. First, for inversions, is there a diagnostic for discriminating among the three genotypes (like the differences in read coverage used for deletions)? For example, would the number of discordant paired-end reads spanning the inversion suggest different genotypic states?

Second, it appears that the main signal of an inversion here is a single pair of discordant reads. Am I interpreting this correctly? If so, that single pair seems to map consistently to the same locations on the reference genome, regardless of the called genotype. There also appears to be a second pair of discordant reads for COL_52524, but it doesn't seem to support an inversion.

Any suggestions on how to best interpret these results would be greatly appreciated.

Thanks,
Jack

[Image: samplot plot, 4_14676614_57494906]

jbelyeu commented 2 weeks ago

The blue discordant pairs you're seeing, which indicate an inversion, are actually many pairs with about the same placement. Samplot has no great way to tell overlapping pairs apart, but you can get a sense that there are several because the blue is quite dark. There are also faint dotted lines indicating chimeric alignments (in addition to the discordant pairs). So it's not a single discordant pair, but it's hard to tell how many there are beyond "several." This relates to your genotype question: genotyping inversions isn't easy, and samplot doesn't really try to do it. If you extract the split alignments and pairs that span this breakpoint, you could come up with a count that might be useful for estimating genotype, but it's not as simple as the rules of thumb that work for copy-number variation.
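As a rough illustration of that counting idea, here is a minimal sketch, not samplot's method (samplot doesn't genotype inversions). It assumes reads are pysam `AlignedSegment` objects (or anything with the same attributes), fetched once each from windows around both breakpoints. The classification rules are assumptions of this sketch: same-strand pairs linking the two breakpoints and reads whose `SA` tag places a piece on the opposite strand near the other breakpoint count as inversion support, and normal forward-reverse pairs whose fragment spans a breakpoint count as reference support. The `window` and `max_insert` defaults are also assumptions. In an idealized, uniquely mappable region you might expect the alt fraction to sit near 0, 0.5, and 1 for 0/0, 0/1, and 1/1, but repeats near inversion breakpoints can easily break that, so treat it as a hint rather than a genotype call.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BreakpointEvidence:
    inversion_pairs: int = 0  # same-strand (++ or --) pairs linking the two breakpoints
    split_reads: int = 0      # reads split across the breakpoints onto opposite strands
    reference_pairs: int = 0  # normal forward-reverse pairs whose fragment spans a breakpoint

    @property
    def alt_fraction(self) -> float:
        """Crude fraction of breakpoint evidence supporting the inversion (NaN if none)."""
        alt = self.inversion_pairs + self.split_reads
        total = alt + self.reference_pairs
        return alt / total if total else float("nan")


def _near(pos: int, target: int, window: int) -> bool:
    return abs(pos - target) <= window


def count_breakpoint_evidence(reads: Iterable, left_bp: int, right_bp: int,
                              window: int = 500, max_insert: int = 1000) -> BreakpointEvidence:
    """Tally inversion vs. reference evidence from pysam-like reads (0-based coordinates).

    Assumption: `reads` holds each read once, taken from windows around both breakpoints.
    """
    ev = BreakpointEvidence()
    for r in reads:
        if r.is_unmapped or r.is_secondary or r.is_supplementary or r.is_duplicate:
            continue

        # Split reads: primary piece near one breakpoint, supplementary piece
        # (from the SA tag, "rname,pos,strand,CIGAR,mapQ,NM;") near the other,
        # on the opposite strand as expected for an inversion junction.
        if r.has_tag("SA"):
            for entry in r.get_tag("SA").rstrip(";").split(";"):
                rname, pos, strand = entry.split(",")[:3]
                sa_start = int(pos) - 1  # SA positions are 1-based
                left_to_right = (_near(r.reference_start, left_bp, window)
                                 and _near(sa_start, right_bp, window))
                right_to_left = (_near(r.reference_start, right_bp, window)
                                 and _near(sa_start, left_bp, window))
                opposite_strand = (strand == "-") != r.is_reverse
                if rname == r.reference_name and (left_to_right or right_to_left) and opposite_strand:
                    ev.split_reads += 1
                    break

        if not r.is_paired or r.mate_is_unmapped or r.next_reference_id != r.reference_id:
            continue

        if r.is_reverse == r.mate_is_reverse:
            # Inversion-type pair; count it once, from the leftmost mate.
            if (r.reference_start < r.next_reference_start
                    and _near(r.reference_start, left_bp, window)
                    and _near(r.next_reference_start, right_bp, window)):
                ev.inversion_pairs += 1
        elif not r.is_reverse and 0 < r.template_length <= max_insert:
            # Normal forward-reverse pair, counted from the forward (leftmost) mate.
            frag_end = r.reference_start + r.template_length
            if any(r.reference_start < bp < frag_end for bp in (left_bp, right_bp)):
                ev.reference_pairs += 1
    return ev
```

With pysam, the input could be built from two `AlignmentFile.fetch(contig, start, stop)` calls around the left and right breakpoints (for a call this large, the two windows won't overlap, so no read is passed twice). Note that a split read whose mate also forms a same-strand pair is counted in both categories, so the fraction is only a rough summary to compare across the 0/0, 0/1, and 1/1 individuals.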