llecompte / SVJedi

SV genotyping with long reads
GNU Affero General Public License v3.0

quantification of mutational allele frequency #7

Closed: robinycfang closed this issue 3 years ago

robinycfang commented 3 years ago

Hi,

I have Nanopore DNA-seq data from tumor samples and used Sniffles to call SVs. I realize that the mutational allele frequencies reported by Sniffles and other SV callers are probably not very accurate, so I am hoping to use SVJedi to get a precise quantification of the SVs. After running SVJedi, I got the following lines:

chr1 136986 3_1 N GCTGAGGTGGCAGGCAAGGAAGTAGGCTGGCCTCTCTCCAGCGTGGGGAGGGCCAGTGTGAGGCAGAGGCTCACACTGACCTCTCTCAGCATGGGAGGGCCGGTGTGAGACAAGGGCTCGGGCTGACCTCTCAGCGTAGGA . PASS IMPRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=chr1;END=137140;STD_quant_start=23.756339;STD_quant_stop=70.609039;Kurtosis_quant_start=-1.002582;Kurtosis_quant_stop=-2.311848;SVTYPE=INS;SUPTYPE=AL;SVLEN=141;STRANDS=+-;RE=22;REF_strand=0,0;AF=1 GT:DP:AD:PL 1/1:6.903:0,6.903:286,10,-9
chr1 662681 5 N GGCCTCCTTCACGTGGGAGGAGCAGGAGTGAGCAGGCTCCACTGGCCTCTCTCAGCGTGCGGGAGGGCAGTCGCGAGGCAAGAGCTCA . PASS IMPRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=chr1;END=662767;STD_quant_start=24.237462;STD_quant_stop=23.130067;Kurtosis_quant_start=-1.966639;Kurtosis_quant_stop=-1.861674;SVTYPE=INS;SUPTYPE=AL;SVLEN=90;STRANDS=+-;RE=22;REF_strand=26,24;AF=0.305556 GT:DP:AD:PL 0/1:27.948:22,5.948:-3767144,-3767315,-3766453

VAF1 = 1
VAF2 = 5.948/27.948 = 0.213

I am focusing on somatic SVs, so I am looking at SVs with an allele frequency < 0.4. Can I safely use the DP and AD values for my downstream analysis? What cutoff do you use to define a heterozygous genotype?

clemaitre commented 3 years ago

Hi @robinycfang,

Thank you for using SVJedi! Your application is indeed a typical use case of SVJedi.

You can safely use the DP and AD values to compute your allele frequencies. The AD field gives the number of reads supporting each allele. Note that for unbalanced variants (insertions or deletions), the value for the larger allele is normalized to account for the differences in sequence size and number of breakpoints between the two alleles. This is why the second AD value in your insertion records is not an integer. The DP field is the sum of AD1 and AD2.
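As a minimal sketch of reading these fields (an illustration, not part of SVJedi), the sample column can be split according to the FORMAT keys, and DP can be checked against the sum of the AD values. The helper names `parse_sample` and `read_support` are hypothetical, and the 1e-3 tolerance is an assumption meant to absorb decimal rounding in the VCF text.

```python
def parse_sample(format_field: str, sample_field: str) -> dict:
    """Map FORMAT keys (e.g. GT:DP:AD:PL) to the values of one sample column."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def read_support(format_field: str, sample_field: str) -> tuple[float, list[float]]:
    """Return (DP, AD) as floats; AD may be non-integer for unbalanced SVs."""
    fields = parse_sample(format_field, sample_field)
    dp = float(fields["DP"])
    ad = [float(x) for x in fields["AD"].split(",")]
    # DP is expected to equal AD[0] + AD[1]; the tolerance is an assumption.
    if abs(dp - sum(ad)) > 1e-3:
        raise ValueError(f"DP={dp} does not match sum(AD)={sum(ad)}")
    return dp, ad


dp, ad = read_support("GT:DP:AD:PL", "0/1:27.948:22,5.948:-3767144,-3767315,-3766453")
print(dp, ad)  # 27.948 [22.0, 5.948]
```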

Therefore, in your example:
allele frequency of allele 0 = 22/27.948 = 0.787
allele frequency of allele 1 = 5.948/27.948 = 0.213
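The arithmetic above amounts to dividing each AD value by their sum. Below is a small hypothetical helper, applied to the two records from the question. The 0.4 cutoff is the threshold mentioned in the question, not an SVJedi setting.

```python
def allele_frequencies(ad: list[float]) -> list[float]:
    """Fraction of (normalized) supporting reads per allele: AD[i] / DP."""
    dp = sum(ad)  # DP is the sum of the AD values
    if dp == 0:
        return [float("nan")] * len(ad)
    return [count / dp for count in ad]


ref_af, alt_af = allele_frequencies([22, 5.948])
print(round(ref_af, 3), round(alt_af, 3))  # 0.787 0.213

# AD values of the two records above, keyed by their VCF ID.
records = {"3_1": [0, 6.903], "5": [22, 5.948]}
# 0.4 is the cutoff from the question, not an SVJedi parameter.
low_af = [sv_id for sv_id, ad in records.items() if allele_frequencies(ad)[1] < 0.4]
print(low_af)  # ['5']
```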

Finally, we do not use absolute cutoffs to assign the genotypes, but rather choose the genotype that obtains the maximum likelihood according to a simple binomial model (assuming a diploid organism).
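To illustrate picking the most likely genotype instead of applying a frequency cutoff, here is a generic diploid binomial sketch. It is not SVJedi's source code: the error rate `ERR`, the expected alt fractions in `ALT_FRACTION`, and the function names are assumptions chosen for illustration.

```python
import math

# Assumed per-read error rate; SVJedi's actual value may differ.
ERR = 0.01
# Assumed expected fraction of alt-supporting reads for each diploid genotype.
ALT_FRACTION = {"0/0": ERR, "0/1": 0.5, "1/1": 1 - ERR}


def genotype_log_likelihoods(ref_count: float, alt_count: float) -> dict[str, float]:
    """Binomial log-likelihood of the read counts under each genotype.

    The binomial coefficient is identical for every genotype, so it is
    dropped; this also lets the formula accept non-integer (normalized) counts.
    """
    return {
        gt: ref_count * math.log(1 - p) + alt_count * math.log(p)
        for gt, p in ALT_FRACTION.items()
    }


def max_likelihood_genotype(ref_count: float, alt_count: float) -> str:
    """Return the genotype with the highest likelihood (no fixed AF cutoff)."""
    lls = genotype_log_likelihoods(ref_count, alt_count)
    return max(lls, key=lls.get)


print(max_likelihood_genotype(22, 5.948))  # 0/1
print(max_likelihood_genotype(0, 6.903))   # 1/1
```

With these assumed parameters, the 0/0 versus 0/1 boundary falls at an alt fraction of roughly 0.15, so a record at 0.213 is still called 0/1.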

I hope you will get useful and interesting results with SVJedi.

Regards, Claire