robinycfang closed this issue 3 years ago
Hi @robinycfang,
Thank you for using SVJedi! Your application is indeed a typical use case of SVJedi.
You can safely use the values in the DP and AD fields to compute your allele frequencies. The AD field gives the number of reads supporting each allele. Note that for unbalanced variants (insertions or deletions), the value for the longer allele is normalized to account for the differences in sequence size and breakpoint number between the alleles. This explains why the second value is not an integer. The DP field is the sum of the two AD values (AD1 and AD2).
Therefore, in your example:
allele frequency of allele 0 is 22/27.948 = 0.787
allele frequency of allele 1 is 5.948/27.948 = 0.213
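For reference, the arithmetic above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `allele_frequencies` is made up for illustration, and it assumes the sample column follows the `GT:DP:AD:PL` layout shown in this issue, with AD holding one value per allele.

```python
def allele_frequencies(format_field, sample_field):
    """Return one frequency per allele, computed from the AD values.

    Assumes colon-separated FORMAT/sample columns as in the lines
    quoted in this issue (hypothetical helper, not part of SVJedi).
    """
    record = dict(zip(format_field.split(":"), sample_field.split(":")))
    # AD values may be non-integer because of the length normalization.
    ad = [float(value) for value in record["AD"].split(",")]
    depth = sum(ad)  # equals DP, the sum of AD1 and AD2
    if depth == 0:
        return [0.0 for _ in ad]
    return [count / depth for count in ad]


ref_af, alt_af = allele_frequencies(
    "GT:DP:AD:PL",
    "0/1:27.948:22,5.948:-3767144,-3767315,-3766453",
)
print(round(ref_af, 3), round(alt_af, 3))  # 0.787 0.213
```

Records whose alternative allele frequency falls below a threshold of interest (for example the 0.4 mentioned in the question) can then be selected by comparing `alt_af` against that threshold.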
Finally, we do not use absolute cutoffs to assign the genotypes, but rather choose the genotype that obtains the maximum likelihood according to a simple binomial model (assuming a diploid organism).
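To illustrate the idea of picking the maximum-likelihood genotype, here is a rough sketch of a diploid binomial model. It is not SVJedi's actual code: the function name `best_genotype`, the error rate of 0.01, and the decision to drop the binomial coefficient (which is identical for all three genotypes) are assumptions made for this example only.

```python
import math

# Assumed probability that a read supports the wrong allele.
# Illustrative value only; not taken from SVJedi.
ERROR_RATE = 0.01


def best_genotype(ref_count, alt_count, error_rate=ERROR_RATE):
    """Return the diploid genotype with the highest binomial log-likelihood."""
    # Probability that a single read supports the reference allele,
    # for each candidate genotype.
    p_ref_by_genotype = {
        "0/0": 1.0 - error_rate,
        "0/1": 0.5,
        "1/1": error_rate,
    }
    log_likelihoods = {}
    for genotype, p_ref in p_ref_by_genotype.items():
        # The binomial coefficient is the same for every genotype,
        # so it is omitted; this also allows normalized, non-integer counts.
        log_likelihoods[genotype] = (
            ref_count * math.log(p_ref) + alt_count * math.log(1.0 - p_ref)
        )
    return max(log_likelihoods, key=log_likelihoods.get)


print(best_genotype(22, 5.948))  # 0/1
print(best_genotype(0, 6.903))   # 1/1
```

Under these assumptions, the two records from this issue receive the same genotypes as in the SVJedi output, but the result near the boundary between genotypes depends on the chosen error rate.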
I hope you will get useful and interesting results with SVJedi.
Regards, Claire
Hi,
I have Nanopore DNA sequencing data from tumor samples and used Sniffles to call SVs. I realize that the variant allele frequencies reported by Sniffles and other SV callers are probably not that accurate, so I am hoping to use SVJedi to get a more precise quantification of the SVs. After running SVJedi, I got the following lines:
chr1 136986 3_1 N GCTGAGGTGGCAGGCAAGGAAGTAGGCTGGCCTCTCTCCAGCGTGGGGAGGGCCAGTGTGAGGCAGAGGCTCACACTGACCTCTCTCAGCATGGGAGGGCCGGTGTGAGACAAGGGCTCGGGCTGACCTCTCAGCGTAGGA . PASS IMPRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=chr1;END=137140;STD_quant_start=23.756339;STD_quant_stop=70.609039;Kurtosis_quant_start=-1.002582;Kurtosis_quant_stop=-2.311848;SVTYPE=INS;SUPTYPE=AL;SVLEN=141;STRANDS=+-;RE=22;REF_strand=0,0;AF=1 GT:DP:AD:PL 1/1:6.903:0,6.903:286,10,-9
chr1 662681 5 N GGCCTCCTTCACGTGGGAGGAGCAGGAGTGAGCAGGCTCCACTGGCCTCTCTCAGCGTGCGGGAGGGCAGTCGCGAGGCAAGAGCTCA . PASS IMPRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=chr1;END=662767;STD_quant_start=24.237462;STD_quant_stop=23.130067;Kurtosis_quant_start=-1.966639;Kurtosis_quant_stop=-1.861674;SVTYPE=INS;SUPTYPE=AL;SVLEN=90;STRANDS=+-;RE=22;REF_strand=26,24;AF=0.305556 GT:DP:AD:PL 0/1:27.948:22,5.948:-3767144,-3767315,-3766453
VAF1 = 1
VAF2 = 5.948/27.948 = 0.213
I am focusing on somatic SVs, so I am looking at SVs with a frequency < 0.4. Can I safely use the numbers in the DP and AD fields for my downstream analysis? What cutoff do you use to call a genotype heterozygous?