lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

[minicaller] alt allele frequency calculation #249

Open tingchenlrx opened 4 months ago

tingchenlrx commented 4 months ago

I ran an older version of minicaller (version 6d7e78c) on a BAM file. The output VCF contains the following multi-allelic variant:

cusRef 24 . G A,C,T . . AC=1,1,1;AF=0.250,0.250,0.250;AN=4;DP=164410 GT:DP:DP4:DPG 0/3/2/1:164410:161365,259,2785,1:161624,1356,977,452

From the DPG field above, I calculated the alt allele frequency (AF) as follows:

Alt allele T AF = 1356/(161624+1356+977+452)
Alt allele C AF = 977/(161624+1356+977+452)
Alt allele A AF = 452/(161624+1356+977+452)
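For reference, here is a minimal Python sketch of that calculation. It assumes, as I did above, that the DPG counts are listed in the same order as the alleles in GT (so 0/3/2/1 pairs with G, T, C, A); I have not confirmed this against the minicaller source. The function name `dpg_allele_fractions` is hypothetical and is not part of jvarkit.

```python
def dpg_allele_fractions(ref, alts, gt, dpg):
    """Map each allele named in GT to its share of the summed DPG counts.

    Assumption: the i-th DPG count belongs to the i-th allele index in GT.
    """
    alleles = [ref] + alts.split(",")            # index 0 = REF, 1.. = ALT
    order = [int(i) for i in gt.replace("|", "/").split("/")]
    depths = [int(d) for d in dpg.split(",")]
    if len(order) != len(depths):
        raise ValueError("GT and DPG have different lengths")
    total = sum(depths)
    if total == 0:
        raise ValueError("no reads counted in DPG")
    return {alleles[i]: d / total for i, d in zip(order, depths)}


# Values copied from the old-version record at cusRef:24.
dpg_fractions = dpg_allele_fractions("G", "A,C,T", "0/3/2/1", "161624,1356,977,452")
for allele, fraction in dpg_fractions.items():
    print(f"{allele}: {fraction:.6f}")
```

The denominator is the sum of all four DPG counts (164409), so the four fractions add up to 1.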

I then ran the latest version of minicaller (version 69ca18e) on the same BAM file. (I was able to resolve the "too many open files" error by increasing the value of maxRecordsInRam. Thanks so much!) To obtain all the variants, I turned off two filters by setting:

I then looked at the same position, and here is the variant reported by the new version:

cusRef 24 . G T 38 . AC=1;AF=0.500;AN=2;DP=324605 GT:AD:DP:FT:GQ 0/1:161624,1356:162981:LowQual:38

My questions are:

(1) The new version of minicaller reports only one alt allele at this position (G->T), whereas the old version reports three (G->A,C,T). It looks like the new version keeps only the alt allele with the highest read count. Can you please explain why?

(2) For the variant from the new version, is it okay to calculate the alt allele frequency from the AD field this way? Alt allele T AF = 1356/(161624+1356)
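To make question (2) concrete, here is a small Python sketch of the AD-based calculation. It assumes AD follows the usual VCF convention of per-allele read depths listed as REF first, then each ALT in order; whether minicaller counts reads for AD exactly this way is the thing I am asking about. The function name `alt_fractions_from_ad` is hypothetical.

```python
def alt_fractions_from_ad(ad):
    """Return one fraction per ALT allele: its AD count over the summed AD counts.

    Assumption: AD is "ref_depth,alt1_depth,..." in REF-then-ALT order.
    """
    depths = [int(d) for d in ad.split(",")]
    total = sum(depths)
    if total == 0:
        raise ValueError("no reads counted in AD")
    return [d / total for d in depths[1:]]       # skip the REF entry


# AD copied from the new-version record at cusRef:24 (ALT = T).
ad_fractions = alt_fractions_from_ad("161624,1356")
print(f"T: {ad_fractions[0]:.6f}")
```

Note that the denominator here is 162980, while the four DPG counts in the old record sum to 164409, so the value for T comes out slightly higher (about 0.00832 versus 0.00825) because reads supporting C and A are no longer counted.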

Our environment

I apologize for the long post, and thank you so much for your attention!

Best, Ting