hsinnan75 / MapCaller

MapCaller – An efficient and versatile approach for short-read alignment and variant detection in high-throughput sequenced genomes
MIT License

What are the magic numbers 0.25 and 0.35 in the source code? #58

Open tseemann opened 4 years ago

tseemann commented 4 years ago
if ((ins_thr = (int)(cov_thr*0.25)) < MinAlleleDepth) ins_thr = MinAlleleDepth;
if ((del_thr = (int)(cov_thr*0.35)) < MinAlleleDepth) del_thr = MinAlleleDepth;

What are these constants? They seem arbitrary. (And shouldn't they be const double values rather than hard-coded literals?)

hsinnan75 commented 4 years ago

ins_thr and del_thr are the allele thresholds for detecting insertions and deletions, respectively, and cov_thr is the threshold for a candidate position to be considered a variant. I observed that when there is a deletion, the depth of that column is normally lower than the depths of its neighboring columns, so we use 0.35 to determine the threshold for deletion detection.
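Read directly off the two quoted lines, the thresholds can be restated as follows. This is only a restatement of the snippet, not a formula from MapCaller's documentation; the (int) cast truncates, which equals a floor for non-negative values, ignoring floating-point rounding of 0.25 and 0.35:

```latex
\mathrm{ins\_thr} = \max\left(\left\lfloor 0.25\,\mathrm{cov\_thr} \right\rfloor,\ \mathrm{MinAlleleDepth}\right),
\qquad
\mathrm{del\_thr} = \max\left(\left\lfloor 0.35\,\mathrm{cov\_thr} \right\rfloor,\ \mathrm{MinAlleleDepth}\right)
```

For an illustrative cov_thr = 30, the scaled terms are floor(7.5) = 7 and floor(10.5) = 10, so MinAlleleDepth only takes over when it is larger than these values.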

I used literals instead of named constants because each value appears only once. But I'll change them to constants. Thank you for the suggestion.

tseemann commented 4 years ago

I am worried there is no theoretical justification for those values. Have you optimized them from synthetic data with known variants?

hsinnan75 commented 4 years ago

To be honest, those values were set empirically based on the synthetic data. A better approach would be to use a machine-learning model to find the optimal values.