kevlar-dev / kevlar

Reference-free variant discovery in large eukaryotic genomes
https://kevlar.readthedocs.io
MIT License
40 stars 9 forks source link

Bug with log-space transformation #362

Closed standage closed 5 years ago

standage commented 5 years ago

The probability of observing a k-mer count y conditioned on a 0/0 genotype, we assume it is due to error with rate e. Thus, it's calculated as ey. We calculate this in log space, so it turns out to be e × y.

The "dynamic" error model for SNVs scales the probability based on the copy number of the corresponding reference-allele k-mer in the reference genome. This scaling is supposed to be applied before log transformation, e.g., Rey. The current code incorrectly applies it after the log-space transform, e.g., eRy or eyR in log space.

https://github.com/kevlar-dev/kevlar/blob/bb67f52dd39ea9acf2e60313c6335e2757a6cddf/kevlar/simlike.py#L114-L118

The correct log transform is ey + log(R).