iqbal-lab / Mykrobe-predictor

Antibiotic resistance predictions in minutes on a laptop

Failing to call high freq R allele, saying log lik R== log lik r #118

Closed by iqbal-lab 5 years ago

iqbal-lab commented 7 years ago

Claudio Koser pointed us to this:

"10517-03 (ERS458617 - NB: this was sequenced in two runs. You have to pool the data from both to get sufficient coverage) harbours three populations at rpoB: S450L (linked to T400A), H445X and S441X. Mykrobe only reports the last two mutations. Is S450L not reported because it is so close to T400A?"

Using commit 69eb272e0ff840af8bf71e75f695bc68f86c206a, it does spot the S450L allele:

       "rpoB_S450L-TCG761154TTG": {
            "info": {
                "filter": "LOW_GT_CONF", 
                "contamination_depths": [], 
                "coverage": {
                    "alternate": {
                        "percent_coverage": 100.0, 
                        "median_depth": 102.0, 
                        "min_non_zero_depth": 94.0
                    }, 
                    "reference": {
                        "percent_coverage": 100.0, 
                        "median_depth": 42.0, 
                        "min_non_zero_depth": 11.0
                    }
                }, 
                "expected_depths": [
                    124
                ], 
                "conf": 0
            }, 
            "_cls": "Call.VariantCall", 
            "genotype": [
                0, 
                1
            ], 
            "genotype_likelihoods": [
                -340.3206168516298, 
                -94.1655010617839, 
                -94.65994311830377
            ]
        }, 

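It sees a median depth of 102x on the alt allele and 42x on the ref allele, so this should be a clear R call. However, the two best genotype log likelihoods (-94.17 and -94.66) are nearly equal, so it is effectively saying the R and r models are indistinguishable, and it filters the call as LOW_GT_CONF with conf 0. This is a bug.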
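To make the "indistinguishable" point concrete, here is a minimal Python sketch that compares the three values in `genotype_likelihoods` from the call above. It only looks at the numbers themselves; which entry corresponds to which genotype model is not stated in the output, so the sketch makes no assumption about that ordering.

```python
# Values copied verbatim from the "genotype_likelihoods" field above.
genotype_likelihoods = [
    -340.3206168516298,
    -94.1655010617839,
    -94.65994311830377,
]

# Rank the log likelihoods from best (largest) to worst.
ranked = sorted(genotype_likelihoods, reverse=True)
best, second, worst = ranked

# Gap between the two best models, and between the best and the worst.
gap = best - second
gap_to_worst = best - worst

print(f"best vs second: {gap:.3f}")          # about 0.494
print(f"best vs worst:  {gap_to_worst:.3f}")  # about 246.155
```

The two best models differ by about half a log-likelihood unit, while the worst one is about 246 units away, which is why the call looks like a toss-up between two models rather than a confident genotype.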

Phelimb commented 7 years ago

I would argue that this is a clear r call. A 30% frequency is roughly where we set the expected minor frequency by default (20%).

iqbal-lab commented 7 years ago

The ref (susceptible) allele is at 30%, so the majority is R.
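The 30% figure can be checked with a short Python sketch. It assumes that the ratio of median depths is a fair proxy for allele frequency, which is a simplification for illustration and not necessarily how Mykrobe models it.

```python
# Median depths copied from the "coverage" block of the rpoB_S450L call.
alt_depth = 102.0  # "alternate" (resistant allele) median_depth
ref_depth = 42.0   # "reference" (susceptible allele) median_depth

# Assumption: depth ratio approximates allele frequency.
total = alt_depth + ref_depth
alt_fraction = alt_depth / total
ref_fraction = ref_depth / total

print(f"alt (resistant) fraction:   {alt_fraction:.3f}")  # 0.708
print(f"ref (susceptible) fraction: {ref_fraction:.3f}")  # 0.292
```

Under that assumption the resistant allele is at roughly 71% and the susceptible allele at roughly 29%, so the resistant allele is the majority population.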