iqbal-lab / Mykrobe-predictor

Antibiotic resistance predictions in minutes on a laptop

Extending #89

Closed Phelimb closed 8 years ago

Phelimb commented 8 years ago

This PR pulls two things to master:

1) Code generation: All code generation code has been updated to reflect master (there were previously some minor differences)

2) Formal contamination check: if we see multiple species, the set of models compared now changes. When resistotyping at genes and mutations, we now consider two coverages:

a) The expected coverage on target species (as before)

and

b) The maximum coverage on all non-target species (called contamination_covg)

Then, if contamination_covg > 0, we compare two models for S and take the maximum-likelihood (ML) one:

Is the resistant coverage due to contamination, with the S coverage from the target?

double llk_S_contaim = get_log_lik_R_S_coverage(var, contamination_covg, expected_covg, kmer);

Is the resistant coverage due to errors, with the S coverage from the target?

double llk_S_error = get_log_lik_R_S_coverage(var, expected_covg * err_rate / 3, expected_covg, kmer);

We then take the more likely of these two as our llk_S model.

And similarly for R:

Is the resistant coverage due to the target, with the S coverage from errors?

double llk_R_error = get_log_lik_R_S_coverage(var, expected_covg, expected_covg * err_rate / 3, kmer);

Is the resistant coverage due to the target, with the S coverage from contamination?

double llk_R_contaim = get_log_lik_R_S_coverage(var, expected_covg, contamination_covg, kmer);

The case with no contamination is unchanged.


Testing on Staph shows an improvement in specificity with a minor decrease in sensitivity. Will update with stats.

Testing on TB shows no change in specificity or sensitivity as contamination is far rarer in these datasets.