Closed iqbal-lab closed 9 years ago
cd ls /data0/users/zam/projects/piddock/cortex/results/binaries/cleaned/k31/F145.kmer31.q10cleaned_29.ctx > blist ls blist > clist ls ~/myKrobe-predictor/data/staph/antibiotics/*.fa > panel ~/tools/CORTEX/bin/cortex_var_31_c1 --colour_list clist --kmer_size 31 --mem_height 20 --mem_width 250 --align panel,no --align_input_format LIST_OF_FASTA --max_read_len 10000
ref_P404Q_fusA_1120520.ref_sub-2-fusA AATGACATTATCTTGGAATCAATGGAATTCCCAGAGCCAGTTATTCACTTATCAGTAGAGCCA ref_P404Q_fusA_1120520.ref_sub-2-fusA_colour_0_kmer_coverages 347 345 347 346 342 344 342 344 341 338 333 338 341 339 341 341 337 339 337 339 340 339 334 332 334 334 335 333 325 325 321 320 319
No sign of coverage on an alternate. Will step through in gdb.
If we looked at the best sus or the best res on it's own we would call S and R respectively...
Why are the median coverages so low even with 100 percent_nonzero?
(gdb) p abi->vars[i]->vob_best_sus->susceptible_allele {median_covg = 1, median_covg_on_nonzero_nodes = 0, min_covg = 1, percent_nonzero = 100} (gdb) p abi->vars[i]->vob_best_sus->resistant_alleles {{median_covg = 1, median_covg_on_nonzero_nodes = 0, min_covg = 0, percent_nonzero = 56}, {median_covg = 0, median_covg_on_nonzero_nodes = 0, min_covg = 0, percent_nonzero = 25}, {median_covg = 0, \ median_covg_on_nonzero_nodes = 0, min_covg = 0, percent_nonzero = 0} <repeats 58 times>}
(gdb) p abi->vars[i]->vob_best_res->susceptible_allele {median_covg = 0, median_covg_on_nonzero_nodes = 0, min_covg = 0, percent_nonzero = 25} (gdb) p abi->vars[i]->vob_best_res->resistant_alleles {{median_covg = 2, median_covg_on_nonzero_nodes = 0, min_covg = 2, percent_nonzero = 100}, {median_covg = 0, median_covg_on_nonzero_nodes = 0, min_covg = 0, percent_nonzero = 37}, {median_covg = 0,\ median_covg_on_nonzero_nodes = 0, min_covg = 0, percent_nonzero = 0} <repeats 58 times>}
IN genotyping_known.c:505 :
B Var* var_to_update=array_vars[km];
vob->var_id=km;
if (vob->var_id!= prev_mut)
{
//this is a new enum/mutation
reset_var_on_background(vob);
prev_mut=km; //for use in the next call to this function
}
vob->gene=g;
=>vob->num_resistant_alleles=r;
If you reset the vob you also reset the var_id so you end up with something like this:
(gdb) p *vob $48 = {susceptible_allele = {median_covg = 0, median_covg_on_nonzero_nodes = 0, min_covg = 0, percent_nonzero = 0}, resistant_alleles = {{median_covg = 0, median_covg_on_nonzero_nodes = 0, min_covg = 0, \ percent_nonzero = 0} <repeats 60 times>}, num_resistant_alleles = 0, some_resistant_allele_present = 0 '\000', working_current_max_res_allele_present = 0, var_id = dfrB_L21V, gene = fusA}
typedef enum { dfrB_L21V = 0, ...
Going to make two changes.
Turning back on error cleaning gives us a minor resistant call which we didn't make with error cleaning off...
** Called Variants
var R_cov S_cov
dfrB_H31N 100 100
(gdb) p *abi.vars[dfrB_H31N]->vob_best_sus $22 = {susceptible_allele = {median_covg = 580, median_covg_on_nonzero_nodes = 0, min_covg = 564, percent_nonzero = 100}, resistant_alleles = {{median_covg = 0, median_covg_on_nonzero_nodes = 0, min_covg\ = 0, percent_nonzero = 25}, {median_covg = 0, median_covg_on_nonzero_nodes = 0, min_covg = 0, percent_nonzero = 0} <repeats 59 times>}, num_resistant_alleles = 2, some_resistant_allele_present = 0 '\000\ ', working_current_max_res_allele_present = 25, var_id = dfrB_H31N, gene = dfrB}
(gdb) p *abi.vars[dfrB_H31N]->vob_best_res $23 = {susceptible_allele = {median_covg = 0, median_covg_on_nonzero_nodes = 0, min_covg = 0, percent_nonzero = 18}, resistant_alleles = {{median_covg = 26, median_covg_on_nonzero_nodes = 0, min_covg = 4\ , percent_nonzero = 100}, {median_covg = 5, median_covg_on_nonzero_nodes = 0, min_covg = 0, percent_nonzero = 68}, {median_covg = 0, median_covg_on_nonzero_nodes = 0, min_covg = 0, percent_nonzero = 0} <\ repeats 58 times>}, num_resistant_alleles = 2, some_resistant_allele_present = 1 '\001', working_current_max_res_allele_present = 100, var_id = dfrB_H31N, gene = dfrB}
A 100 percent coverage allele with median covg 23 min 4 looks real to me but I don't understand why we didn't call with error cleaning off. It's also not present in the --align output.
The above also doesn't occur with WGAssemblyThenGenotyping:
https://www.dropbox.com/s/tjye5cu5ty1hlab/Screenshot%202014-09-10%2018.19.47.png?dl=0
I don't think we should be turning error-cleaning back on though
Looks like you have fixed the issue, right? Have you checked it does not regress the results?
I haven't fixed this issue.
It seems like the problem is that we don't use median coverage when deciding on which VOB to keep in Var. Using only percent_non_zero seems like it allows these extremely low coverage alleles to be stored (so long as they have high % coverage) and then when deciding on resistotype we can end up comparing two very low (median) coverage alleles.
This also explains why turning back on error cleaning 'fixes' the problem.
Possible solution could be to just replace the :
if new_vob.per_cov > current_var.vob.per_cov
update_var()
with if new_vob.per_cov >= current_var.vob.per_cov:
if new_vob.median_cov > current_var.vob.median_cov:
updatevar()
?
Agreed. make it so!
Running Mykrobe on this sample, sequenced to >400x, gives precisely 100 on both R and S alleles. Suspicious?
pwd /Net/banyan/data0/users/zam/projects/piddock/data/Piddock_F145$
command: ~/dev/git/myKrobe-predictor/bin/Mykrobe.predictor.staph --file WTCHG_141363_25.bam --install_dir ~/dev/git/myKrobe-predictor --method InSilicoOligos myKrobe.predictor for Staphylococcus, version 0.0.0.1 Command: /home/zam/dev/git/myKrobe-predictor/bin/Mykrobe.predictor.staph --file WTCHG_141363_25.bam --install_dir /home/zam/dev/git/myKrobe-predictor --method InSilicoOligos Wed Sep 10 01:15:49 2014 Wed Sep 10 01:15:49 2014 * Start time Wed Sep 10 01:15:49 2014 * Sample: UnknownSample * Species S.aureus Wed Sep 10 01:21:36 2014 * Antimicrobial susceptibility predictions Gentamicin S Penicillin S Methicillin S Trimethoprim S Erythromycin S FusidicAcid r Ciprofloxacin S Rifampicin N Tetracycline S Vancomycin S Mupirocin S Clindamycin S * Called Variants var R_cov S_cov fusA_P404Q 100 100 <<<<<<<<<<< * Called Genes var cov \ Virulence markers PVL negative Wed Sep 10 01:21:36 2014