phac-nml / biohansel

Rapidly subtype microbial genomes using single-nucleotide variant (SNV) subtyping schemes
Apache License 2.0

Raise MAX_KMER_FREQUENCY Value to 10,000 #101

Closed: glabbe closed 4 years ago

glabbe commented 5 years ago

We found that about 8 samples from the original Coll et al. MTB lineage 1.2.1 (lineage 1.2 in our MTB scheme adaptation for biohansel) have a frequency (genome coverage) above 1000X for that lineage's positive kmer (target 3479545-1.2), even though the average kmer coverage for these datasets is only about 100X. Examples include the following MTB datasets: SRR6152952, SRR6153132, SRR6153184, SRR6153187, SRR6152643, SRR6152695, and SRR6152818.

This was also causing QC module errors, so I am also proposing to change the default value of the biohansel QC module parameter max_kmer_frequency from 1000 to 10,000.
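To illustrate the effect of the proposed change, here is a minimal Python sketch of a frequency-bound filter. It is not biohansel's actual code: the function name `filter_kmers_by_frequency`, the `min_freq` default, the two `other-target-*` names, and all frequency values are made up for illustration. Only the target name 3479545-1.2 and the 1000 and 10,000 limits come from this issue.

```python
from typing import Dict


def filter_kmers_by_frequency(
    kmer_freqs: Dict[str, int],
    min_freq: int = 8,       # hypothetical lower bound, for illustration only
    max_freq: int = 1000,    # upper bound discussed in this issue
) -> Dict[str, int]:
    """Keep only the targets whose observed kmer frequency is within bounds."""
    return {
        target: freq
        for target, freq in kmer_freqs.items()
        if min_freq <= freq <= max_freq
    }


# Hypothetical observed frequencies: one target far above the ~100X average.
observed = {
    "3479545-1.2": 1500,     # made-up value above 1000X
    "other-target-a": 95,    # made-up target and value
    "other-target-b": 110,   # made-up target and value
}

kept_with_1000 = filter_kmers_by_frequency(observed, max_freq=1000)
kept_with_10000 = filter_kmers_by_frequency(observed, max_freq=10_000)

# With the limit at 1000, the lineage-defining target is dropped;
# with the limit at 10,000, it is retained.
print("3479545-1.2" in kept_with_1000)
print("3479545-1.2" in kept_with_10000)
```

Under these assumptions, the positive kmer for lineage 1.2 is filtered out at a limit of 1000 and kept at a limit of 10,000, while the targets near average coverage are unaffected either way.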

glabbe commented 4 years ago

Issue fixed in PR #121