katholt / srst2

Short Read Sequence Typing for Bacterial Pathogens

Different MLST result when re-running SRST2 after adding in new alleles #73

Open swlong opened 7 years ago

swlong commented 7 years ago

Howdy all,

So running into an odd issue. I am running a few K. pneumoniae samples against an MLST database using 0.2.0, and on first run, it generated the following best match:

image

It says the rpoB_135 allele hit has 1 SNP, and these are the coverage stats:

image

So I saved my new consensus fastas, and then added them back to my MLST database to allow for calling against these "new" alleles. When rerunning the same FASTQ file, I then got this result:

258 3 3 1 1 1 1 79

With these stats:

42.48 0.171428571429

Any ideas? The alleles for 258 should've been present in the earlier database, so not sure why I would get a 135* call on first run, with overall less depth of coverage (~22x) vs 42x on the repeat, with a clean hit against all ST258 alleles.

Best, S. Wesley Long

rrwick commented 7 years ago

Hmmm, that's an interesting one. Our first hypothesis is that your reads are a mix of different genomes, as this has been the cause of weird SRST2 results in the past. So perhaps run some QC to see if you have a mixed sample?

It would be really informative if you could run SRST2 with --save_scores for your two databases (before and after the new allele was added). Then we could take a look at the .scores files. In particular, I'm curious how allele 1 scored in your first run and how allele 135 scored in your second run.
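As a rough illustration of that comparison, here is a minimal Python sketch that pulls the rows for chosen alleles out of two scores tables and lists them side by side. It assumes the `.scores` files are tab-delimited with a header row and the allele name in the first column. The function names, file paths and column names (`Score`, `Avg_depth`) are hypothetical, so check them against the header of your own files.

```python
import csv


def load_scores(lines):
    """Read a tab-delimited scores table into {allele_name: row_dict}.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumption: the first column of the header row holds the allele name.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    if not reader.fieldnames:
        return {}  # empty input, nothing to compare
    allele_col = reader.fieldnames[0]
    return {row[allele_col]: row for row in reader}


def compare_alleles(before, after, alleles, columns):
    """Collect the selected columns for each allele from both runs.

    Returns a list of (allele, run_label, values) tuples, where values is
    None if the allele has no row in that run's table.
    """
    report = []
    for allele in alleles:
        for label, table in (("before", before), ("after", after)):
            row = table.get(allele)
            values = None if row is None else {c: row.get(c) for c in columns}
            report.append((allele, label, values))
    return report


def compare_files(before_path, after_path, alleles, columns):
    """Convenience wrapper: load two scores files from disk and compare them."""
    with open(before_path, newline="") as fh:
        before = load_scores(fh)
    with open(after_path, newline="") as fh:
        after = load_scores(fh)
    return compare_alleles(before, after, alleles, columns)
```

For this issue, the call might look like `compare_files("run1.scores", "run2.scores", ["rpoB_1", "rpoB_135"], ["Score", "Avg_depth"])`, with the two paths replaced by whatever SRST2 wrote for each run. An allele that reports `None` for a run has no row in that run's table.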

swlong commented 7 years ago

Mixed sample is most likely. These are all samples from clinical specimens, so not unusual to have a "community" of organisms.

Fairly busy at the moment but I will try to get the scores run on this particular example and let you know what they say.

swlong commented 7 years ago

(Accidental close, apologies)