sanger-pathogens / ariba

Antimicrobial Resistance Identification By Assembly
http://sanger-pathogens.github.io/ariba/

Ecoli no 1 profile warning #163

Closed karinlag closed 7 years ago

karinlag commented 7 years ago

Hi! I get these warnings when I download the E. coli #1 MLST profile. Could you expand on what they really mean?

    (ariba)[karinlag@abel mlst]$ python3 /work/projects/nn9305k/bin/virtenv/ariba/bin/ariba pubmlstget "Escherichia coli#1" get_mlst1
    WARNING: Same profile found twice in input file, but two different STs. Going to use the ST with the smaller number (7066) ... STs are 7066 7067 and alleles are adk:10, fumC:957, gyrB:4, icd:8, mdh:601, purA:8, recA:2
    WARNING: Median sequence length is 469 but fumC.798 has length 382 which is too long or short. Removing

andrewjpage commented 7 years ago

Hi,

Some of the MLST databases contain poor-quality data, and ARIBA warns you when it finds it. The first warning means that two identical allele profiles have been assigned different STs, which should never happen; ARIBA keeps the ST with the smaller number (7066 here). The second warning means that one allele's length differs markedly from the rest: fumC.798 has length 382 against a median of 469, so ARIBA removes it. Allele sequence lengths are normally very similar, so one that is far outside the usual range usually indicates poor-quality data (it is very rarely real). In databases where manual curation is lax, this occurs quite a bit (e.g. truncated sequences, or sequences that don't translate to proteins).

Andrew
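For anyone who wants to see the idea behind the two warnings, here is a minimal Python sketch of both checks. It is not ARIBA's actual code: the function names, the data layout, and the 10% length tolerance (`max_fraction`) are all assumptions made for illustration, and the allele names `fumC.A` and `fumC.B` with length 469 are made up. Only the two STs, the allele profile, and the fumC.798 length come from the warnings above.

```python
from statistics import median


def dedupe_profiles(profiles):
    """Map each distinct allele profile to the smallest ST that uses it.

    profiles: iterable of (st, alleles) pairs, where alleles is a dict
    such as {"adk": 10, "fumC": 957}. (Hypothetical layout.)
    """
    smallest_st = {}
    for st, alleles in profiles:
        key = tuple(sorted(alleles.items()))  # order-independent profile key
        if key not in smallest_st or st < smallest_st[key]:
            smallest_st[key] = st
    return smallest_st


def length_outliers(lengths, max_fraction=0.1):
    """Return allele names whose length is far from the median length.

    max_fraction is an assumed tolerance, not ARIBA's real threshold.
    """
    med = median(lengths.values())
    return sorted(
        name for name, n in lengths.items()
        if abs(n - med) > max_fraction * med
    )


# Profile taken from the first warning: the same alleles listed under two STs.
alleles = {"adk": 10, "fumC": 957, "gyrB": 4, "icd": 8,
           "mdh": 601, "purA": 8, "recA": 2}
profiles = [(7067, dict(alleles)), (7066, dict(alleles))]
kept = dedupe_profiles(profiles)

# fumC.798 is from the second warning; the other two entries are invented.
lengths = {"fumC.A": 469, "fumC.B": 469, "fumC.798": 382}
removed = length_outliers(lengths)

print(list(kept.values()))
print(removed)
```

With this input, the duplicate profile collapses to a single entry carrying the smaller ST, and fumC.798 is the only allele flagged for removal.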