katholt / srst2

Short Read Sequence Typing for Bacterial Pathogens

Need a maximum divergence option to simplify gene reporting. #8

Closed katholt closed 10 years ago

katholt commented 10 years ago

I noticed that we were sometimes picking up quite distant homologs of genes that are in the database, e.g. calls with 250 SNPs against a gene of total length 1000 bp, i.e. 25% divergent. Unless you are actually looking for novel genes rather than known ones, we don't want this. For resistance gene typing we want only the confident hits to known resistance genes, not hits to distant homologs. Such distant hits might be useful for identifying new virulence genes, or for generating hypotheses about unexplained resistance phenotypes, but they just add confusion to resistance gene typing.

katholt commented 10 years ago

I have added a flag, --max_divergence, to specify the maximum level of divergence you want reported. The default is 10 (i.e. report only hits with <10% divergence). You can increase this if you are interested in looking for, e.g., novel virulence gene homologs.
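
To make the threshold concrete, here is a minimal sketch of how a divergence cutoff like this could work. It is not the SRST2 implementation: the `GeneHit` class and the `percent_divergence` and `filter_hits` functions are hypothetical names, and divergence is assumed to be simply SNPs divided by gene length, as in the 250/1000 example above. The real tool may count differences differently (e.g. including indels).

```python
from dataclasses import dataclass


@dataclass
class GeneHit:
    """Hypothetical record for one gene call (not an SRST2 data structure)."""
    gene: str
    snps: int      # number of SNPs relative to the reference allele
    length: int    # total length of the reference gene in bp


def percent_divergence(hit: GeneHit) -> float:
    """Assumed definition: SNPs as a percentage of gene length."""
    return 100.0 * hit.snps / hit.length


def filter_hits(hits, max_divergence=10.0):
    """Keep only hits with divergence strictly below the cutoff (<10% by default)."""
    return [h for h in hits if percent_divergence(h) < max_divergence]


hits = [
    GeneHit("distant_homolog", snps=250, length=1000),  # 25% divergent
    GeneHit("known_gene", snps=12, length=1000),        # 1.2% divergent
]

for h in filter_hits(hits):
    print(h.gene, percent_divergence(h))
```

With the default cutoff only `known_gene` is reported. Raising `max_divergence` to 30 would also keep the 25% divergent homolog, which is the use case described for exploring novel virulence genes.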