katholt / srst2

Short Read Sequence Typing for Bacterial Pathogens

Change the database to CARD #68

Closed KatherineJ-H closed 6 years ago

KatherineJ-H commented 7 years ago

I have been trying to run the resistance script using the CARD database, but it always produces an empty results file. I obtained CARD.fa from https://card.mcmaster.ca/download/0/broadstreet-v1.1.0.tar.gz. Once it is extracted, should I run it through the steps outlined for clustering etc.? I also noticed that the preliminary resistance data is stated to be based on the ResFinder database and CARD, so if I have run SRST2 using resfinder.fasta, would that have included the CARD resistance data set as well? Thank you for your time.

rrwick commented 7 years ago

Katherine,

The resistance database which is based on ResFinder and CARD is the one included here with SRST2: data/ARGannot.r1.fasta

That file is kept up-to-date (we add new resistance genes periodically) and it's already correctly formatted. So just using that would be the easiest option - no clustering required, just give it to SRST2 with the --gene_db option.
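For illustration, a run with the bundled database might be launched roughly as follows. This is a minimal sketch, not an official recipe: the read file names and output prefix are hypothetical placeholders, the helper `build_srst2_command` is invented for this example, and it assumes `srst2` is on your PATH and that you are in the repository root so that `data/ARGannot.r1.fasta` resolves.

```python
import subprocess


def build_srst2_command(reads_1, reads_2, output_prefix,
                        gene_db="data/ARGannot.r1.fasta"):
    """Assemble an srst2 command line for paired-end reads and a gene database.

    Hypothetical helper for illustration; only the srst2 flags themselves
    (--input_pe, --output, --log, --gene_db) come from SRST2.
    """
    return [
        "srst2",
        "--input_pe", reads_1, reads_2,   # paired-end read files
        "--output", output_prefix,        # prefix for the result files
        "--log",                          # write a log file alongside results
        "--gene_db", gene_db,             # pre-formatted resistance database
    ]


if __name__ == "__main__":
    # Placeholder file names; substitute your own reads.
    cmd = build_srst2_command("isolate_1.fastq.gz", "isolate_2.fastq.gz",
                              "isolate_resistance")
    print(" ".join(cmd))
    # Uncomment to actually run (requires srst2 and its dependencies):
    # subprocess.run(cmd, check=True)
```

The command is built as a list rather than a single string so that file names containing spaces are passed through unchanged.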

If you are instead specifically interested in the CARD.fa resistance genes, then yes, you'll have to cluster them first and format them for SRST2 using the instructions here.
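One possible reason a raw download gives no results is that its FASTA headers are not in the layout SRST2 expects. The sketch below assumes the header convention described in the SRST2 README, `>[clusterUniqueIdentifier]__[clusterSymbol]__[alleleSymbol]__[alleleUniqueIdentifier]`, and simply lists headers that do not match it. The function name and the non-conforming header in the usage example are made up for illustration; this is a quick sanity check, not a replacement for the clustering and formatting steps.

```python
def srst2_header_problems(fasta_lines):
    """Return (line_number, header) pairs for FASTA headers that do not
    consist of four non-empty fields separated by double underscores.

    Assumption: only the first whitespace-delimited token of the header
    is treated as the sequence identifier.
    """
    problems = []
    for line_number, line in enumerate(fasta_lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, nothing to check
        tokens = line[1:].split()
        seq_id = tokens[0] if tokens else ""
        fields = seq_id.split("__")
        if len(fields) != 4 or any(field == "" for field in fields):
            problems.append((line_number, line.rstrip("\n")))
    return problems


if __name__ == "__main__":
    example = [
        ">344__blaOXA__blaOXA-181__1 some annotation",  # SRST2-style header
        "ATGCGTGTATTAGCC",
        ">unformatted_header_from_another_source",       # made-up header
        "ATGAAAAACACAATA",
    ]
    for line_number, header in srst2_header_problems(example):
        print(f"line {line_number}: {header}")
```

To check a real file, pass an open file handle, e.g. `srst2_header_problems(open("CARD.fa"))`.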

Let me know if that sorts it out for you or if you have any other questions!

Ryan

katholt commented 7 years ago

Also, note that CARD includes many genes that are core chromosomal genes rather than acquired resistance genes, so you will get hits for the chromosomal genes in every isolate you type, even if the allele present is not resistance-related. This is fine as long as you understand the underlying database you are working with and interpret the results accordingly. Most people use SRST2 because they want to identify acquired resistance genes/alleles, and for that we recommend our pre-formatted database, ARGannot.r1.fasta, which is based on the ARG Annot database with some additions from ResFinder and CARD (but only the acquired genes).