katholt / srst2

Short Read Sequence Typing for Bacterial Pathogens

Adaptation of SRST2: #90

Closed by EisenRa 7 years ago

EisenRa commented 7 years ago

Dear authors,

I have shotgun metagenomic data that has been enriched for a number of genes, and would like to analyse them phylogenetically. Rather than reinventing the wheel, I was thinking that I could perhaps use SRST2 (against a custom gene DB containing my genes of interest) to do all the mapping/QC/consensus sequence building. Do you have any thoughts on this approach?

Thank you!

katholt commented 7 years ago

Apologies for the delayed reply!

SRST2 was built for accurate allele assignment in cultured isolates, i.e. we assume the read set comes from a pure culture that contains a single allele for each gene, and we attempt to identify the one specific allele present for each query gene that we detect. For metagenomic data, it is likely that you will have multiple alleles for your gene(s) of interest, and SRST2 is not designed to detect and report multiple alleles. So you could use the gene screening outputs from SRST2 to detect the presence and depth of genes, but you should ignore the allele calls for those genes (unless the minor allele frequency reported is very low, indicating that only a single allele is present).
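As a rough illustration of that filtering rule (not part of SRST2 itself), here is a minimal Python sketch. It assumes a tab-separated gene results table with columns named `gene`, `allele`, `depth` and `maxMAF`; check the header of your own output file, since these names are an assumption here. The 0.05 cutoff for "very low" is arbitrary, and the example rows are made up.

```python
import csv
import io

# Hypothetical cutoff: the thread only says "very low", so tune this yourself.
MAX_MAF = 0.05


def screen_genes(table_text, max_maf=MAX_MAF):
    """Keep presence/depth for every gene; keep the allele call only if maxMAF is low.

    Column names (gene, allele, depth, maxMAF) are assumed, not guaranteed.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    results = []
    for row in reader:
        maf = float(row["maxMAF"])
        results.append({
            "gene": row["gene"],
            "depth": float(row["depth"]),
            "maxMAF": maf,
            # A high minor allele frequency suggests a mixture of alleles,
            # so the single-allele call is discarded (set to None).
            "allele": row["allele"] if maf <= max_maf else None,
        })
    return results


# Made-up example rows, for illustration only.
example = (
    "Sample\tgene\tallele\tdepth\tmaxMAF\n"
    "S1\tgeneA\tgeneA_1\t35.2\t0.01\n"
    "S1\tgeneB\tgeneB_3\t12.0\t0.40\n"
)

results = screen_genes(example)
for record in results:
    print(record)
```

Here the call for `geneA` is kept, while the call for `geneB` is dropped because its minor allele frequency points to more than one allele being present. Both genes are still reported as detected, with their depth.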

You say that you want to analyse the results 'phylogenetically'. I'm not quite sure what you mean, but I assume that for your purposes you would need to differentiate SNPs in the sample, in which case other tools that are built for strain typing from metagenomes would probably be more suitable. See e.g. https://github.com/snayfach/MIDAS