sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

How to get sequences of specific bacteria from a FASTQ metagenome using sourmash? #2566

Closed marsfro closed 11 months ago

marsfro commented 1 year ago

Hello everyone! I got a taxonomy from sourmash gather, as a table with md5 hashes. How can I use it to extract the reads (not contigs) of one specific species (or strain) from a metagenome FASTQ file with sourmash? Or are there other ways to do this?

Maria

ctb commented 1 year ago

hi @marsfro, please see the conversation here! https://github.com/sourmash-bio/sourmash/issues/2535

The real answer is "map the reads to the genomes identified by sourmash", which you can do yourself or have genome-grist do for you.

I hope that helps!

marsfro commented 1 year ago

Thank you!

ctb commented 12 months ago

Added to #2184 in e828ff99:

How do I get the sequences for a particular reference genome from a metagenome, using sourmash?

If sourmash reports that a particular strain or genome is present in a metagenome, how do you retrieve the reads using sourmash?

The short answer is: you have to use a different tool. You can do read mapping between the metagenome and the relevant reference genome (which can be automated with the genome-grist workflow), or, if you are interested in retrieving accessory elements, you can try out spacegraphcats.
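
For anyone who wants to do the mapping step by hand, here is a rough sketch of what it could look like. It is not part of sourmash or genome-grist. It assumes the gather CSV has `name` and `md5` columns, that you have already downloaded the matching reference genome as a FASTA file, that the reads are single-end, and that minimap2 and samtools are installed. The helper names, file names, and example rows are made up for illustration.

```python
import csv
import io
import shlex


def matches_for_species(gather_csv_text, species):
    """Return (name, md5) pairs for gather rows whose name mentions `species`."""
    reader = csv.DictReader(io.StringIO(gather_csv_text))
    wanted = species.lower()
    return [(row["name"], row["md5"]) for row in reader
            if wanted in row["name"].lower()]


def mapping_pipeline(reference_fasta, reads_fastq, mapped_fastq):
    """Build a shell pipeline that keeps only reads mapping to the reference."""
    # minimap2: -a emits SAM, -x sr selects the short-read preset.
    align = ["minimap2", "-a", "-x", "sr", reference_fasta, reads_fastq]
    # samtools fastq: -F 0x904 drops unmapped, secondary and supplementary records.
    extract = ["samtools", "fastq", "-F", "0x904", "-"]
    return f"{shlex.join(align)} | {shlex.join(extract)} > {shlex.quote(mapped_fastq)}"


# Made-up gather output, trimmed to a few columns for the example.
example = (
    "intersect_bp,f_match,name,md5\n"
    "4800000,0.95,EXAMPLE_1 Escherichia coli strain A,0123abcd\n"
    "2100000,0.60,EXAMPLE_2 Bacteroides fragilis strain B,4567ef01\n"
)

if __name__ == "__main__":
    for name, md5 in matches_for_species(example, "Escherichia coli"):
        print(name, md5)
    # Hypothetical file names; the reference FASTA must be fetched separately.
    print(mapping_pipeline("ecoli_ref.fa", "metagenome.fq", "ecoli_reads.fq"))
```

The gather table only tells you which genomes matched; it does not contain sequences, so the reference FASTA has to come from wherever the database genomes were obtained. The printed pipeline can be pasted into a shell or run with `subprocess.run(..., shell=True)`. For paired-end data the samtools step would need separate output files for each mate.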