leylabmpi / Struo2

Scalable creating/updating of metagenome profiling databases
MIT License

No Bracken kmer distributions for strain level taxids #17

Closed watsonar closed 2 years ago

watsonar commented 2 years ago

Hello,

I made custom Kraken2 and Bracken databases following your tutorial, adding several genomes to the pre-built GTDB Struo2 databases (thank you for providing these, by the way!). However, when I run Bracken on the Kraken2 output for my metagenomes (which has many reads assigned at several taxonomic levels), I get the error "Error: no reads found. Please check your Kraken report". One curious thing I noticed in the Kraken2 output is that no reads are assigned at the species (S) level, only at the strain (S1) level. I also noticed that none of my strain taxids appear in the database100mers.kmer_distrib files generated by Struo2. My guess is that Bracken is ignoring the reads assigned to strains because those taxids are not in the Bracken database.

Is there a way to get kmer distributions at the strain (S1) level for a Bracken database generated with Struo2? Or do you think my problem might be caused by something else?
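One way to test the hypothesis above is to list the taxids that received reads in the Kraken2 report and check which of them are absent from the Bracken kmer distribution file. The following is only a rough sketch, not part of Struo2 or Bracken. It assumes the standard tab-separated Kraken2 report layout (direct read count in the third column, and rank code, taxid, and name as the last three columns) and assumes that each data line of the .kmer_distrib file starts with a mapped taxid followed by space-separated taxid:count:total entries. The function names are made up for illustration.

```python
def report_taxids(report_lines, rank_prefix="S"):
    """Collect taxids with directly assigned reads at ranks starting with rank_prefix.

    Returns {taxid: (rank_code, direct_reads)}.
    Assumes a tab-separated Kraken2 report (an assumption, see lead-in).
    """
    found = {}
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        direct_reads = int(fields[2])
        # Index from the end so reports with extra minimizer columns also work
        rank, taxid = fields[-3].strip(), fields[-2].strip()
        if rank.startswith(rank_prefix) and direct_reads > 0:
            found[taxid] = (rank, direct_reads)
    return found


def kmer_distrib_taxids(distrib_lines):
    """Collect every taxid mentioned in a .kmer_distrib file (assumed layout)."""
    taxids = set()
    for line in distrib_lines:
        fields = line.rstrip("\n").split("\t")
        mapped = fields[0].strip()
        if not mapped.isdigit():
            continue  # skip the header line and blank lines
        taxids.add(mapped)
        if len(fields) > 1:
            # Each entry is assumed to look like taxid:kmers_mapped:total_kmers
            for entry in fields[1].split():
                taxids.add(entry.split(":")[0])
    return taxids


def missing_from_bracken(report_lines, distrib_lines, rank_prefix="S"):
    """Return report taxids (with reads) that the kmer distribution never mentions."""
    known = kmer_distrib_taxids(distrib_lines)
    hits = report_taxids(report_lines, rank_prefix)
    return {t: info for t, info in hits.items() if t not in known}
```

To use it, open both files and pass the file objects, for example missing_from_bracken(open("sample.kreport"), open("database100mers.kmer_distrib")), where sample.kreport is a placeholder name. If every S1 taxid with reads shows up in the result, that would be consistent with the explanation that Bracken has no distribution data for those strains.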

Thank you so much for any insight!

nick-youngblut commented 2 years ago

Maybe you didn't include the correct taxID for the added genomes? You need to provide the species- or strain-level taxID from whichever taxonomy you are using (GTDB or NCBI). Could that be the source of the problem?

nick-youngblut commented 2 years ago

I'm closing this due to inactivity. Please reopen if needed.