jenniferlu717 / Bracken

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
http://ccb.jhu.edu/software/bracken/index.shtml
GNU General Public License v3.0

Extract certain reads belonging to certain taxa. #246

luzhang321 closed this issue 7 months ago

luzhang321 commented 7 months ago

Hi :)

I ran Kraken2 (v2.1.2) and Bracken for my mouse gut microbiome shotgun metagenomic analysis. Since the Bracken output contains some non-bacterial reads, I would like to keep only the bacterial reads for downstream analysis.

I found a tool called extract_kraken_reads.py that works on Kraken2 output, but I did not find a similar tool in Bracken. Is it okay to extract the bacterial reads based on the Kraken2 result, or is there a tool in Bracken that does similar work?

This is the command I used to extract the bacterial reads based on the Kraken2 output:

KrakenTools-1.2/extract_kraken_reads.py -k $filename".output.txt" -s1 $filename".new_kneaddata_paired_1.fastq" -s2 $filename".new_kneaddata_paired_2.fastq" -o bacteria_reads/$filename"_1.bac.fasta" -o2 bacteria_reads/$filename"_2.bac.fasta" -t 2 --include-children -r /sbidata/projects/lzhang/2022_mice/Output/Kneaddata_custom_database_output/$filename".report.txt"

Thanks so much!

jenniferlu717 commented 7 months ago

I am the author of KrakenTools (including extract_kraken_reads.py) and Bracken, but you cannot extract reads from Bracken output. Bracken re-estimates read counts per taxon but does not reassign individual reads, so there is no per-read information to extract from. The best option is to use extract_kraken_reads.py on the Kraken2 output, as you have already done.
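
For anyone wondering why per-read extraction has to go through the Kraken2 files: the Kraken2 output lists one taxid per read, and the report gives the taxonomy tree, so together they are enough to select reads under a taxon such as Bacteria (taxid 2). The rough sketch below illustrates that idea only; it is not how extract_kraken_reads.py is implemented. The function names are hypothetical, and it assumes the standard 6-column Kraken2 report (names indented two spaces per level) and Kraken2 output written without --use-names.

```python
def child_taxids(report_lines, root_taxid):
    """Collect root_taxid and every taxid nested below it in a Kraken2 report.

    Assumes the standard 6-column, tab-separated report in which the name
    column is indented by two spaces per taxonomy level.
    """
    keep = set()
    root_depth = None
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip malformed or empty lines
        taxid = fields[4].strip()
        name = fields[5]
        depth = (len(name) - len(name.lstrip(" "))) // 2
        if root_depth is None:
            # Still searching for the requested taxon.
            if taxid == str(root_taxid):
                root_depth = depth
                keep.add(taxid)
        elif depth > root_depth:
            # Deeper indentation means a descendant of the requested taxon.
            keep.add(taxid)
        else:
            # Back at the same or a shallower level: the subtree has ended.
            break
    return keep


def read_ids_for_taxa(kraken_lines, taxids):
    """Return IDs of classified reads whose assigned taxid is in taxids.

    Assumes Kraken2 output columns: C/U flag, read ID, taxid, length, LCA map.
    """
    ids = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[0] == "C" and fields[2].strip() in taxids:
            ids.add(fields[1])
    return ids
```

The resulting read IDs could then be used to filter the FASTQ files. Bracken's output has no equivalent of the per-read column, which is why the same selection cannot be done from it.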