Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
# Kraken 2 analysis
kraken_cmd = (f"kraken2 --db {kraken_db} --threads 20 "
f"--confidence 0.01 " # Adjust the number of confidence betweem 0 and 1 as needed
f"--memory-mapping "
f"--output {output_dir}/{base_name}.kraken2.out "
f"--report {output_dir}/{base_name}.kraken2.report "
f"{os.path.join(fastq_dir, fastq_file)}")
os.system(kraken_cmd)
print(f"Kraken 2 analysis completed for {base_name}")
# Bracken analysis for different levels
for level in ['G', 'F', 'S']: # Genus, Family, Species
bracken_cmd = (f"bracken -d {bracken_db} "
f"-i {output_dir}/{base_name}.kraken2.report "
f"-o {output_dir}/{base_name}.bracken_{level}.tsv "
f"-r 300 -l {level}")
os.system(bracken_cmd)
print(f"Bracken {level} level analysis completed for {base_name}")
Main script
if name == "main":
os.makedirs(output_dir, exist_ok=True)
fastq_files = [f for f in os.listdir(fastq_dir) if f.endswith(".fastq")]
# Using ThreadPoolExecutor with a specified number of workers
with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
executor.map(process_fastq, fastq_files)
print("All FASTQ files processed.")`
Here's the output:
Kraken 2 analysis completed for 10962
Bracken G level analysis completed for 10962
Bracken F level analysis completed for 10962
Bracken S level analysis completed for 10962
sh: 1: bracken: Permission denied
sh: 1: bracken: Permission denied
sh: 1: bracken: Permission denied
Loading database information... done.
30234 sequences (45.20 Mbp) processed in 10793.484s (0.2 Kseq/m, 0.25 Mbp/m).
30234 sequences classified (100.00%)
0 sequences unclassified (0.00%)
`import os import concurrent.futures
Specify the directory containing your FASTQ files
fastq_dir="/mnt/Huanglab/Data/2024-000019/5_trimmed"
Specify the Kraken 2 database you want to use
kraken_db="/mnt/localdatabase/k2_nt"
kraken_db="/media/huanglab/Transcend/k2_nt"
Specify the output directory for Kraken 2 results
output_dir="/mnt/Huanglab/Data/2024-000019/kraken"
Specify the Bracken database you want to use
bracken_db ="/mnt/localdatabase/k2_nt"
bracken_db="/media/huanglab/Transcend/k2_nt"
Create the output directory if it doesn't exist
os.makedirs(output_dir, exist_ok=True)
Set the number of workers (parallel processes)
num_workers = 3 # Adjust this number based on your system's capabilities
Function to process each FASTQ file
def process_fastq(fastq_file): base_name = os.path.splitext(fastq_file)[0]
Main script
if name == "main": os.makedirs(output_dir, exist_ok=True) fastq_files = [f for f in os.listdir(fastq_dir) if f.endswith(".fastq")]
Here's the output: Kraken 2 analysis completed for 10962 Bracken G level analysis completed for 10962 Bracken F level analysis completed for 10962 Bracken S level analysis completed for 10962 sh: 1: bracken: Permission denied sh: 1: bracken: Permission denied sh: 1: bracken: Permission denied Loading database information... done. 30234 sequences (45.20 Mbp) processed in 10793.484s (0.2 Kseq/m, 0.25 Mbp/m). 30234 sequences classified (100.00%) 0 sequences unclassified (0.00%)
I am wondering what happened to my bracken?