jenniferlu717 / Bracken

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
http://ccb.jhu.edu/software/bracken/index.shtml
GNU General Public License v3.0
286 stars 50 forks source link

Bracken failed to execute after a period of installation and also few execution #281

Open cypheroJ opened 1 week ago

cypheroJ commented 1 week ago

`import os import concurrent.futures

Specify the directory containing your FASTQ files

fastq_dir="/mnt/Huanglab/Data/2024-000019/5_trimmed"

Specify the Kraken 2 database you want to use

kraken_db="/mnt/localdatabase/k2_nt"

kraken_db="/media/huanglab/Transcend/k2_nt"

Specify the output directory for Kraken 2 results

output_dir="/mnt/Huanglab/Data/2024-000019/kraken"

Specify the Bracken database you want to use

bracken_db ="/mnt/localdatabase/k2_nt"

bracken_db="/media/huanglab/Transcend/k2_nt"

Create the output directory if it doesn't exist

os.makedirs(output_dir, exist_ok=True)

Set the number of workers (parallel processes)

num_workers = 3 # Adjust this number based on your system's capabilities

Function to process each FASTQ file

def process_fastq(fastq_file): base_name = os.path.splitext(fastq_file)[0]

# Kraken 2 analysis
kraken_cmd = (f"kraken2 --db {kraken_db} --threads 20 " 
              f"--confidence 0.01 "        # Adjust the number of confidence betweem 0 and 1 as needed
              f"--memory-mapping "
              f"--output {output_dir}/{base_name}.kraken2.out "
              f"--report {output_dir}/{base_name}.kraken2.report "
              f"{os.path.join(fastq_dir, fastq_file)}")
os.system(kraken_cmd)
print(f"Kraken 2 analysis completed for {base_name}")

# Bracken analysis for different levels
for level in ['G', 'F', 'S']:  # Genus, Family, Species
    bracken_cmd = (f"bracken -d {bracken_db} "
                   f"-i {output_dir}/{base_name}.kraken2.report "
                   f"-o {output_dir}/{base_name}.bracken_{level}.tsv "
                   f"-r 300 -l {level}")
    os.system(bracken_cmd)
    print(f"Bracken {level} level analysis completed for {base_name}")

Main script

if name == "main": os.makedirs(output_dir, exist_ok=True) fastq_files = [f for f in os.listdir(fastq_dir) if f.endswith(".fastq")]

# Using ThreadPoolExecutor with a specified number of workers
with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
    executor.map(process_fastq, fastq_files)

print("All FASTQ files processed.")`

Here's the output: Kraken 2 analysis completed for 10962 Bracken G level analysis completed for 10962 Bracken F level analysis completed for 10962 Bracken S level analysis completed for 10962 sh: 1: bracken: Permission denied sh: 1: bracken: Permission denied sh: 1: bracken: Permission denied Loading database information... done. 30234 sequences (45.20 Mbp) processed in 10793.484s (0.2 Kseq/m, 0.25 Mbp/m). 30234 sequences classified (100.00%) 0 sequences unclassified (0.00%)

I am wondering what happened to my bracken?