jenniferlu717 / Bracken

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
http://ccb.jhu.edu/software/bracken/index.shtml
GNU General Public License v3.0
289 stars 50 forks source link

problem with using more than one level from est_abundance.py script #31

Closed balathl closed 6 years ago

balathl commented 6 years ago

python est_abundance.py -i kraken.sample1.report -k OUTPUT_database75mers.kraken_cnts.TXT -o OUTPUT_FILE_all_levels.TXT --level "D,P,G,S" usage: est_abundance.py [-h] -i INPUT -k KMER_DISTR -o OUTPUT [-l {D,P,C,O,F,G,S}] [-t THRESH] est_abundance.py: error: argument -l/--level: invalid choice: 'D,P,G,S' (choose from 'D', 'P', 'C', 'O', 'F', 'G', 'S')

Is there anyway to include more levels at same command?

Bala

jenniferlu717 commented 6 years ago

Not currently but I'm not sure what kind of results you're looking for by adding in multiple levels?

You can just run the command multiple times to get the abundance estimates at the different levels. I don't believe the command would take that much time.

balathl commented 6 years ago

image

I am aiming my final results in this format. Any comments will be useful. My code was below: (my_biotools_bracken) [bjam@c307 filtered]$ cat run.for.kraken.build.serial.sh

!/bin/bash -l

SBATCH -J krakentest

SBATCH -o output_%j.txt

SBATCH -e errors_%j.txt

SBATCH -t 3-00:00:00

SBATCH -n 1

SBATCH --nodes=1

SBATCH --cpus-per-task=16

SBATCH -p serial

SBATCH --mem=6000

source activate my_biotools_kraken

dont run again#kraken-build --standard --db /wrk/bjam/challenge/filtered/KrakenDB

run again with other samples#kraken --db KrakenDB --paired --quick --unclassified-out UNCLASSIFIED_sample01_test --classified-out CLASSIFIED_sample01_test --out-fmt paired --output stdout_sample01_test --preload --check-names --fastq-input sample01_F_filt.fastq sample01_R_filt.fastq --threads 16

run again with other samples#kraken-report --db /wrk/bjam/challenge/filtered/KrakenDB stdout_sample01 > kraken.sample1.report

perl count-kmer-abundances.pl --db /wrk/bjam/challenge/filtered/KrakenDB --read-length=75 --threads=16 stdout_sample01 > database75mers.kraken_cnts python generate_kmer_distribution.py -i database75mers.kraken_cnts -o OUTPUT_database75mers.kraken_cnts.TXT python est_abundance.py -i kraken.sample1.report -k OUTPUT_database75mers.kraken_cnts.TXT -o OUTPUT_FILE_species.TXT python est_abundance.py -i kraken.sample1.report -k OUTPUT_database75mers.kraken_cnts.TXT -o OUTPUT_FILE_genus.TXT --level G echo done

jenniferlu717 commented 6 years ago

You can just concatenate the output files from each Bracken run. They will give you the taxid, rank, and percentages. You would have to then generate the taxpath and the taxpathsn columns but otherwise, concatenating the output would give you most of what you want.

balathl commented 6 years ago

ok. Thanks! This was the first time i am using bracken. great tool :)

Could you suggest any improvements in the below commands??

dont run again#kraken-build --standard --db /wrk/bjam/challenge/filtered/KrakenDB

run again with other samples#kraken --db KrakenDB --paired --quick --unclassified-out UNCLASSIFIED_sample01_test --classified-out CLASSIFIED_sample01_test --out-fmt paired --output stdout_sample01 --preload --check-names --fastq-input sample01_F_filt.fastq sample01_R_filt.fastq --threads 16

run again with other samples#kraken-report --db /wrk/bjam/challenge/filtered/KrakenDB stdout_sample01 > kraken.sample1.report

perl count-kmer-abundances.pl --db /wrk/bjam/challenge/filtered/KrakenDB --read-length=75 --threads=16 stdout_sample01 > database75mers.kraken_cnts python generate_kmer_distribution.py -i database75mers.kraken_cnts -o OUTPUT_database75mers.kraken_cnts.TXT python est_abundance.py -i kraken.sample1.report -k OUTPUT_database75mers.kraken_cnts.TXT -o OUTPUT_FILE_species.TXT python est_abundance.py -i kraken.sample1.report -k OUTPUT_database75mers.kraken_cnts.TXT -o OUTPUT_FILE_genus.TXT --level G echo done

My fastq reads were 151 basepairs. I have 20 fastq files from 20 samples to analyse.

Thanks in advance!

Bala

jenniferlu717 commented 6 years ago

I think the commands look fine from my end.