Open ganiatgithub opened 4 years ago
This is weird since the error message you get is not part of bracken
but of bracken-build
. In fact it’s the error message I would expect when running bracken-build
with the same arguments. Did you perhaps accidentally mislink bracken-build
to bracken
?
Hi,
You're right, I got mixed up with another error message when running bracken-build
. I made sure it's the correct call this time, tested with 100k reads that has length between 1.2 to 1.6 kb, and none were distributed by bracken. Here is the log:
>> Checking for Valid Options...
>> Running Bracken
/home/Staff/uqgni1/miniconda2/envs/kraken/bin/bracken: line 98: [: trial02.bracken.report: unary operator expected
>> python src/est_abundance.py -i kk2-trial02.report -o trial02.bracken.def -k /home/Staff/uqgni1/tools/kraken2-silva138/16S_SILVA138_k2db/database100mers.kmer_distrib -l S -t 1000
PROGRAM START TIME: 07-02-2020 21:06:49
BRACKEN SUMMARY (Kraken report: kk2-trial02.report)
>>> Threshold: 1000
>>> Number of species in sample: 0
>> Number of species with reads > threshold: 0
>> Number of species with reads < threshold: 0
>>> Total reads in sample: 100000
>> Total reads kept at species level (reads > threshold): 0
>> Total reads discarded (species reads < threshold): 0
>> Reads distributed: 0
>> Reads not distributed (eg. no species above threshold): 100000
>> Unclassified reads: 0
BRACKEN OUTPUT PRODUCED: trial02.bracken.def
PROGRAM END TIME: 07-02-2020 21:06:49
Bracken complete.
I'm guess this can be due to the kmer_distrib files were only generated for 50-250? How to adjust this workflow to work with full length 16S reads?
BTW, the kraken2 report seems fine though:
head kk2-trial02.report
100.00 100000 0 R 1 root
100.00 100000 33 D 3 Bacteria
91.63 91632 2 P 2375 Proteobacteria
91.06 91055 624 C 3303 Gammaproteobacteria
58.64 58636 0 O 26732 Nitrosococcales
58.62 58625 31 F 26737 Nitrosococcaceae
58.56 58557 58557 G 26741 Candidatus Nitrosoglobus
0.03 26 26 G 26746 Nitrosococcus
0.00 4 4 G 26749 wb1-P19
0.00 3 3 G 26740 CI75cm.2.12
Kind regards, Gaofeng
What happens when you add the option -l G
to your bracken
call?
Thanks,-l G
worked. None of the reads were classified to species level, is this normal for 16S reads that are 1.2-1.6 kb long?
I notice that in the bracken-build
run, ideal read length is specified, which is not possible for my case. Will this be an issue?
Hi! In reference to your first error message (no library) in case anyone else has this issue: when you're building the kraken2 database, don't clean it (kraken2-build --clean --db silva), as this actually removes the library folder that Bracken needs to be able to run.
Now as to your second error you're getting, I'm having the same issue. My understanding is that setting the threshold at 1000 means that any species that has less than 1000 reads (in this case, most of them since the database is mainly single 16S loci) won't receive any additional reads when calculating the abundance distribution. However, even if I set it to a threshold of 1, then I don't get any results at the species level. I'd really like to get some results at the species level... I do get 1,060,046 reads kept at the class level, and 101,604 at the genus level with a threshold of 0.
Is there any way to remove the threshold?
Thanks! Heather
@heather340 is completely correct. The threshold you set is too high. The threshold (with a species level estimation) tells Bracken to ignore any species that have < 1000 reads (which I assume is all of them). I would suggest a lower threshold of maybe 10-50.
Hi Jennifer,
My apologies in advance if there is an solution already. I'm using bracken together with the pre-built silva 138 db. Command is:
Error says:
But actually the files are in this path:
Any suggestions?
Kind regards, Gaofeng