jenniferlu717 / Bracken

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
http://ccb.jhu.edu/software/bracken/index.shtml
GNU General Public License v3.0

Reduce memory overhead and exit early if not able to create database.… #245

Closed: ch4rr0 closed this 8 months ago

ch4rr0 commented 9 months ago

This PR seeks to address two issues:

1. Reduce memory overhead during the creation of database.kraken

The change avoids the potential memory overhead of having the find command concatenate the FASTA files and feed that data to the classifier via process substitution.

2. Exit early if building database.kraken fails

The change set alters the build script so that the database receives its final name only after the build process has completed successfully. This avoids potential false positives when running bracken-build in a loop, as demonstrated below:

for i in 50 75 100; do
    bracken-build -d . -t 12 -l "$i"
done

Currently, if bracken-build fails for read length 50, the loop still continues, since database.kraken exists whether or not the previous invocation succeeded. This has the cascading effect that read lengths 75 and 100 are processed against a potentially truncated database.
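
The general idea of finalizing the name only on success can be sketched as follows. This is a minimal illustration, not the actual change to the build script: build_db is a hypothetical stand-in for the step that produces database.kraken, and the .tmp suffix and the finalize_on_success helper are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch: write build output under a temporary name and rename it to the
# final name only if the build step succeeded.

# Hypothetical stand-in for the real build step.
# $1 = output path, $2 = simulated outcome ("ok" or "fail").
build_db() {
    echo "partial output" > "$1"   # a failed build may still leave data behind
    [ "$2" = "ok" ]                # exit status reflects the simulated outcome
}

# Hypothetical helper: $1 = final database path, $2 = simulated outcome.
finalize_on_success() {
    local final="$1" outcome="$2"
    local tmp="${final}.tmp"
    if build_db "$tmp" "$outcome"; then
        mv "$tmp" "$final"         # final name appears only after a full build
    else
        rm -f "$tmp"               # leave nothing that looks like a finished database
        return 1
    fi
}

workdir="$(mktemp -d)"
db="$workdir/database.kraken"

# First attempt fails: the final file must not exist afterwards.
if finalize_on_success "$db" fail; then first="built"; else first="failed"; fi
if [ -e "$db" ]; then after_fail="present"; else after_fail="absent"; fi

# Second attempt succeeds: the final file is now in place.
if finalize_on_success "$db" ok; then second="built"; else second="failed"; fi
if [ -e "$db" ]; then after_ok="present"; else after_ok="absent"; fi

echo "$first $after_fail $second $after_ok"
rm -rf "$workdir"
```

With this pattern, a later invocation that checks for database.kraken never sees a file left behind by a failed run. Assuming bracken-build returns a non-zero exit status on failure, the loop above could also stop at the first failure by appending `|| break` to the bracken-build line.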