tseemann closed this issue 9 years ago
It's what I was more familiar with. I wasn't sure that upgrading to blast+ would be a major performance increase, but I may be wrong.
BLAST+ is far more efficient when multi-threading (-a / -num_threads), but it does depend on the number of queries and the size of the database.
If you need advice on modifying the command line, this 1-pager I wrote summarizes most of it: http://www.vicbioinformatics.com/documents/Quick_Start_Guide_BLAST_to_BLAST+.pdf
Legacy BLAST (blastall) hasn't been supported for >2 years now. BLAST+ has some good features like customisable TSV output formats which might be useful for LS-BSR.
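For others reading along, here is a rough sketch of how the customisable tabular output could be consumed from Python. The column selection, the function name `parse_blast_tsv` and the sample hit are all made up for illustration; this is not LS-BSR's actual code.

```python
import csv
import io

# Columns requested on the command line with:
#   -outfmt "6 qseqid sseqid pident bitscore"
# (an illustrative choice, not necessarily what LS-BSR needs)
COLUMNS = ["qseqid", "sseqid", "pident", "bitscore"]


def parse_blast_tsv(handle, columns=COLUMNS):
    """Yield one dict per hit from tab-separated BLAST+ output."""
    reader = csv.reader(handle, delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines (e.g. from -outfmt 7)
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(columns, row))
        hit["pident"] = float(hit["pident"])
        hit["bitscore"] = float(hit["bitscore"])
        yield hit


# Hypothetical single-hit example instead of a real output file
sample = io.StringIO("gene1\tprotA\t98.50\t512\n")
hits = list(parse_blast_tsv(sample))
```

Naming the columns explicitly in both the `-outfmt` string and the parser keeps the two in step, so the script does not have to rely on remembering the default column order.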
The reason I bring up this and vsearch is I would like to package LS-BSR into Homebrew.
I've looked at transfer_annotation.py and you would probably gain speed by using BLAST+ as you are using the -a threads option.
I know you don't really need help with this but I'll include it here for others to benefit from. The main gotcha is that NCBI changed the output format codes!
OLD:
blastall -p blastp -i %s -d query.peptides.xyx -m 8 -o xyx.blast.out.xyx -a %s
NEW:
blastp -query %s -db query.peptides.xyx -outfmt 6 -out xyx.blast.out.xyx -num_threads %s
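If the script ends up calling BLAST+ from Python, one possible way to assemble the NEW command is as an argument list rather than a formatted string. This is only a sketch: the helper names `build_blastp_cmd` and `run_blastp` and the query file name `genes.pep` are hypothetical, and the flags are just the ones shown above.

```python
import subprocess


def build_blastp_cmd(query, db, out, threads):
    """Return the BLAST+ blastp command as an argument list."""
    return [
        "blastp",
        "-query", str(query),
        "-db", str(db),
        "-outfmt", "6",          # tabular output, the old "-m 8"
        "-out", str(out),
        "-num_threads", str(threads),
    ]


def run_blastp(query, db, out, threads=1):
    """Run blastp; assumes the BLAST+ binaries are on the PATH."""
    # check=True raises CalledProcessError if blastp exits non-zero
    subprocess.run(build_blastp_cmd(query, db, out, threads), check=True)


# Hypothetical query file; db and output names taken from the thread
cmd = build_blastp_cmd("genes.pep", "query.peptides.xyx", "xyx.blast.out.xyx", 4)
```

Passing a list to `subprocess.run` avoids shell quoting problems when file names contain spaces.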
You may want to add -seg no (was -F F) to disable low-complexity filtering. Also, if you know your proteins are all from the same genus etc., you should use "BLOSUM80" (-matrix BLOSUM80) for greater sensitivity.
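The two optional tweaks above could be kept switchable in the script. A minimal sketch, assuming the command is held as an argument list; the function name `add_optional_flags`, its parameters and the file name `genes.pep` are invented for this example.

```python
def add_optional_flags(cmd, disable_seg=False, matrix=None):
    """Return a copy of a blastp argument list with extra options appended."""
    extra = []
    if disable_seg:
        # Explicitly turn off low-complexity (SEG) filtering, the old "-F F"
        extra += ["-seg", "no"]
    if matrix is not None:
        # e.g. "BLOSUM80" for closely related proteins
        extra += ["-matrix", matrix]
    return list(cmd) + extra


# Hypothetical base command; the input list is left unmodified
base = ["blastp", "-query", "genes.pep", "-db", "query.peptides.xyx"]
tuned = add_optional_flags(base, disable_seg=True, matrix="BLOSUM80")
```

Returning a new list instead of mutating `cmd` means the same base command can be reused with different settings.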
Yeah, I need to re-work this entire script, but can switch over to blast+ when I do
Blast+ is now supported and blastall is deprecated. Let me know if you run into any issues. Updates to manual coming soon.
Thanks @jasonsahl !
I notice you need the old legacy blastall 2.2.x instead of the BLAST+ package.
Is there any reason for this?