Open bradfordcondon opened 5 years ago
Interesting and surprising. According to this github issue thread, the behavior is pervasive, including the e-value cutoff. I propose we go ahead and switch to Diamond, which is a lot faster, and from the manual: "--max-target-seqs/-k # The maximum number of target sequences per query to report alignments for (default=25). Setting this to 0 will report all alignments that were found."
So set --max-target-seqs to 0 and leave everything else alone; then we'll have to write a Python script to filter the (giant) XML.
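A rough sketch of what that filtering script could look like, assuming Diamond is run with BLAST-style XML output and that the element names match NCBI's BLAST XML (`Iteration`, `Hit`, `Hsp_bit-score`, `Hsp_evalue`); this has not been checked against real Diamond output. The function name `top_hits` and the choice to rank hits by their best HSP bit score are placeholders, not something we've agreed on.

```python
import heapq
import xml.etree.ElementTree as ET


def top_hits(xml_source, n=5):
    """Yield (query, hits) per query, keeping only the n best-scoring hits.

    Each hit is a (bit_score, evalue, hit_id) tuple taken from the hit's
    best HSP. Ranking by bit score is an assumption for this sketch.
    """
    # iterparse streams the file, so the whole XML is never parsed at once.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "Iteration":
            continue
        query = elem.findtext("Iteration_query-def")
        scored = []
        for hit in elem.iter("Hit"):
            hsps = [
                (float(h.findtext("Hsp_bit-score")), float(h.findtext("Hsp_evalue")))
                for h in hit.iter("Hsp")
            ]
            if hsps:
                # Represent the hit by its highest-scoring HSP.
                bit_score, evalue = max(hsps, key=lambda pair: pair[0])
                scored.append((bit_score, evalue, hit.findtext("Hit_id")))
        yield query, heapq.nlargest(n, scored, key=lambda t: t[0])
        # Drop this query's subtree so memory stays roughly flat.
        elem.clear()
```

Calling `top_hits("results.xml", n=10)` (file name hypothetical) would give the ten best hits per query regardless of the order in which the aligner reported them.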
On Wed, Sep 26, 2018 at 8:34 AM Bradford Condon notifications@github.com wrote:
Basically this parameter does not return the top hits, but rather the first hits that meet whatever cutoff you set.
I've downloaded Diamond on the Staton server, and I'm running it to see how long it will take.
A few initial observations I've had:
I ran into issues with both the git clone and the wget options listed in Diamond's manual, but installing it via conda produced no such issues. Given how the ACF is about conda environments, I'm going to give these earlier options another try when I install them there. I'll update this as I make more observations, including how different Diamond's XML format is from BLAST's.
https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/bty833/5106166
Basically, this parameter (BLAST's -max_target_seqs) does not return the top hits, but rather the first hits that meet whatever cutoff you set.