peterjc / galaxy_blast

Galaxy wrappers for NCBI BLAST+ and related BLAST tools.

NCBI BLAST 2.15.0 makes parallelism extension redundant #160

Open peterjc opened 1 year ago

peterjc commented 1 year ago

Quoting the NCBI mailing list:

We have included two exciting new features in the latest (2.15.0) BLAST release. One will run searches faster for you. The other allows you to limit your search more easily by organism.

Let’s talk about how this version of BLAST runs faster for some cases. If you run BLAST with multiple threads (using multiple CPUs), there are two ways that BLAST can divide the work up among the threads. Which method works better depends upon how large the database is, which program you are running and whether you have a lot of queries to run or not. It’s all kind of complicated, but BLAST can now figure that out for you. Picking the right threading model can easily speed up a search with a smallish database (say Swissprot) and a lot of queries by a factor of 2 to 10 without changing your results, which is what this change does. You can read more about this feature and the two BLAST threading models here.

i.e. we can remove this optional (non-default) job-splitting functionality:

    <xml name="parallelism">
        <!-- If job splitting is enabled, break up the query file into parts -->
        <parallelism method="multi" split_inputs="query" split_mode="to_size" split_size="1000" merge_outputs="output1" />
    </xml>
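
For context, here is a rough sketch of what the `split_mode="to_size"` / `split_size="1000"` setting amounts to conceptually: the query FASTA file is cut into parts of at most 1000 sequences each. This is only an illustration, not Galaxy's actual implementation; the function name `split_fasta` and its signature are hypothetical, and the real job splitting also runs each part as a separate job and merges `output1` afterwards.

```python
from typing import Iterable, Iterator, List


def split_fasta(lines: Iterable[str], split_size: int = 1000) -> Iterator[List[str]]:
    """Yield lists of FASTA lines, each holding at most split_size records.

    Hypothetical helper for illustration only; not part of Galaxy or BLAST+.
    """
    chunk: List[str] = []
    records = 0
    for line in lines:
        if line.startswith(">"):
            # A new record starts here; flush the chunk first if it is full.
            if records == split_size:
                yield chunk
                chunk, records = [], 0
            records += 1
        chunk.append(line)
    if chunk:
        # Emit the final, possibly smaller, part.
        yield chunk
```

With BLAST 2.15.0 choosing the threading model itself, this kind of external batching should no longer be needed to get good throughput on many-query searches.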