simroux / VirSorter

Source code of the VirSorter tool, also available as an App on CyVerse/iVirus (https://de.iplantcollaborative.org/de/)
GNU General Public License v2.0

Add parallelization and additional user options #24

Closed brymerr921 closed 6 years ago

brymerr921 commented 6 years ago

Unless the user chooses Diamond instead of blastp, the changes made here should not affect the VirSorter predictions at all. Apart from the Diamond option, every change either makes VirSorter run faster and more efficiently or expands the user's options. Changes implemented include:

  1. Pass the number of CPUs specified at the command line (in wrapper...) through to the Step_0, Step_3, and Step_first scripts.
  2. Parallelize the muscle and hmmbuild steps in Step_0 and Step_first.
  3. Parallelize a major part of Step_3 that iterates through each contig and highlights the signal.
  4. Instead of using file reads/writes to pass information from Step_3 to the sliding window analysis, pipe the data directly into the C script. (Thanks Simon for this one!)
  5. Add Diamond as an option at the wrapper command line, to use instead of blastp for extremely large input fasta files or when adding many custom phages.
  6. Add option to wrapper to keep databases after adding custom phages.
  7. Update Perl dependencies to include Parallel::ForkManager and List::MoreUtils.
  8. Update README.md to include easy conda installation instructions and modified usage instructions.
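
For readers unfamiliar with the two patterns behind items 2 to 4, here is a minimal sketch in Python. It is not the code in this PR, which is Perl (using Parallel::ForkManager) feeding a C program. The function names `run_jobs` and `pipe_records`, the record layout, and the stand-in commands are all hypothetical and chosen only for illustration.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_jobs(commands, cpus):
    """Run independent external commands, at most `cpus` at a time.

    Threads suffice here because each worker only waits on a child
    process. Exit codes are returned in the same order as `commands`.
    """
    def run_one(cmd):
        return subprocess.run(cmd, check=False).returncode

    with ThreadPoolExecutor(max_workers=cpus) as pool:
        return list(pool.map(run_one, commands))


def pipe_records(records, cmd):
    """Send tab-separated records to `cmd` on stdin and return its stdout.

    This replaces the "write a file, then have the tool read it" handoff
    with a direct pipe.
    """
    payload = "".join("\t".join(map(str, rec)) + "\n" for rec in records)
    done = subprocess.run(
        cmd, input=payload, capture_output=True, text=True, check=True
    )
    return done.stdout


if __name__ == "__main__":
    # Stand-in commands: the Python interpreter plays the role of the
    # external tools, so the sketch runs without any of them installed.
    noop = [sys.executable, "-c", "pass"]
    echo = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read())"]

    print(run_jobs([noop, noop, noop], cpus=2))
    print(pipe_records([("contig_1", 5, 0.9)], echo), end="")
```

In this sketch the `cpus` argument plays the role of the CPU count passed down from the wrapper in item 1: it caps how many child processes run at once.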

I propose bumping the release to 1.0.4 for these changes. To reiterate, none of them (except the Diamond option) will affect output results; they only improve speed and give the user more options.