simroux / VirSorter

Source code of the VirSorter tool, also available as an App on CyVerse/iVirus (https://de.iplantcollaborative.org/de/)
GNU General Public License v2.0

VirSorter

Source code of the VirSorter App, available on CyVerse (https://de.iplantcollaborative.org/de/)

NOTE - VirSorter 2 is now available for early access/testing -> see https://github.com/jiarong/VirSorter2

Publication

Result files

The main output files of VirSorter are:

VirSorter results can be imported into Anvi'o by following these instructions.

VirSorter results can also be imported into R by following these instructions.

Using a conda virtual environment (tested on Ubuntu and CentOS)

To run VirSorter, type the following:

source activate virsorter
wrapper_phage_contigs_sorter_iPlant.pl -f assembly.fasta --db 1 --wdir output_directory --ncpu 4 --data-dir /path/to/virsorter-data

Note for Conda installation

If you get the error "ListUtil.c: loadable library and perl binaries are mismatched", this is a known conda issue that can be fixed with the following steps. First, create a file etc/conda/activate.d/update_perllib.sh in your conda environment folder containing the following lines:

#!/bin/sh
export OLD_PERL5LIB=$PERL5LIB
export PERL5LIB=`pwd`/../../../lib/site_perl/5.26.2/

Then create a file etc/conda/deactivate.d/update_perllib.sh in your conda environment folder including the following lines:

#!/bin/sh
export PERL5LIB=$OLD_PERL5LIB

Note: you may have to create the folders "etc", "etc/conda", "etc/conda/activate.d/", and "etc/conda/deactivate.d/" (e.g. using mkdir) in your conda environment folder, as these are not always generated by default in every conda environment.
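The folders and the two hook files described above can also be created in one step. The following is a minimal sketch, assuming the conda environment folder is passed as an argument (for an activated environment, conda sets `CONDA_PREFIX` to that folder). The function name `setup_perllib_hooks` is hypothetical and not part of VirSorter.

```shell
#!/bin/sh
# Hypothetical helper (not part of VirSorter): writes the activate/deactivate
# hooks described above into the given conda environment folder.
setup_perllib_hooks() {
    env_dir="$1"
    if [ -z "$env_dir" ] || [ ! -d "$env_dir" ]; then
        echo "usage: setup_perllib_hooks /path/to/conda/env" >&2
        return 1
    fi

    # Create the hook folders; -p also creates "etc" and "etc/conda" if missing.
    mkdir -p "$env_dir/etc/conda/activate.d" "$env_dir/etc/conda/deactivate.d"

    # The quoted heredoc delimiter ('EOF') keeps $PERL5LIB and the backticks literal.
    cat > "$env_dir/etc/conda/activate.d/update_perllib.sh" <<'EOF'
#!/bin/sh
export OLD_PERL5LIB=$PERL5LIB
export PERL5LIB=`pwd`/../../../lib/site_perl/5.26.2/
EOF

    cat > "$env_dir/etc/conda/deactivate.d/update_perllib.sh" <<'EOF'
#!/bin/sh
export PERL5LIB=$OLD_PERL5LIB
EOF
}

# Example call, with the environment activated:
#   setup_perllib_hooks "$CONDA_PREFIX"
```

Deactivate and re-activate the environment afterwards so that the new hooks are picked up.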

Note for Diamond

If you use the "diamond" option, make sure you have downloaded the latest version of the "data-dir" package (wget https://zenodo.org/record/1168727/files/virsorter-data-v2.tar.gz), as previous versions of the database did not include the diamond database file.
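One quick way to check an existing "data-dir" before running with the "diamond" option is to look for a DIAMOND database file in it. This is a rough sketch only: it assumes the database file carries the ".dmnd" extension that DIAMOND gives to the databases it builds, and the function name `has_diamond_db` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the data directory contains at least one
# DIAMOND database file. Assumes such files carry the ".dmnd" extension.
has_diamond_db() {
    data_dir="$1"
    [ -d "$data_dir" ] || return 1
    # find prints matching paths; grep -q . succeeds only if there is output.
    find "$data_dir" -type f -name '*.dmnd' | grep -q .
}

# Example:
#   has_diamond_db /path/to/virsorter-data || echo "no diamond database found" >&2
```

If the check fails, download the package again from the URL above and untar it before retrying.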

# Docker - from DockerHub (v1.0.5)

After "virsorter:v1.0.5", the options correspond to the ones described in wrapper_phage_contigs_sorter_iPlant.pl (here selecting the database "Viromes" and pointing VirSorter to the file "Input_contigs.fna").
* You can set the user ID that will own the files created by VirSorter with the --user option of Docker, e.g.

    $ docker run --user `id -u` -v /host/path/to/virsorter-data:/data -v /host/path/to/virsorter-run:/wdir -w /wdir --rm simroux/virsorter:v1.0.5 --db 2 --fna Input_contigs.fna


# Docker - from DockerHub (v1.0.3)

* Download the databases required by VirSorter, available as a tarball archive on iMicrobe (http://mirrors.iplantcollaborative.org/browse/iplant/home/shared/imicrobe/VirSorter/virsorter-data.tar.gz) or at /iplant/home/shared/imicrobe/VirSorter/virsorter-data.tar.gz through the iPlant Discovery Environment
* Untar this package in a directory, e.g. /host/path/to/virsorter-data
* Pull VirSorter from dockerhub: $ docker pull discoenv/virsorter:v1.0.3
* Create a working directory for VirSorter which includes the input fasta file, e.g. /host/path/to/virsorter-run
* Then run VirSorter from docker, mounting the data directory as data, and the run directory as wdir:

    $ docker run -v /host/path/to/virsorter-data:/data -v /host/path/to/virsorter-run:/wdir -w /wdir --rm discoenv/virsorter:v1.0.3 --db 2 --fna /wdir/Input_contigs.fna

After "virsorter:v1.0.3", the options correspond to the ones described in wrapper_phage_contigs_sorter_iPlant.pl (here selecting the database "Viromes" and pointing VirSorter to the file "Input_contigs.fna").
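The "docker run" step above can be wrapped in a small function that checks the two host directories before starting the container. This is a sketch under stated assumptions: the function name `run_virsorter_docker` and the `DOCKER` override variable are hypothetical, and the image name and options are copied from the command shown above.

```shell
#!/bin/sh
# Hypothetical wrapper around the "docker run" command shown above.
# DOCKER can be overridden (e.g. DOCKER=echo) to print the arguments
# instead of starting a container.
run_virsorter_docker() {
    data_dir="$1"   # untarred virsorter-data directory on the host
    run_dir="$2"    # working directory on the host, containing the input fasta
    fasta="$3"      # file name of the input fasta inside run_dir

    [ -d "$data_dir" ] || { echo "data directory not found: $data_dir" >&2; return 1; }
    [ -f "$run_dir/$fasta" ] || { echo "input not found: $run_dir/$fasta" >&2; return 1; }

    # Mount the data directory as /data and the run directory as /wdir.
    "${DOCKER:-docker}" run \
        -v "$data_dir":/data -v "$run_dir":/wdir -w /wdir --rm \
        discoenv/virsorter:v1.0.3 --db 2 --fna "/wdir/$fasta"
}

# Example:
#   run_virsorter_docker /host/path/to/virsorter-data /host/path/to/virsorter-run Input_contigs.fna
```

Failing early on a missing data directory or input file avoids starting a container that would only report the problem after the mounts are set up.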

# Docker - building packages from scratch

## Dependencies

Install the following into a "bin" directory:

* HMMER (http://hmmer.janelia.org/)
* MCL (http://micans.org/mcl/)
* Metagene Annotator (http://metagene.nig.ac.jp/metagene/download_mga.html)
* MUSCLE (http://www.drive5.com/muscle/)
* BLAST+ (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/)
* DIAMOND (https://github.com/bbuchfink/diamond)

## Data Container

The 12 GB of required data lives in a separate data container called "virsorter-data".

This is the Dockerfile for that:

    FROM perl:latest

    MAINTAINER Ken Youens-Clark <kyclark@email.arizona.edu>

    COPY Generic_ref_file.refs /data/

    COPY PFAM_27 /data/PFAM_27

    COPY Phage_gene_catalog /data/Phage_gene_catalog

    COPY Phage_gene_catalog_plus_viromes /data/Phage_gene_catalog_plus_viromes

    COPY VirSorter_Readme.txt /data

    COPY VirSorter_Readme_viromes.txt /data

    VOLUME ["/data"]

Then do:

    $ docker build -t kyclark/virsorter-data .
    $ docker create --name virsorter-data kyclark/virsorter-data /bin/true

## Build

    $ docker build -t kyclark/virsorter .

## Run

A sample "run" command to use the current working directory for input/output:

    $ docker run --rm --volumes-from virsorter-data -v $(pwd):/de-app-work \
    -w /de-app-work kyclark/virsorter --fna Mic_1.fna --db 1

# Authors

Simon Roux <sroux@lbl.gov> is the author of VirSorter.

Ken Youens-Clark <kyclark@email.arizona.edu> packaged this for Docker/iPlant.

Bryan D Merrill <bmerrill@stanford.edu> provided the improvements and additions for v1.0.5