sanger-pathogens / Roary

Rapid large-scale prokaryote pan genome analysis
http://sanger-pathogens.github.io/Roary

Roary - The pan genome pipeline

Takes annotated assemblies in GFF3 format and calculates the pan genome.

PLEASE NOTE: we currently do not have the resources to provide support for Roary, so please do not expect a reply if you flag any issue.


Introduction

Roary is a high-speed, standalone pan genome pipeline that takes annotated assemblies in GFF3 format (produced by Prokka) and calculates the pan genome. On a standard desktop PC it can analyse datasets with thousands of samples, which is computationally infeasible with existing methods, without compromising the quality of the results. 128 samples can be analysed in under 1 hour using 1 GB of RAM and a single processor; the same analysis with existing methods would take weeks and hundreds of GB of RAM.

Installation

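Roary has the following dependencies:

Required dependencies

bedtools, cd-hit, BLAST+, mcl, GNU parallel, PRANK, MAFFT and FastTree.

Optional dependencies

R and ggplot2 (for the plots created with -r) and Kraken (for the QC report created with -qc).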

There are a number of ways to install Roary and details are provided below. If you encounter an issue when installing Roary please contact your local system administrator.

Ubuntu/Debian

Debian Testing

sudo apt-get install roary

Ubuntu 14.04/16.04

All the dependencies can be installed using apt and cpanm. Root permissions are required. Ubuntu 16.04 contains a package for Roary, but it is frozen at v3.6.0.

sudo apt-get install bedtools cd-hit ncbi-blast+ mcl parallel cpanminus prank mafft fasttree
sudo cpanm -f Bio::Roary

Ubuntu 12.04

Some of the software versions in apt are quite old so follow the instructions for Bioconda below.

Bioconda - OSX/Linux

Install conda. Then install bioconda and roary:

conda config --add channels r
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda install roary

Galaxy

Roary is available from the Galaxy toolshed (as is Prokka).

GNU Guix

Roary is included in Guix and can be installed in the usual way:

guix package --install roary

Virtual Machine - OSX/Linux/Windows

Roary won't run natively on Windows, but we have created a virtual machine which has all of the software set up, including Prokka, along with the test datasets from the paper. It is based on Bio-Linux 8. You first need to install VirtualBox, then load the virtual machine using the 'File -> Import Appliance' menu option. The root password is 'manager'.

ftp://ftp.sanger.ac.uk/pub/pathogens/pathogens-vm/pathogens-vm.latest.ova

More importantly though, if you're trying to do bioinformatics on Windows, you're not going to get very far and you should seriously consider upgrading to Linux.

Docker - OSX/Linux/Windows/Cloud

We have a docker container which gets automatically built from the latest version of Roary in Debian Med. To install it:

docker pull sangerpathogens/roary

To run it, use a command such as this (substituting in your own directories), where your GFF files are assumed to be stored in /home/ubuntu/data:

docker run --rm -it -v /home/ubuntu/data:/data sangerpathogens/roary roary -f /data /data/*.gff
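One thing to watch with the command above: the `/data/*.gff` pattern is seen first by the host shell, which tries to expand it against the host filesystem and, if nothing matches there, passes the pattern through unexpanded. A hedged sketch of one workaround is to let a shell inside the container do the expansion. The helper name `roary_docker_cmd` is hypothetical, and the presence of `sh` in the image is an assumption rather than something the Roary documentation confirms. The function only prints the command, so it can be inspected before it is run.

```shell
# Hypothetical helper: print (do not run) a docker command for a host
# directory of GFF files. The single quotes around the roary call keep the
# glob away from the host shell so that a shell inside the container expands
# it. Assumes the image provides `sh`.
roary_docker_cmd() {
    host_dir=$1
    printf "docker run --rm -it -v %s:/data sangerpathogens/roary sh -c 'roary -f /data /data/*.gff'\n" "$host_dir"
}

# Example: show the command for the directory used above.
roary_docker_cmd /home/ubuntu/data
```

The host path is inserted unquoted, so a directory name containing spaces would need extra quoting before the printed command is pasted into a terminal.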

Installing from source (advanced Linux users only)

As a last resort you can install everything from source. This is for users with advanced Linux skills and we do not provide any support with this method since you have the skills to figure things out. Download the latest software from https://github.com/sanger-pathogens/Roary/tarball/master.

Choose somewhere to put it, for example in your home directory (no root access required):

cd $HOME
tar zxvf sanger-pathogens-Roary-xxxxxx.tar.gz
ls Roary-*

Add the following lines to your $HOME/.bashrc file, or to /etc/profile.d/roary.sh to make it available to all users:

export PATH=$PATH:$HOME/Roary-x.x.x/bin
export PERL5LIB=$PERL5LIB:$HOME/Roary-x.x.x/lib

Install the Perl dependencies:

sudo cpanm  Array::Utils Bio::Perl Exception::Class File::Basename File::Copy File::Find::Rule File::Grep File::Path File::Slurper File::Spec File::Temp File::Which FindBin Getopt::Long Graph Graph::Writer::Dot List::Util Log::Log4perl Moose Moose::Role Text::CSV PerlIO::utf8_strict Devel::OverloadInfo Digest::MD5::File

Install the external dependencies either from source or from your packaging system:

bedtools cd-hit blast mcl GNUparallel prank mafft fasttree
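Before running Roary from a source install, it can help to confirm that the external tools are actually on your PATH. The following is a minimal sketch, not part of Roary: `missing_tools` is a hypothetical helper, and the executable names in the example are assumptions that may differ between distributions (FastTree in particular is packaged under more than one name). Once Roary itself is installed, `roary -a` performs its own dependency check.

```shell
# Hypothetical helper: print each of the given executables that is NOT on
# PATH, one per line. Returns 0 only if every executable was found.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}

# Example (executable names are assumptions, adjust to your system):
# missing_tools bedtools cd-hit blastp makeblastdb mcl mcxdeblast parallel prank mafft FastTree
```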

Ancient systems and versions of perl

The code will not work with Perl 5.8 or below (pre-modern Perl). We no longer test against 5.10 (released 2007) or 5.12 (released 2010). If you're running a very old version of Linux, you're also in trouble.
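To find out where your system stands, a small check along these lines may be useful. It is a sketch only: `perl_at_least` is a hypothetical helper, and the default threshold of 5.14 is an assumption inferred from the versions listed above as no longer tested, not a minimum stated by the Roary authors.

```shell
# Hypothetical helper: succeed if the perl on PATH is at least the given
# version (default 5.014, an assumed threshold). Versions use Perl's numeric
# $] form, e.g. 5.014 for Perl 5.14. Returns 2 if perl is not installed.
perl_at_least() {
    min=${1:-5.014}
    command -v perl >/dev/null 2>&1 || return 2
    perl -e 'exit($] >= $ARGV[0] ? 0 : 1)' "$min"
}

# Example:
# perl_at_least 5.014 || echo "Perl is older than 5.14 or missing" >&2
```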

Running the tests

The tests can be run with dzil from the top-level directory:

dzil test

Usage

Usage:   roary [options] *.gff

Options: -p INT    number of threads [1]
         -o STR    clusters output filename [clustered_proteins]
         -f STR    output directory [.]
         -e        create a multiFASTA alignment of core genes using PRANK
         -n        fast core gene alignment with MAFFT, use with -e
         -i        minimum percentage identity for blastp [95]
         -cd FLOAT percentage of isolates a gene must be in to be core [99]
         -qc       generate QC report with Kraken
         -k STR    path to Kraken database for QC, use with -qc
         -a        check dependancies and print versions
         -b STR    blastp executable [blastp]
         -c STR    mcl executable [mcl]
         -d STR    mcxdeblast executable [mcxdeblast]
         -g INT    maximum number of clusters [50000]
         -m STR    makeblastdb executable [makeblastdb]
         -r        create R plots, requires R and ggplot2
         -s        dont split paralogs
         -t INT    translation table [11]
         -ap       allow paralogs in core alignment
         -z        dont delete intermediate files
         -v        verbose output to STDOUT
         -w        print version and exit
         -y        add gene inference information to spreadsheet, doesnt work with -e
         -iv STR   Change the MCL inflation value [1.5]
         -h        this help message

Example: Quickly generate a core gene alignment using 8 threads
         roary -e --mafft -p 8 *.gff

For further info see: http://sanger-pathogens.github.io/Roary/

For further instructions on how to use the software, the input format and output formats, please see the Roary website.
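As an illustration of how the options above combine, here is a minimal wrapper sketch. The function names `count_gff` and `run_roary` and the output directory name `roary_output` are hypothetical; only the `-e`, `-n`, `-p` and `-f` flags come from the usage text. The wrapper refuses to start when the directory holds no `.gff` files, since an unmatched `*.gff` would otherwise be handed to Roary as a literal string.

```shell
# Count the .gff files directly inside a directory (non-recursive).
count_gff() {
    n=0
    for f in "$1"/*.gff; do
        # An unmatched glob stays as a literal pattern, so test for existence.
        if [ -e "$f" ]; then n=$((n + 1)); fi
    done
    printf '%s\n' "$n"
}

# Hypothetical wrapper: build a core gene alignment with MAFFT for every
# GFF3 file in a directory, writing results to <dir>/roary_output
# (an assumed name). Second argument is the thread count (default 1).
run_roary() {
    dir=$1
    threads=${2:-1}
    if [ "$(count_gff "$dir")" -eq 0 ]; then
        echo "no .gff files found in $dir" >&2
        return 1
    fi
    roary -e -n -p "$threads" -f "$dir/roary_output" "$dir"/*.gff
}

# Example:
# run_roary /path/to/gffs 8
```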

License

Roary is free software, licensed under GPLv3.

Feedback/Issues

We currently do not have the resources to provide support for Roary. However, the community might be able to help you out if you report any issues about usage of the software to the issues page.

Citation

If you use this software please cite:

"Roary: Rapid large-scale prokaryote pan genome analysis",
Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill,
Bioinformatics, (2015). doi: http://dx.doi.org/10.1093/bioinformatics/btv421

Further Information

For more information on this software see: