Universal Genome Analyst
.. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.578712.svg
   :target: https://doi.org/10.5281/zenodo.578712
Universal Genome Analyst (uga) is an open, flexible, and efficient tool for the distribution, management, and visualization of whole genome data analyses.
It is designed to assist biomedical researchers in complex genomic data analysis through a low-level interface between the powerful R statistical
environment and Python, allowing for rapid integration of emerging analytical strategies. This project uses Cython_ for a significant reduction in computation
time. Researchers with access to a high-performance computing cluster or to multiple cores will find time-saving features for parallel analysis
using a flexible, yet controlled, command-line interface.

.. _Cython: https://pypi.python.org/pypi
This software is currently under rapid development. Updates and bug fixes are being tracked on the `uga GitHub page`_.

.. _uga GitHub page: https://github.com/rmkoesterer/uga
Notable Features
- geepack_: geeglm, R seqMeta_: singlesnpMeta, R lme4_: lmer
- seqMeta_: burdenMeta, skatMeta, skatOMeta
- qsub_
- Gzip_ and Bgzip_ / Tabix_ mapped output where possible to save disc space

.. _geepack: https://cran.r-project.org/web/packages/geepack/index.html
.. _seqMeta: https://cran.r-project.org/web/packages/seqMeta/index.html
.. _lme4: https://cran.r-project.org/web/packages/lme4/index.html
.. _qsub: http://gridscheduler.sourceforge.net/htmlman/htmlman1/qsub.html
.. _Gzip: http://www.gzip.org/
.. _Bgzip: http://www.htslib.org/
.. _Tabix: http://www.htslib.org/
Planned For Future Releases
- survival_: coxph; nlme_: lme
- Locuszoom_ software
- SnpEff_

.. _survival: https://cran.r-project.org/web/packages/survival/index.html
.. _nlme: https://cran.r-project.org/web/packages/nlme/index.html
.. _Locuszoom: http://genome.sph.umich.edu/wiki/LocusZoom_Standalone
.. _SnpEff: http://snpeff.sourceforge.net/
Since parallel computing is sometimes unreliable, analysts need to be able to verify and possibly rerun failed jobs at various stages of the analysis. In the interest of user efficiency and to avoid limitations induced by excessive automation, uga breaks the analytical process into the following modules.
Installation
This software uses a variety of Python modules, R packages, and some stand-alone software. Thus, the easiest method for installation is to use one of the two
distributions of conda_: either Anaconda_ or Miniconda_.

.. _conda: https://conda.io/docs/download.html
.. _Anaconda: https://www.continuum.io/downloads
.. _Miniconda: https://conda.io/miniconda.html
Also, consolidation and compression of data and results files require `tabix/bgzip`_ and gzip_.

.. _tabix/bgzip: http://www.htslib.org/
.. _gzip: http://www.gzip.org/
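Before going further, it may help to confirm that these tools are visible on your PATH. The following is a minimal sketch, not part of uga: ``missing_tools`` is a hypothetical helper name, and the check only tests whether each command can be found, not which version is installed.

```shell
# missing_tools is a hypothetical helper (not shipped with uga).
# It prints the name of every requested command that is not on the PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Check the compression and indexing tools named above.
missing=$(missing_tools tabix bgzip gzip)
if [ -n "$missing" ]; then
    # $missing is left unquoted so the names print on a single line.
    echo "Missing tools:" $missing
else
    echo "tabix, bgzip, and gzip are all available"
fi
```

If any names are reported as missing, install them (or add them to your PATH) before continuing.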
To prepare your system for uga, you need to `clone an environment`_. You will need the included environment.yml file from the source code and a number of
packages from `my anaconda cloud channel`_ and other custom channels (listed in the environment.yml file). After downloading the most recent
release (available here_), use the following commands to begin the installation.

.. _clone an environment: http://conda.pydata.org/docs/using/envs.html#clone-an-environment
.. _my anaconda cloud channel: https://conda.anaconda.org/rmkoesterer
.. _here: https://github.com/rmkoesterer/uga/releases
For the sake of this tutorial, let's assume the release version is 'X'.
tar -xvf uga-X.tar.gz
cd uga-X
At this point you may change the name of the environment to anything you'd prefer by modifying the first line of the environment.yml file. For these instructions, we will assume the name is unchanged from 'uga'.
conda env create -f env/environment.yml
source activate uga
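If you do prefer a different environment name, the first line of the environment file can be edited by hand or rewritten before the environment is created. The sketch below is one way to do that, under two assumptions: the first line of the file has the usual conda form ``name: <env>``, and the new name contains no ``/`` characters. ``rename_conda_env`` is a hypothetical helper, not part of uga.

```shell
# rename_conda_env is a hypothetical helper (not shipped with uga).
# Assumption: line 1 of the file has the form "name: <env>".
# Assumption: the new name contains no "/" (it is used inside a sed pattern).
rename_conda_env() {
    env_file=$1
    new_name=$2
    tmp_file="${env_file}.tmp"
    # Rewrite only line 1; every other line passes through unchanged.
    sed "1s/^name:.*/name: ${new_name}/" "$env_file" > "$tmp_file" &&
        mv "$tmp_file" "$env_file"
}

# Example, run from the unpacked release directory ("uga-dev" is illustrative):
#   rename_conda_env env/environment.yml uga-dev
#   conda env create -f env/environment.yml
#   source activate uga-dev
```

The remaining instructions would then use the new name wherever 'uga' appears as an environment name.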
With the environment active, you can now install the R packages using the included bash script, or see the comments in the script to install specific versions that have been used for development. Please note that compiling these packages can take a fairly long time (~ 20 minutes on my machine).
env/install_r_packages.sh
Now that your environment is activated and the necessary R packages have been installed, you are ready to install uga from an official release.
python setup.py install
Cutting Edge Install
Keeping up with the most current changes may be of interest to you, as I will likely continue to add features and fix bugs on a regular basis. In that case, you may want to run a fork
of this repository rather than installing from an official release. See the tutorial describing how to `fork this repository`_.

.. _fork this repository: https://help.github.com/articles/fork-a-repo/
Getting Started
If you installed uga under a conda environment, you need to activate the environment as shown above before running any task in uga.
source activate uga
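In scripts, it can be useful to check that the environment is active before calling uga. The sketch below assumes that conda exports the name of the active environment in the ``CONDA_DEFAULT_ENV`` variable and that the environment is still named 'uga'. ``env_is_active`` is a hypothetical helper, not part of uga.

```shell
# env_is_active is a hypothetical helper (not shipped with uga).
# Assumption: conda sets CONDA_DEFAULT_ENV to the active environment's name.
env_is_active() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if env_is_active uga; then
    echo "uga environment is active"
else
    echo "uga environment is not active; run: source activate uga"
fi
```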
Verify that uga is functional using the following command to display help.
uga -h
Note: further help is provided after selecting a specific module, e.g.
uga snv -h
Parallel computing
While you may simply run uga on a single-CPU system, if you have access to a parallel computing cluster or even a single multi-core
processor, you will be able to take advantage of the self-managed parallel mode for which this software was designed.
This release was tested on a system that deploys Sun Grid Engine and qsub_ for job management and will likely be compatible
with other PBS-style schedulers.

.. _qsub: http://gridscheduler.sourceforge.net/htmlman/htmlman1/qsub.html
Please cite this software as follows. A manuscript is in the works and yet to be submitted.
Koesterer, Ryan. Universal Genome Analyst (uga). https://github.com/rmkoesterer/uga. DOI: 10.5281/zenodo.578712.
`Ryan Koesterer`_

.. _Ryan Koesterer: https://github.com/rmkoesterer/uga
Please report any bugs or issues using the Issues_ tab on this page. I will respond to all concerns as quickly as possible.

.. _Issues: https://github.com/rmkoesterer/uga/issues
Universal Genome Analyst (uga) is distributed under the GNU General Public License v3:
Copyright (c) 2015 Ryan Koesterer
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/