univieCUBE / deepnog

Protein orthologous group assignment with deep learning
BSD 3-Clause "New" or "Revised" License

DeepNOG: protein orthologous groups assignment

Assign proteins to orthologous groups (eggNOG 5) on CPUs or GPUs with deep networks. DeepNOG is much faster than alignment-based methods while providing accuracy similar to HMMER.

Installation guide

The easiest way to install DeepNOG is to obtain it from PyPI:

pip install deepnog

Alternatively, you can clone or download the bleeding-edge version from GitHub and run:

pip install /path/to/DeepNOG

If you plan to extend DeepNOG as a developer, run

pip install -e /path/to/DeepNOG

instead.

deepnog can also be installed from bioconda like this:

conda install -c bioconda deepnog
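
After any of the installation routes above, one way to confirm that the package is visible to the current Python environment is to query its installed version. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of deepnog.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    version = installed_version("deepnog")
    if version is None:
        print("deepnog is not installed in this environment")
    else:
        print(f"deepnog {version} is installed")
```

This only checks the package metadata; it does not verify that models can be downloaded or that a GPU is available.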

Usage

Call the deepnog command line tool with a protein sequence file in FASTA format.

The individual models for OG predictions are not stored on GitHub or PyPI, because they exceed the file size limits there (a model can be up to 200 MB). deepnog downloads the models automatically and puts them into a cache directory (default: ~/deepnog_data/). You can change this directory by setting the DEEPNOG_DATA environment variable.
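
The lookup order described above (environment variable first, documented default otherwise) can be sketched as follows. This is an illustration of the documented behavior, not deepnog's internal code, and the function name `resolve_data_dir` is hypothetical.

```python
import os
from pathlib import Path


def resolve_data_dir(env=None):
    """Return the model cache directory, honoring DEEPNOG_DATA if it is set."""
    env = os.environ if env is None else env
    # Fall back to the documented default when the variable is unset or empty.
    raw = env.get("DEEPNOG_DATA") or "~/deepnog_data"
    return Path(raw).expanduser()


if __name__ == "__main__":
    print(resolve_data_dir())
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to check without touching the real environment.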

For help and advanced options, call deepnog --help, and deepnog infer --help or deepnog train --help for specific options for inference or training, respectively. See also the user & developer guide.
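
For scripted use, the command line tool can be driven from Python with `subprocess`. The sketch below assumes that the FASTA file is passed as a positional argument to `deepnog infer`; that argument layout is an assumption here, so confirm it with `deepnog infer --help` before relying on it. The helper names `build_infer_command` and `run_infer` are hypothetical.

```python
import shutil
import subprocess


def build_infer_command(fasta_path, executable="deepnog"):
    """Build the argument list for an inference call (assumed layout)."""
    # Assumption: the input file is a positional argument of `deepnog infer`.
    return [executable, "infer", str(fasta_path)]


def run_infer(fasta_path):
    """Run inference on a FASTA file and return the completed process."""
    if shutil.which("deepnog") is None:
        raise RuntimeError("deepnog was not found on PATH; install it first")
    # An argument list (no shell) avoids quoting problems with file names.
    return subprocess.run(
        build_infer_command(fasta_path),
        check=True,
        capture_output=True,
        text=True,
    )
```

Further options for inference and training are listed by the `--help` calls mentioned above and are deliberately left out of this sketch.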

File formats supported

Preferred: FASTA (raw, .gz, or .xz)

DeepNOG supports protein sequences stored in all file formats listed in https://biopython.org/wiki/SeqIO, but is tested for the FASTA-file format only.
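
To illustrate what reading raw, .gz, or .xz FASTA input involves, here is a minimal standard-library sketch that picks a decompressor from the file suffix and yields identifier/sequence pairs. It is not DeepNOG's own reader, which builds on Biopython's SeqIO as noted above; the function names `open_text` and `read_fasta` are hypothetical.

```python
import gzip
import lzma
from pathlib import Path


def open_text(path):
    """Open a raw, .gz, or .xz file in text mode, chosen by file suffix."""
    suffix = Path(path).suffix.lower()
    if suffix == ".gz":
        return gzip.open(path, "rt")
    if suffix == ".xz":
        return lzma.open(path, "rt")
    return open(path, "rt")


def read_fasta(path):
    """Yield (identifier, sequence) pairs from a FASTA file."""
    identifier, chunks = None, []
    with open_text(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if identifier is not None:
                    yield identifier, "".join(chunks)
                fields = line[1:].split()
                # The identifier is the first whitespace-delimited header token.
                identifier = fields[0] if fields else ""
                chunks = []
            elif identifier is not None:
                chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)
```

Sequence lines that are wrapped over several lines are joined into one string per record.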

Databases currently supported

Deep network architectures currently supported

Required packages

deepnog builds upon the following packages:

See also requirements/*.txt for platform-specific recommendations. Specific versions are sometimes required because of platform-specific bugs in deepnog's dependencies.

Acknowledgements

This research is supported by the Austrian Science Fund (FWF): P27703, P31988; and by the GPU grant program of Nvidia corporation.

Citation

If you use DeepNOG, please consider citing our research article:

Roman Feldbauer, Lukas Gosch, Lukas Lüftinger, Patrick Hyden, Arthur Flexer, Thomas Rattei, DeepNOG: Fast and accurate protein orthologous group assignment, Bioinformatics, 2020, btaa1051, https://doi.org/10.1093/bioinformatics/btaa1051