maclandrol / profileNJ

Gene tree correction using species tree and NJ
GNU General Public License v3.0


profileNJ

Utility package for gene tree correction and reconciliation with a species tree, written in Python and based on the ete3 toolkit.

Installation

Dependencies

profileNJ depends on the following package:

PyQt4 can be tricky to install. I recommend using either your distribution's version or installing it with conda (conda install -c anaconda pyqt=4).

To install profileNJ, download the package from GitHub and install it with:

python setup.py install

or use pip: pip install profileNJ

You may need sudo privileges. Alternatively, you can install a local copy for the current user with the '--user' flag.

Minimum Documentation

profileNJ

profileNJ corrects a gene tree by contracting weak branches and resolving the resulting polytomies into binary trees that have a minimum reconciliation cost with respect to the species tree. profileNJ uses NJ in order to keep as much sequence information as possible, and it can output multiple solutions. If the input tree is considered unrooted, profileNJ can test every possible root and return either the binary tree with the lowest duplication-loss cost or all rooted binary trees. A detailed description of the algorithm can be found in our paper:

    Noutahi E, Semeria M, Lafond M, Seguin J, Boussau B, et al. (2016) Efficient Gene Tree Correction Guided by Genome Evolution. PLoS ONE 11(8): e0159559. doi: 10.1371/journal.pone.0159559
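The contraction step described above can be illustrated with a minimal sketch. This is not profileNJ's own code: the `Node` class, the `support` attribute, and the 0.5 threshold are assumptions made for the example, and the sketch only shows how collapsing weak internal branches turns a binary tree into one with polytomies. The resolution step guided by NJ and the species tree is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; 'support' is the support of the branch above it."""
    name: Optional[str] = None
    support: float = 1.0
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def contract_weak_branches(node: Node, threshold: float) -> Node:
    """Collapse every internal branch whose support is below 'threshold'.

    The children of a collapsed node are attached to its parent, which
    creates a polytomy at the parent.
    """
    new_children = []
    for child in node.children:
        contract_weak_branches(child, threshold)  # post-order traversal
        if not child.is_leaf() and child.support < threshold:
            new_children.extend(child.children)  # splice grandchildren in
        else:
            new_children.append(child)
    node.children = new_children
    return node


def to_newick(node: Node) -> str:
    """Topology-only Newick string (no branch lengths, no trailing ';')."""
    if node.is_leaf():
        return node.name
    return "(" + ",".join(to_newick(c) for c in node.children) + ")"


# ((a,b)0.3,(c,d)0.9): the (a,b) branch is weak, the (c,d) branch is not.
example = Node(children=[
    Node(support=0.3, children=[Node("a"), Node("b")]),
    Node(support=0.9, children=[Node("c"), Node("d")]),
])
contracted = to_newick(contract_weak_branches(example, threshold=0.5))
print(contracted)  # (a,b,(c,d))
```

The root of the contracted tree has three children, so it is a polytomy that a resolution step would then have to refine back into a binary tree.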

File formats

See [polytomy-solver-distance](https://github.com/UdeM-LBIT/polytomy-solver-distance#file-formats).

reconcile

reconcile computes and outputs the reconciled gene tree, and its cost, for a binary gene tree and a binary species tree. Two modes are available: in run mode, it computes the reconciled gene tree, whereas in smap mode, it returns an automatically generated mapping between the genes in the gene tree and the species.
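To make the notion of reconciliation cost concrete, here is a minimal sketch of the classical LCA-mapping reconciliation for binary trees. It is not the reconcile script's implementation: trees are represented as nested tuples, and the function names and the gene-to-species dictionary are assumptions made for the example. Each gene node is mapped to the lowest common ancestor of the species its children map to; a node is a duplication when it maps to the same species node as one of its children, and losses are counted from the depth gaps along the species tree.

```python
def index_species_tree(tree):
    """Return parent and depth dictionaries for a nested-tuple species tree."""
    parent, depth = {tree: None}, {tree: 0}
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            for child in node:
                parent[child] = node
                depth[child] = depth[node] + 1
                stack.append(child)
    return parent, depth


def lca(a, b, parent, depth):
    """Lowest common ancestor of species nodes a and b."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def reconcile(gene_tree, species_tree, gene_to_species):
    """Count duplications and losses under LCA mapping (binary trees only)."""
    parent, depth = index_species_tree(species_tree)
    counts = {"dup": 0, "loss": 0}

    def visit(gene_node):
        if not isinstance(gene_node, tuple):
            return gene_to_species[gene_node]  # leaf: look up its species
        left, right = (visit(child) for child in gene_node)
        mapped = lca(left, right, parent, depth)
        is_dup = mapped in (left, right)
        if is_dup:
            counts["dup"] += 1
        for child_map in (left, right):
            gap = depth[child_map] - depth[mapped]
            # Below a speciation, the first step down the species tree is free.
            counts["loss"] += gap if is_dup else gap - 1
        return mapped

    visit(gene_tree)
    return counts["dup"], counts["loss"]


species_tree = (("A", "B"), "C")
gene_tree = (("a1", "b1"), ("a2", "c1"))
smap = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
dups, losses = reconcile(gene_tree, species_tree, smap)
print(dups, losses)  # 1 2
```

In this example the gene tree root is a duplication, one copy is lost in C and the other in B, so the cost is 3 with unit duplication and loss costs.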

optional arguments:

polytomySolver

polytomySolver is a new algorithm for resolving gene trees with polytomies in linear time. polytomySolver supports both unit and weighted duplication and loss costs. It is an improved version of the quadratic algorithm described by Lafond et al. in 2012 (M. Lafond, K.M. Swenson, and N. El-Mabrouk. An optimal reconciliation algorithm for gene trees with polytomies. In LNCS, volume 7534 of WABI, pages 106-122, 2012.), using the compressed species tree idea of Zheng and Zhang (Y. Zheng and L. Zhang. Reconciliation with non-binary gene trees revisited. In Lecture Notes in Computer Science, volume 8394, pages 418-432, 2014. Proceedings of RECOMB.). polytomySolver is faster than Notung and can therefore be used on large trees.

Reusable modules

TreeClass

Based on the TreeNode class of the ete package, TreeClass is a tree representation class. A tree consists of a collection of TreeClass instances connected in a hierarchical way. A TreeClass object can be loaded from the New Hampshire (Newick) format. TreeClass adds specific functions for tree processing that are not present in ete's TreeNode.

Run pydoc for minimal documentation.

TreeUtils

TreeUtils offers several static functions related to phylogenetic trees. With it, you can fetch Ensembl gene trees and reconcile a gene tree with its species tree.

ClusterUtils

ClusterUtils is an implementation of UPGMA (Unweighted Pair Group Method with Arithmetic Mean) and NJ (Neighbor-Joining), two distance-based clustering methods for tree construction.
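As an illustration of the NJ method named above, here is a minimal, topology-only sketch of the textbook Neighbor-Joining procedure. It does not reproduce the ClusterUtils API: the function name, the `frozenset`-keyed distance dictionary, and the nested-tuple output are assumptions made for the example, and branch lengths are not computed.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return an unrooted NJ topology as a tuple of three subtrees.

    names: list of unique taxon names.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    nodes = list(names)
    d = dict(dist)  # work on a copy; entries for new nodes are added below
    while len(nodes) > 3:
        n = len(nodes)
        # Sum of distances from each node to all the others.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i)
             for i in nodes}
        # Join the pair that minimizes the Q criterion.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        new = (i, j)
        # Distance from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = 0.5 * (
                    d[frozenset((i, k))]
                    + d[frozenset((j, k))]
                    - d[frozenset((i, j))]
                )
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    return tuple(nodes)


# Additive distances generated from the unrooted tree ((A,B),(C,D)).
pairs = {
    ("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 6,
    ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 3,
}
distances = {frozenset(p): v for p, v in pairs.items()}
nj_tree = neighbor_joining(["A", "C", "B", "D"], distances)
print(nj_tree)  # ('C', 'D', ('A', 'B'))
```

With four taxa a single join is enough: grouping A with B leaves a trifurcation, which is the usual unrooted result of NJ.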

NCBI_tree_of_life

See SPECIES_TREE.

Script to reconstruct the tree of life using the NCBI taxonomy. The current Newick file (tree.nw) was obtained with the latest NCBI taxonomy release.

How to cite.

If you use profileNJ or polytomySolver, please cite the following papers: