harmslab / phylogenetics

A Python API for managing phylogenetics projects
http://phylogenetics.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Project object, dataio module, and ancestral reconstruction #6

Closed Zsailer closed 8 years ago

Zsailer commented 8 years ago

This is a big PR.

First, I've added a Project object. This is the main entry point to the phylogenetics package. It is a container object that manages each piece of a phylogenetics project. It reads many formats and downloads sequences from BLAST. It has methods for running each step of a reconstruction, such as aligning, building trees, and doing ancestral sequence reconstruction (ASR). This is the bread and butter.
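To make the container idea concrete, here is a minimal sketch of the pattern, not the actual Project class from this PR. The names `ProjectSketch`, `add_sequences`, `align`, and `pad_aligner` are hypothetical, and the "aligner" is a placeholder that only pads sequences with gaps, where a real project would call an external alignment tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ProjectSketch:
    """Hypothetical container; NOT the package's actual Project class."""
    sequences: Dict[str, str] = field(default_factory=dict)
    alignment: Optional[Dict[str, str]] = None
    history: List[str] = field(default_factory=list)

    def add_sequences(self, seqs: Dict[str, str]) -> None:
        # Store new sequences on the project and record the step.
        self.sequences.update(seqs)
        self.history.append("add_sequences")

    def align(self, aligner: Callable[[Dict[str, str]], Dict[str, str]]) -> Dict[str, str]:
        # Each step reads state from the project and stores its result back on it.
        if not self.sequences:
            raise ValueError("add sequences before aligning")
        self.alignment = aligner(self.sequences)
        self.history.append("align")
        return self.alignment


def pad_aligner(seqs: Dict[str, str]) -> Dict[str, str]:
    """Placeholder 'aligner' (assumption): right-pads sequences with gaps."""
    width = max(len(s) for s in seqs.values())
    return {name: s.ljust(width, "-") for name, s in seqs.items()}


project = ProjectSketch()
project.add_sequences({"a": "MKV", "b": "MK"})
aligned = project.align(pad_aligner)
```

The point of the pattern is that every stage keeps its inputs and outputs on one object, so later stages (trees, ASR) can pick up where earlier ones left off.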

Second, I've added a dataio module. Inside this module are read and write functions for various formats found in phylogenetics, including fasta, phylip, newick, json, pickle, csv, and others. Also inside this module, I've written objects that bind these read and write methods to the many data structures found in this package, like HomologSet and Tree. The API is the same for each object and format.
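One way to picture "same API for each object and format" is a table of per-format functions that an object dispatches to. The sketch below is an illustration under assumptions, not the dataio module itself: `SequenceSetSketch`, `READERS`, `WRITERS`, and the `schema` argument are hypothetical names, and only simplified fasta and json handling is shown.

```python
import json
from typing import Dict


def read_fasta(text: str) -> Dict[str, str]:
    """Parse simplified FASTA text into {name: sequence}."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line  # sequences may span several lines
    return records


def write_fasta(records: Dict[str, str]) -> str:
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


def read_json(text: str) -> Dict[str, str]:
    return json.loads(text)


def write_json(records: Dict[str, str]) -> str:
    return json.dumps(records, sort_keys=True)


# Hypothetical registries mapping a format name to its function.
READERS = {"fasta": read_fasta, "json": read_json}
WRITERS = {"fasta": write_fasta, "json": write_json}


class SequenceSetSketch:
    """Hypothetical stand-in for a data structure such as HomologSet."""

    def __init__(self, records=None):
        self.records = dict(records or {})

    def read(self, text: str, schema: str) -> "SequenceSetSketch":
        self.records = READERS[schema](text)
        return self

    def write(self, schema: str) -> str:
        return WRITERS[schema](self.records)


seqs = SequenceSetSketch().read(">a\nMKV\n>b\nMK\n", schema="fasta")
as_json = seqs.write(schema="json")
```

Supporting a new format in this layout only means adding one reader and one writer to the registries; the object's `read` and `write` calls stay the same.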

Third, I've rewritten a module for doing ancestral sequence reconstruction (ASR). This is inspired by Lazarus, an older Python package for doing ASR. The module uses PAML to do all its calculations and binds the results to a nice DendroPy tree object. As a beautiful result, phylogenetics now has a simple Python Tree object that can easily be traversed and analyzed at the tips and internal ancestral nodes.
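The sketch below shows what traversing tips and internal ancestral nodes looks like once sequences are attached to a tree. It is a standard-library stand-in, not the package's Tree or a DendroPy object, and it does not call PAML. The `Node` class, the helper functions, the labels, and the sequences are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical tree node carrying an observed or reconstructed sequence."""
    label: str
    sequence: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

    def is_tip(self) -> bool:
        return not self.children

    def preorder(self) -> Iterator["Node"]:
        # Visit this node first, then each subtree from left to right.
        yield self
        for child in self.children:
            yield from child.preorder()


def tips(root: Node) -> List[Node]:
    """Extant sequences sit at the tips."""
    return [n for n in root.preorder() if n.is_tip()]


def ancestors(root: Node) -> List[Node]:
    """Reconstructed sequences sit at the internal nodes."""
    return [n for n in root.preorder() if not n.is_tip()]


# Toy tree with invented labels and sequences.
tree = Node("anc0", "MKV", [
    Node("anc1", "MKI", [Node("human", "MKI"), Node("mouse", "MRI")]),
    Node("yeast", "MQV"),
])

tip_labels = [n.label for n in tips(tree)]
ancestral = {n.label: n.sequence for n in ancestors(tree)}
```

With a real DendroPy tree the same split between tips and internal nodes is available through its own node iterators; the sketch only illustrates the shape of the analysis.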

Overall, this is the major PR that makes this package useful. Many bugs exist, I'm sure, but this should really simplify a lot of phylogenetics work.

Zsailer commented 8 years ago

Merging this PR. Ready to go.