rvosa / bio-phylo

Bio::Phylo - Phyloinformatic analysis using Perl
http://search.cpan.org/dist/Bio-Phylo
GNU General Public License v3.0

Newick tree simplification #1

Closed by fangly 12 years ago

fangly commented 12 years ago

Hi Rutger,

As I hinted previously, here is some code I wrote to simplify a Newick tree before parsing it. Given a list of terminal node IDs to keep, it processes all cherries (pairs of sibling terminal nodes) and recursively removes the terminal nodes that are not needed.
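For illustration, the idea can be sketched roughly as follows. This is a minimal Python sketch, not the Perl code from this issue, and it rests on several assumptions: labels are unquoted and contain no parentheses, commas, colons, or semicolons; the tree has no bracketed comments; a cherry with one wanted leaf is replaced by that leaf; a cherry with no wanted leaf is collapsed to a single placeholder leaf so it can form a new cherry with its sibling; and the removed parent's branch length is added to the surviving leaf. The names `simplify_newick`, `CHERRY`, `_split`, and `_merge` are hypothetical.

```python
import re

# Matches a cherry: two leaves inside one pair of parentheses, plus the
# parent's optional label and branch length, e.g. "(A:1,B:2)n1:0.5".
# Assumption: unquoted labels without parentheses, commas or semicolons.
CHERRY = re.compile(r"\(([^(),;]+),([^(),;]+)\)([^(),;]*)")


def _split(node):
    """Split 'name:length' into (name, length as float or None)."""
    name, _, length = node.partition(":")
    return name.strip(), float(length) if length.strip() else None


def _merge(leaf, parent):
    """Return the leaf text with the removed parent's branch length added."""
    name, leaf_len = _split(leaf)
    _, parent_len = _split(parent)
    if leaf_len is None and parent_len is None:
        return name
    total = (leaf_len or 0.0) + (parent_len or 0.0)
    return f"{name}:{total:.10g}"


def simplify_newick(newick, keep):
    """Collapse cherries in a Newick string until nothing changes.

    Only cherries are examined, so an unwanted leaf whose sibling is an
    internal node survives: the result is smaller, not minimal.
    """
    keep = set(keep)

    def collapse(match):
        left, right, parent = match.groups()
        left_kept = _split(left)[0] in keep
        right_kept = _split(right)[0] in keep
        if left_kept and right_kept:
            return match.group(0)  # both wanted: leave the cherry alone
        # One wanted leaf survives; if neither is wanted, the left leaf
        # stays as a placeholder that may join a new cherry later.
        survivor = right if right_kept else left
        return _merge(survivor, parent)

    while True:
        reduced = CHERRY.sub(collapse, newick)
        if reduced == newick:
            return reduced
        newick = reduced


if __name__ == "__main__":
    tree = "((A:1,B:1):1,((C:1,D:1):1,E:1):1);"
    print(simplify_newick(tree, ["A", "C"]))
```

Because only cherries are touched, a tree such as `((A,B),X);` with `A` and `B` kept is returned unchanged: `X` is not part of a cherry, so it stays. That trade-off is what keeps each pass a single linear scan over the string.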

I tried some more complex, more thorough code before, but it was quite slow. Processing only the cherries, however, performs very well. For example, given 200 input node IDs, the large Greengenes tree that never finished parsing was simplified to ~900 nodes, and the entire process took less than 1.5 minutes.

Cheers,

Florent