rvosa / bh15-fossil-paralogy

Repository for the project group at BioHackathon 2015
https://github.com/dbcls/bh15/wiki/Linking-fossils-and-bioinformatics
MIT License

Dealing with duplications in NHX #2

Open: rvosa opened this issue 9 years ago

rvosa commented 9 years ago

To detect rate shifts following duplications, we must be able to distinguish duplication events from speciation events. We try to place the speciation events in time (by fossil calibration); we then try to characterize the duplication events in terms of the substitution rates that follow them. To do any of this we need to parse NHX. We can do this with the Newick parser by assigning a true value to its optional -ignore_comments flag and then parsing the square-bracket statements appended to the node and tip labels.
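As a language-neutral illustration of that second step (not Bio::Phylo's implementation), here is a minimal Python sketch of a Newick reader that keeps the square-bracket NHX block attached to each node and splits it into key/value pairs. It assumes no whitespace, no quoted labels, at most one comment per node placed after the optional branch length, and comments of the form [&&NHX:key=value:...]. The names Node, parse_newick and parse_nhx are hypothetical, and the D=T/D=F values in the example follow the convention noted later in this thread.

```python
import re

# One NHX comment block, e.g. "[&&NHX:D=F:S=human]".
NHX_BLOCK = re.compile(r"\[&&NHX((?::[^:=\]]+=[^:\]]*)*)\]")


class Node:
    """A tree node with its label, optional branch length and NHX pairs."""

    def __init__(self):
        self.children = []
        self.name = ""
        self.length = None
        self.nhx = {}


def parse_nhx(comment):
    """Split '[&&NHX:D=F:S=human]' into {'D': 'F', 'S': 'human'}."""
    m = NHX_BLOCK.fullmatch(comment)
    if m is None:
        raise ValueError(f"not an NHX block: {comment!r}")
    pairs = [p for p in m.group(1).split(":") if p]
    return dict(p.split("=", 1) for p in pairs)


def parse_newick(text):
    """Recursive-descent reader for Newick with trailing NHX comments."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else ""

    def read_until(stops):
        # Consume characters up to (not including) any character in `stops`.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def node():
        nonlocal pos
        n = Node()
        if peek() == "(":  # internal node: read the comma-separated children
            pos += 1
            n.children.append(node())
            while peek() == ",":
                pos += 1
                n.children.append(node())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        n.name = read_until(",():;[")  # label (may be empty)
        if peek() == ":":  # optional branch length
            pos += 1
            n.length = float(read_until(",();["))
        if peek() == "[":  # optional NHX comment appended to the label
            end = text.index("]", pos)
            n.nhx = parse_nhx(text[pos:end + 1])
            pos = end + 1
        return n

    root = node()
    if peek() != ";":
        raise ValueError(f"expected ';' at position {pos}")
    return root


tree = parse_newick(
    "((a:0.1[&&NHX:S=human],b:0.2[&&NHX:S=mouse])x:0.05[&&NHX:D=F],c:0.3)r[&&NHX:D=T];"
)
print(tree.nhx)              # {'D': 'T'}
print(tree.children[0].nhx)  # {'D': 'F'}
```

The recursive-descent shape mirrors the Newick grammar directly (a node is an optional parenthesised child list, then a label, length and comment), which keeps the comment handling in one place.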

rvosa commented 9 years ago

As of commit https://github.com/rvosa/bio-phylo/commit/6636fecd1a0c30ca0c19b7608292309c30788150, Bio::Phylo parses NHX as per TreeFam's implementation of it.

rvosa commented 9 years ago

Mental note: nhx:D is the key/predicate that annotates nodes to specify whether they are duplications (value: T) or speciations (value: F).
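Given the key/value pairs from such a block, a minimal sketch of the classification described in this note could look like the following. It scans a Newick string for [&&NHX:...] blocks with a regular expression rather than building a tree, and the function names nhx_annotations and event_type are hypothetical. Treating Y/N as synonyms of T/F is an assumption: the NHX format as originally described uses D=Y and D=N, so accepting both spellings keeps the sketch usable on either kind of file.

```python
import re

# Any NHX comment block; group 1 holds the ":key=value" pairs.
NHX_BLOCK = re.compile(r"\[&&NHX([^\]]*)\]")


def nhx_annotations(newick):
    """Yield one dict per [&&NHX:...] block, in order of appearance."""
    for m in NHX_BLOCK.finditer(newick):
        pairs = [p for p in m.group(1).split(":") if p]
        yield dict(p.split("=", 1) for p in pairs)


def event_type(annotations):
    """Classify a node from its NHX pairs using the D key."""
    d = annotations.get("D")
    if d in ("T", "Y"):  # T per this thread; Y as in the original NHX spec (assumption)
        return "duplication"
    if d in ("F", "N"):
        return "speciation"
    return None  # no D key (e.g. a tip) or an unrecognized value


newick = "((a[&&NHX:S=human],b[&&NHX:S=mouse])x[&&NHX:D=F],c)r[&&NHX:D=T];"
print([event_type(a) for a in nhx_annotations(newick)])
# [None, None, 'speciation', 'duplication']
```

Speciation nodes would then be the candidates for fossil calibration, and duplication nodes the points after which substitution rates are compared.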