naturalis / supersmart

Self-Updating Platform for the Estimation of Rates of Speciation, Migration And Relationships of Taxa
MIT License

Refactoring superclustering for clade analysis #75

Closed rvosa closed 9 years ago

rvosa commented 9 years ago

Just so that we have a bit of an overview of this refactoring:

The basic issue is that for the clade-level analyses we currently fall back on the non-superclustered alignments (from smrt align). As a result, the combined data set for a given clade can contain the same locus multiple times, as staggered or non-overlapping alignments. (For example, two alignments for the same region of cytochrome B end up "side by side" in the data set for Rhinopithecus/Presbytis.) On the other hand, we can't simply use the list of clustered alignments instead, because those clusters are merged using an average divergence threshold that differs from what might be suitable for clade-level analyses.
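To illustrate why the threshold matters, here is a minimal Python sketch of threshold-based grouping. It is not the SUPERSMART implementation: the alignment names, the divergence values, the two thresholds and the function cluster_by_divergence are all hypothetical, and it uses simple single-linkage grouping on pairwise divergences rather than the average-divergence criterion mentioned above.

```python
def cluster_by_divergence(names, divergence, threshold):
    """Group alignments by single linkage (hypothetical helper).

    Two alignments end up in the same cluster if they are connected by a
    chain of pairs whose divergence is at or below the threshold.
    """
    parent = {name: name for name in names}

    def find(name):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), d in divergence.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Made-up example: two alignments of the same cytochrome B region and one
# unrelated alignment, with invented pairwise divergences.
names = ["cytb_1", "cytb_2", "coi_1"]
divergence = {
    ("cytb_1", "cytb_2"): 0.08,
    ("cytb_1", "coi_1"): 0.35,
    ("cytb_2", "coi_1"): 0.33,
}

strict = cluster_by_divergence(names, divergence, threshold=0.05)
relaxed = cluster_by_divergence(names, divergence, threshold=0.10)

print(strict)
print(relaxed)
```

With these invented numbers the two cytochrome B alignments stay separate under the stricter threshold and are merged under the more relaxed one, so the same set of alignments yields different clusters depending on the threshold chosen.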

To address this, roughly the following steps should be taken:

hettling commented 9 years ago

Merging of clade alignments is now done (commit 7a178b0); testing it on the primate example now.