Just so that we have a bit of an overview of this refactoring:
The basic issue is that the clade-level analyses currently fall back on the non-superclustered alignments (from smrt align). As a result, the combined data set for a given clade can contain the same locus multiple times, in a staggered/non-overlapping manner. (For example, two alignments for the same region of cytochrome B end up "side by side" in the data set for Rhinopithecus/Presbytis.) On the other hand, we can't simply use the list of clustered alignments instead, because those clusters are merged using an average divergence threshold that differs from what might be suitable for clade-level analyses.
To address this, roughly the following steps should be taken:
the clustering logic from smrt orthologize needs to be moved out of the App::Cmd class into a method of a service class
this new method needs to be parameterized so that different values for the divergence threshold can be passed in (rather than being obtained from the $config object)
this new method also needs to accept a "file stem" name as a parameter, so that the clusters for the backbone (now: cladeXXX.fa) and those for the clades are named differently.
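To make the intended shape of the refactoring concrete, here is a minimal sketch of such a service class. It is written in Python purely for illustration and is not the actual Perl code. Everything in it is an assumption: the names ClusterService, cluster and name_clusters are hypothetical, the single-linkage grouping is a stand-in for whatever smrt orthologize really does, and the stem-plus-counter file naming is only a guess at the cladeXXX.fa pattern. The point is simply that the threshold and the file stem arrive as arguments instead of being read from a config object.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ClusterService:
    """Hypothetical service class holding the clustering logic."""

    # Returns the average divergence between two alignments. This is a
    # placeholder for whatever measure the real implementation uses.
    divergence: Callable[[str, str], float]

    def cluster(self, alignments: List[str], max_divergence: float) -> List[List[str]]:
        """Group alignments by single linkage (an assumed stand-in algorithm).

        The threshold is passed in by the caller, not read from a config.
        """
        clusters: List[List[str]] = []
        for aln in alignments:
            merged = [aln]
            remaining: List[List[str]] = []
            for existing in clusters:
                # Absorb every cluster that has a member within the threshold.
                if any(self.divergence(aln, m) <= max_divergence for m in existing):
                    merged.extend(existing)
                else:
                    remaining.append(existing)
            remaining.append(merged)
            clusters = remaining
        return clusters

    def name_clusters(self, clusters: List[List[str]], file_stem: str) -> Dict[str, List[str]]:
        """Map an output file name to each cluster, using the given stem.

        The "<stem><counter>.fa" format is assumed for illustration.
        """
        return {
            f"{file_stem}{i}.fa": sorted(members)
            for i, members in enumerate(clusters, start=1)
        }


if __name__ == "__main__":
    # Toy data: each "alignment" is a label with a made-up position, and
    # divergence is the distance between positions.
    positions = {"a": 0.0, "b": 0.05, "c": 0.5}
    service = ClusterService(lambda x, y: abs(positions[x] - positions[y]))

    # Same service, two thresholds and two stems (both stems hypothetical).
    backbone = service.name_clusters(service.cluster(list(positions), 1.0), "backbone")
    clade = service.name_clusters(service.cluster(list(positions), 0.1), "clade")
    print(backbone)
    print(clade)
```

With this split, the backbone and clade-level steps could call the same method with their own threshold and stem, so the two sets of cluster files would not collide.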