hettling opened this issue 9 years ago
The cause lies in the parameters `CLADE_MIN_DENSITY` and `CLADE_MIN_COVERAGE`:
For the species Trachypithecus auratus there are many alignments, but only one of them made the cut to be included in the clade during decomposition. Since `CLADE_MIN_COVERAGE` was 2, the species did not end up in the markers table of that clade.
We should emit a warning about this at the right moment, perhaps during `bbdecompose`: keep track of the alignment count for each species within the clade while iterating over the alignments (`alns_for_taxa`), and warn if a species occurs in fewer than `CLADE_MIN_COVERAGE` alignments.
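A minimal sketch of the proposed check, assuming the clade's alignments are available as a dict from alignment ID to the set of taxa it contains. The function `warn_low_coverage` and its parameter names are hypothetical; the real `alns_for_taxa` structure in `bbdecompose` may be shaped differently.

```python
import warnings
from collections import Counter


def warn_low_coverage(clade_alignments, clade_taxa, min_coverage):
    """Hypothetical helper: count clade alignments per taxon and warn
    for every taxon that occurs in fewer than min_coverage of them.

    clade_alignments: dict mapping alignment ID -> set of taxon names (assumed layout)
    clade_taxa: iterable of taxon names belonging to the clade
    min_coverage: the CLADE_MIN_COVERAGE threshold
    Returns a dict of under-covered taxa -> their alignment count.
    """
    counts = Counter()
    for taxa in clade_alignments.values():
        counts.update(taxa)  # each alignment counts once per taxon

    low = {}
    for taxon in clade_taxa:
        n = counts[taxon]  # Counter yields 0 for taxa seen in no alignment
        if n < min_coverage:
            warnings.warn(
                f"{taxon} is in {n} clade alignment(s), "
                f"fewer than CLADE_MIN_COVERAGE={min_coverage}",
                stacklevel=2,
            )
            low[taxon] = n
    return low


# Usage with placeholder taxa: sp1 is in one alignment, sp2 in two.
alns = {"aln1": {"sp1", "sp2"}, "aln2": {"sp2"}}
print(warn_low_coverage(alns, ["sp1", "sp2"], 2))  # prints {'sp1': 1}
```

Counting in a single pass over the alignments keeps the check cheap enough to run while the alignments are being iterated anyway.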
Commit d3756aab791fa6f2c132a162276a198e23a78cb8 partly addresses this: we now warn if a taxon occurs in fewer clade alignments than `CLADE_MIN_COVERAGE`. However, a species can still be in enough alignments, which are then merged together in `smrt-clademerge`, so that the species no longer makes the cut. This is the case for the exemplar Trachypithecus auratus.
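To illustrate why the earlier warning cannot catch this case, here is a sketch that recounts coverage per merged marker instead of per input alignment. How `smrt-clademerge` actually decides which alignments to merge is not described here, so the grouping is simply passed in as a hypothetical `merge_groups` dict; the function name is likewise hypothetical.

```python
from collections import Counter


def coverage_after_merge(clade_alignments, merge_groups):
    """Count, for each taxon, how many merged markers it appears in.

    clade_alignments: dict alignment ID -> set of taxa (assumed layout)
    merge_groups: dict merged-marker ID -> list of alignment IDs merged
                  into that marker (hypothetical; stands in for whatever
                  grouping smrt-clademerge produces)
    """
    counts = Counter()
    for aln_ids in merge_groups.values():
        taxa = set()
        for aln_id in aln_ids:
            taxa |= clade_alignments[aln_id]
        counts.update(taxa)  # a taxon counts once per merged marker
    return counts


# sp1 is in two alignments (passes a threshold of 2 before merging),
# but both alignments are merged into one marker, so it drops to 1.
alns = {"aln1": {"sp1", "sp2"}, "aln2": {"sp1"}, "aln3": {"sp2"}}
merged = {"m1": ["aln1", "aln2"], "m2": ["aln3"]}
after = coverage_after_merge(alns, merged)
print(after["sp1"], after["sp2"])  # prints 1 2
```

Under these assumptions, a coverage warning would presumably have to run on the merged counts, after `smrt-clademerge`, to flag species like this one.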
In the primates example, two clades, Trachypithecus and Lepilemur, are discarded when grafting onto the backbone, because their clade trees contain only one exemplar. The exemplars are most likely excluded during `clademerge`, where we build a graph connecting species by their shared markers and keep the largest connected subset of species.
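The largest-connected-subset step can be sketched as a breadth-first search over a species graph in which two species are linked if they share at least one marker. This is an illustration of the technique, not the `clademerge` implementation; the input layout (marker ID -> set of species) and the function name are assumptions.

```python
from collections import defaultdict, deque


def largest_connected_species(markers):
    """Return the largest set of species connected through shared markers.

    markers: dict marker ID -> set of species having that marker (assumed layout).
    Ties between equally large components are broken arbitrarily.
    """
    # Build adjacency: species sharing a marker are neighbours.
    # update() also registers species that share no marker with anyone.
    adj = defaultdict(set)
    for species in markers.values():
        for s in species:
            adj[s].update(species - {s})

    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # BFS to collect the component containing `start`.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    component.add(nb)
                    queue.append(nb)
        if len(component) > len(best):
            best = component
    return best


# Placeholder data: sp1 shares no marker with the others and is dropped.
markers = {"m1": {"sp1"}, "m2": {"sp2", "sp3"}, "m3": {"sp3", "sp4"}}
print(sorted(largest_connected_species(markers)))  # prints ['sp2', 'sp3', 'sp4']
```

Any exemplar whose markers do not overlap with the main component is silently dropped by this step, which matches the symptom of a clade tree left with a single exemplar.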