traitecoevo / phyndr

Match tip and trait data

trivial cases with big trees run very slowly #22

Open wcornwell opened 9 years ago

wcornwell commented 9 years ago
```r
library(ape)          # read.tree()
library(taxonlookup)  # lookup_table()
library(phyndr)       # phyndr_taxonomy()
sp <- c("Pinus_nigra", "Quercus_agrifolia", "Poa_annua", "Vanilla_planifolia", "Aa_fiebrigii")
tre <- read.tree("PhylogeneticResources/Vascular_Plants_rooted.dated.tre")
angio_spp <- unique(c(tre$tip.label, sp))
angio_tax <- lookup_table(angio_spp, by_species = TRUE)
angio_tax <- angio_tax[, c("genus", "family", "order")]
angio_phyndr <- phyndr_taxonomy(tre, sp, angio_tax)  # this call is the slow step
```

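This is a trivial case (only five data species), but it runs very slowly against the full tree. It seems like there might be a way to limit the scope of the tree traversal to speed things up?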

wcornwell commented 9 years ago

If there are species in the tree that are in orders that have no data, can't we just drop all those tips at the beginning?
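A minimal sketch of that pre-filter in base R. `tips_in_data_orders()` is a hypothetical helper, not part of phyndr, and it assumes `tax` has species names as row names and an `order` column, as the `lookup_table()` subset above is assumed to. It only decides which tips to keep; whether dropping the rest is always safe is the open question in this thread.

```r
# Hypothetical helper (not part of phyndr): return the tree tips whose order
# contains at least one species from the data. Assumes `tax` has species
# names as row names and an "order" column.
tips_in_data_orders <- function(tips, data_spp, tax) {
  # Orders represented in the data (species missing from `tax` give NA)
  data_orders <- unique(as.character(tax[data_spp, "order"]))
  data_orders <- data_orders[!is.na(data_orders)]
  # Keep a tip only if its order is one of those; unknown tips give NA -> FALSE
  tips[as.character(tax[tips, "order"]) %in% data_orders]
}

# Toy taxonomy for illustration only
tax <- data.frame(
  order = c("Pinales", "Pinales", "Poales", "Asparagales"),
  row.names = c("Pinus_nigra", "Pinus_sylvestris", "Poa_annua", "Vanilla_planifolia"),
  stringsAsFactors = FALSE
)
kept <- tips_in_data_orders(
  tips = c("Pinus_sylvestris", "Poa_annua", "Vanilla_planifolia"),
  data_spp = c("Pinus_nigra", "Poa_annua"),
  tax = tax
)
kept
# returns c("Pinus_sylvestris", "Poa_annua")
```

The remaining tips could then be removed with `ape::drop.tip(tre, setdiff(tre$tip.label, kept))` before calling `phyndr_taxonomy()`.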

mwpennell commented 9 years ago

this is an interesting problem. while i certainly see your point, and your suggestion of dropping tips early would work well here, i am not sure there is a general heuristic that would let us make this call without running the algorithm. (indeed, if there were, it would suggest that we reconsider the phyndr algorithms from scratch...)

could be wrong tho

wcornwell commented 9 years ago

What if we find the MRCA of the orders represented in the data (or whatever the highest level in the taxonomy is) and then drop the unmatched tips that are not in that subtree?
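A rough sketch of the MRCA-based pruning using ape's `getMRCA()`, `extract.clade()` and `drop.tip()`. The helper `prune_to_anchor_mrca()` is hypothetical, and `anchor_tips` is assumed to be the set of tree tips already known to share an order (or the highest available taxonomic level) with the data; how those are obtained is left open. As noted in the thread, this idea only applies to the taxonomy case.

```r
library(ape)

# Hypothetical helper (not part of phyndr): drop every tip outside the
# subtree rooted at the MRCA of `anchor_tips`, so that a later
# phyndr_taxonomy() call has less tree to traverse.
prune_to_anchor_mrca <- function(phy, anchor_tips) {
  anchor_tips <- intersect(anchor_tips, phy$tip.label)
  # getMRCA() needs at least two tips; with fewer, leave the tree alone
  if (length(anchor_tips) < 2) return(phy)
  node <- getMRCA(phy, anchor_tips)
  keep <- extract.clade(phy, node)$tip.label
  drop <- setdiff(phy$tip.label, keep)
  # If the MRCA is the root there is nothing to prune
  if (length(drop) == 0) return(phy)
  drop.tip(phy, drop)
}

# Toy tree for illustration only
phy <- read.tree(text = "((a,b),(c,(d,e)));")
pruned <- prune_to_anchor_mrca(phy, c("c", "e"))
sort(pruned$tip.label)
# returns c("c", "d", "e")
```

Falling back to the unchanged tree when fewer than two anchor tips remain keeps the sketch conservative: it never drops tips it cannot justify dropping.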

mwpennell commented 9 years ago

this is a pretty good idea. we should do this. (though of course it only works for the taxonomy case; in the topology case, the deepest possible swap is a node up from the root)

wcornwell commented 9 years ago

ping @richfitz ?

richfitz commented 9 years ago

I'll look at this next week