Closed eduardszoecs closed 10 years ago
Right, good idea. I gave it a shot on the taxon_tree branch [here](https://github.com/ropensci/taxize/blob/taxon_tree/R/taxontree.R)
But it only works for three species, and I haven't tested it further; it's just a quick first stab. It's a much harder problem than I thought it would be. You can't just assume that similarity in taxon names means similarity in relationships among species, the way you could with machine learning for text classification.
You could probably write something better!
@EDiLD Any progress on this? Do you think we'll do this eventually? Or should we close this issue?
If you can build the classification into tabular form, then you can build trees. By tabular form I mean something where rows are the leaf taxa and columns give their memberships at various levels of classification. We have this in vegan, where the function `taxa2dist` casts that kind of classification into distances, and these can then be turned into a tree. The `taxa2dist` function is based on Clarke & Warwick, Marine Ecol Prog Ser 184, 21-29 (1999).
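To make the "tabular form, then distances, then tree" idea concrete, here is a minimal base-R sketch. The table is typed in by hand, and `rank_mismatch_dist` is a hypothetical helper that just counts the ranks at which two taxa differ. It is a deliberately crude stand-in, not what `taxa2dist` computes.

```r
# Hand-made classification table: rows are leaf taxa, columns are ranks.
tab <- data.frame(
  genus  = c("Helianthus", "Madia", "Lupinus", "Poa"),
  family = c("Asteraceae", "Asteraceae", "Fabaceae", "Poaceae"),
  order  = c("Asterales", "Asterales", "Fabales", "Poales"),
  row.names = c("Helianthus annuus", "Madia elegans",
                "Lupinus albicaulis", "Poa annua"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of vegan or taxize): the distance between
# two taxa is the number of ranks at which their classifications differ.
rank_mismatch_dist <- function(tab) {
  m <- as.matrix(tab)
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(m[i, ] != m[j, ])
    }
  }
  as.dist(d)
}

d  <- rank_mismatch_dist(tab)
hc <- hclust(d)  # hierarchical clustering turns the distances into a tree
```

Calling `plot(hc, hang = -1)` afterwards should draw the dendrogram, with the two Asteraceae joining first because they differ only at genus level.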
@jarioksa Awesome, thanks for pointing out `taxa2dist`. I think this example works:
library(taxize)
library(vegan)
library(plyr)  # provides ldply()
spnames <- c('Klattia flava', 'Trollius sibiricus', 'Arachis paraguariensis',
'Tanacetum boreale', 'Gentiana yakushimensis','Sesamum schinzianum',
'Pilea verrucosa','Tibouchina striphnocalyx','Lycium dasystemum',
'Centrosema latidens','Schoenus centralis','Berkheya echinacea',
'Androcymbium villosum','Helianthus annuus','Madia elegans',
'Lupinus albicaulis','Poa annua')
# Fetch the full taxonomic hierarchy of each species from NCBI
out <- classification(spnames, db='ncbi')
# Keep only the selected ranks and return them as a one-row table
foo <- function(x){
keep <- x$rank %in% c('kingdom','order','family','genus')
df <- x[keep, 'name']
names(df) <- x[keep, 'rank']
t(data.frame(df))
}
# Stack the per-species rows into one classification table
output <- ldply(out, foo)
# Taxonomic distances, then hierarchical clustering into a dendrogram
taxdis <- taxa2dist(output, varstep=TRUE)
plot(hclust(taxdis), hang = -1)
Oh man... I've been using `vegan` for some years now, but never stumbled across this function.
Thanks Jari for the pointer!
Thanks Scott for the quick example! What should we do? I mean classification() in combination with taxa2dist is quite useful - but should it go into taxize? I am not sure...
The initial idea was based on this paper:
Guénard G, Ohe PC von der, de Zwart D, Legendre P, Lek S. Using phylogenetic information to predict species tolerances to toxic chemicals. Ecological Applications. 2011;21(8):3178–90.
I still have the plot method for classification() on my desk (which resembles the lower part of Figure 4 therein), but I don't know when I'll finish it. Sorry, too much other stuff to do at the moment :(
@EDiLD Do you prefer a plot that looks like Fig. 4 from that paper? Or would you rather have a traditional dendrogram object, or even an `ape::phylo` object?
I think a function like this could be useful in taxize
Sorry, these were independent issues.
I thought the lower part of Fig. 4 would be returned by `plot.classification()`.
The stuff here in this issue shouldn't be `plot.classification()`, but rather something like `tree.classification`? Or `dist.classification()`, and then add a plot method?
ape has `as.phylo.hclust` for casting a dendrogram to a `phylo` object. It is possible to build this more directly, too, but of course it needs a bit of work (and some of that is pretty boring work).
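For reference, a small sketch of that cast. The distance values are made up purely for illustration, and the conversion only runs if ape is installed. `as.phylo()` is ape's generic, which dispatches to `as.phylo.hclust` for `hclust` objects.

```r
# Toy distance matrix among four tips (values invented for illustration).
tips <- c("A", "B", "C", "D")
m <- matrix(c(0, 1, 3, 3,
              1, 0, 3, 3,
              3, 3, 0, 2,
              3, 3, 2, 0),
            nrow = 4, dimnames = list(tips, tips))
hc <- hclust(as.dist(m))

# Convert the hclust dendrogram to an ape "phylo" object, if ape is available.
tr <- NULL
if (requireNamespace("ape", quietly = TRUE)) {
  tr <- ape::as.phylo(hc)
}
```

With a `phylo` object in hand, `plot(tr)` would go through ape's `plot.phylo` rather than the base dendrogram plot.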
@jarioksa Thanks for pointing out `as.phylo.hclust`. I'll look into using that to cast to a `phylo` object.
@EDiLD Okay, makes sense: have a `tree.classification` method, then we could go to a `plot.phylo`:
out <- classification(specieslist)
tr <- tree(out) # gives ape phylo object
plot(tr) # plots tree
Sounds like a good idea!
Started fxn in these two commits f7177ff7eae1a3d3c74e7f24e7b7bd9c4b3a7355 and 358f81c162c8c67b8e15128318beabcd538d0699
@EDiLD and @jarioksa See gist here for example of new function: https://gist.github.com/sckott/8603217
Do suggest or make any changes you think necessary.
Thanks, looks good.
About the docs: maybe give credit to Jari (taxa2dist) and Bob Clarke, and add a little description of what's going on there.
Also, I am not sure why the assertthat dependency is needed: 1) It's one more dependency that we must take care of. 2) The checks are easily implemented with base R.
Yes, I will give credit, and describe more what is going on.
I like `assertthat` because it gives nice error messages for the user, but I can take it out.
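For what it's worth, a rough sketch of what the base-R alternative could look like. `check_classification_input` is a hypothetical name, and the specific checks are assumptions about what the function needs rather than what the commits actually do.

```r
# Hypothetical input check using only base R (no assertthat).
# Assumes the input is a list of data.frames with `name` and `rank` columns.
check_classification_input <- function(x) {
  if (!is.list(x) || length(x) < 2) {
    stop("`x` must be a list of at least 2 classification tables",
         call. = FALSE)
  }
  ok <- vapply(
    x,
    function(el) is.data.frame(el) && all(c("name", "rank") %in% names(el)),
    logical(1)
  )
  if (!all(ok)) {
    stop("every element of `x` must be a data.frame with `name` and `rank` ",
         "columns; offending elements: ",
         paste(which(!ok), collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

good <- list(
  a = data.frame(name = "Poa", rank = "genus"),
  b = data.frame(name = "Madia", rank = "genus")
)
check_classification_input(good)
```

Using `stop()` with an explicit message and `call. = FALSE` keeps the error readable for the user, which is most of what `assertthat` would buy here.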
Changes in commit 2745a6c55c3c9eae033ce6f97ff823c9450d2fdc
this is done, closing
Using classification(), we could write a function to construct species trees from taxonomy. Sure, they may not reflect phylogeny and users should be cautious, but when there is no molecular information available...