ropensci / taxize

A taxonomic toolbelt for R
https://docs.ropensci.org/taxize

Another Idea #30

Closed eduardszoecs closed 10 years ago

eduardszoecs commented 12 years ago

Using classification(), we could write a function to construct species trees from taxonomy. Sure, they may not reflect phylogeny and users should be cautious, but when there is no molecular information available...

sckott commented 12 years ago

Right, good idea. I gave it a shot on the taxon_tree branch [here](https://github.com/ropensci/taxize/blob/taxon_tree/R/taxontree.R)

But it only works for three species so far, and I haven't tested it further; it's just a quick first stab. It's a much harder problem than I thought it would be. You can't assume that similarity in taxon names means similarity in relationships among species, the way you could lean on string similarity in machine learning for text classification.

You could probably write something better!

sckott commented 10 years ago

@EDiLD Any progress on this? Do you think we'll do this eventually? Or should we close this issue?

jarioksa commented 10 years ago

If you can get the classification into tabular form, then you can build trees. By tabular form I mean something where the rows are the leaf taxa and the columns give their memberships at various levels of classification. We have this in vegan, where the function taxa2dist casts that kind of classification into distances, and these can then be turned into a tree. The taxa2dist function is based on Clarke & Warwick, Marine Ecol Prog Ser 184, 21-29 (1999).
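To make the tabular idea concrete, here is a minimal base-R sketch. It is not vegan's taxa2dist: the helper `tax_dist()` is a hypothetical, simplified equal-step distance (count the rank steps up to the first shared rank), and the small table is typed by hand rather than fetched with classification().

```r
# Hand-typed classification table: rows are leaf taxa, columns are ranks
# ordered from lowest (genus) to highest (order).
tab <- data.frame(
  genus  = c("Helianthus", "Madia", "Lupinus", "Arachis", "Poa"),
  family = c("Asteraceae", "Asteraceae", "Fabaceae", "Fabaceae", "Poaceae"),
  order  = c("Asterales", "Asterales", "Fabales", "Fabales", "Poales"),
  row.names = c("Helianthus annuus", "Madia elegans", "Lupinus albicaulis",
                "Arachis paraguariensis", "Poa annua"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (an assumption, not part of vegan or taxize):
# distance = index of the lowest rank two taxa share, or
# (number of ranks + 1) if they share no rank at all.
tax_dist <- function(tab) {
  n <- nrow(tab)
  d <- matrix(0, n, n, dimnames = list(rownames(tab), rownames(tab)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next
      shared <- which(unlist(tab[i, ]) == unlist(tab[j, ]))
      d[i, j] <- if (length(shared) > 0) min(shared) else ncol(tab) + 1
    }
  }
  as.dist(d)
}

d  <- tax_dist(tab)
hc <- hclust(d)       # hierarchical clustering of the taxonomic distances
plot(hc, hang = -1)   # dendrogram with leaves aligned at the bottom
```

With this table, the two Asteraceae and the two Fabaceae pair up first, and Poa annua joins last. vegan's taxa2dist additionally supports variable step lengths (varstep), which this sketch does not attempt.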

sckott commented 10 years ago

@jarioksa Awesome, thanks for pointing out taxa2dist. This example I think works:

library(taxize)
library(vegan)
library(plyr)  # provides ldply()

spnames <- c('Klattia flava', 'Trollius sibiricus', 'Arachis paraguariensis',
             'Tanacetum boreale', 'Gentiana yakushimensis','Sesamum schinzianum',
             'Pilea verrucosa','Tibouchina striphnocalyx','Lycium dasystemum',
             'Centrosema latidens','Schoenus centralis','Berkheya echinacea',
             'Androcymbium villosum','Helianthus annuus','Madia elegans',
             'Lupinus albicaulis','Poa annua')

# Look up the classification of each species in NCBI
out <- classification(spnames, db='ncbi')

# Reduce one classification to a single row with one column per retained rank
foo <- function(x){
  keep <- x$rank %in% c('kingdom','order','family','genus')
  df <- x[keep, 'name']
  names(df) <- x[keep, 'rank']
  t(data.frame(df))
}

# Stack the rows into a taxa-by-rank table
output <- ldply(out, foo)

# Taxonomic distances, then a dendrogram from hierarchical clustering
taxdis <- taxa2dist(output, varstep=TRUE)
plot(hclust(taxdis), hang = -1)

eduardszoecs commented 10 years ago

Oh man... I've been using vegan for some years now, but never stumbled across this function. Thanks Jari for the pointer!

Thanks Scott for the quick example! What should we do? I mean classification() in combination with taxa2dist is quite useful - but should it go into taxize? I am not sure...

The initial idea was based on this paper:

Guénard G, Ohe PC von der, de Zwart D, Legendre P, Lek S. Using phylogenetic information to predict species tolerances to toxic chemicals. Ecological Applications. 2011;21(8):3178–90.

I still have the plot method for classification() on my desk (which resembles the lower part of Figure 4 therein), but I don't know when I'll finish it. Sorry, too much other stuff to do at the moment :(

sckott commented 10 years ago

@EDiLD Would you prefer a plot that looks like Fig. 4 from that paper? Or would you rather have a traditional dendrogram object, or even an ape::phylo object?

I think a function like this could be useful in taxize

eduardszoecs commented 10 years ago

Sorry, these were independent issues. I thought the lower part of Fig. 4 would be what plot.classification() returns.

The stuff here in this issue shouldn't be plot.classification(), but rather something like tree.classification()? Or dist.classification(), and then add a plot method?

jarioksa commented 10 years ago

ape has as.phylo.hclust for casting an hclust dendrogram to a phylo object. It is possible to build the phylo object more directly, too, but of course that needs a bit of work (and some of it pretty boring work).
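As a small illustration of that conversion, here is a sketch that assumes the ape package is installed. The distance matrix and the taxon labels `sp_a` to `sp_d` are invented for the example and do not come from any real classification.

```r
library(ape)

# Toy symmetric distance matrix among four hypothetical taxa
labs <- c("sp_a", "sp_b", "sp_c", "sp_d")
m <- matrix(c(0, 1, 3, 3,
              1, 0, 3, 3,
              3, 3, 0, 2,
              3, 3, 2, 0),
            nrow = 4, dimnames = list(labs, labs))

hc <- hclust(as.dist(m))  # cluster the distances
tr <- as.phylo(hc)        # S3 dispatch calls as.phylo.hclust
plot(tr)                  # uses ape's plot method for phylo objects
```

The resulting `tr` keeps the hclust labels as tip labels, so anything that accepts a phylo object can take it from there.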

sckott commented 10 years ago

@jarioksa Thanks for pointing out as.phylo.hclust. I'll look into using that to cast to a phylo object.

@EDiLD Okay, makes sense: have a tree.classification method, then we could go to a plot.phylo

out <- classification(specieslist)
tr <- tree(out) # gives ape phylo object
plot(tr) # plots tree

eduardszoecs commented 10 years ago

Sounds like a good idea!

sckott commented 10 years ago

Started the function in these two commits: f7177ff7eae1a3d3c74e7f24e7b7bd9c4b3a7355 and 358f81c162c8c67b8e15128318beabcd538d0699

sckott commented 10 years ago

@EDiLD and @jarioksa See gist here for example of new function: https://gist.github.com/sckott/8603217

Do suggest or make any changes you think necessary.

eduardszoecs commented 10 years ago

Thanks, looks good.

About the docs: maybe give credit to Jari (taxa2dist) and Bob Clarke, and add a short description of what's going on there.

Also, I am not sure the assertthat dependency is needed: 1) It's one more dependency that we must take care of. 2) The checks are easily implemented with base R.
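For reference, a base-R input check along those lines might look like the sketch below. The helper name `check_classification()` and its messages are hypothetical, not taxize API; it only assumes, as in the example earlier in this thread, that classification() returns a list of data frames with `name` and `rank` columns.

```r
# Hypothetical helper: validate the input with base R only, no assertthat.
check_classification <- function(x) {
  if (!is.list(x) || is.data.frame(x)) {
    stop("'x' must be a list of data frames, as returned by classification()",
         call. = FALSE)
  }
  ok <- vapply(x, function(z) {
    is.data.frame(z) && all(c("name", "rank") %in% names(z))
  }, logical(1))
  if (!all(ok)) {
    stop("each element of 'x' must be a data frame with 'name' and 'rank' columns",
         call. = FALSE)
  }
  invisible(TRUE)
}

# Hand-made input in the expected shape passes silently
good <- list(`Poa annua` = data.frame(name = c("Poaceae", "Poa"),
                                      rank = c("family", "genus")))
check_classification(good)

# A bare character vector is rejected with a readable message
msg <- tryCatch(check_classification("Poa annua"), error = conditionMessage)
print(msg)
```

Using stop() with call. = FALSE keeps the message short for the user, which covers much of what assertthat would add here.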

sckott commented 10 years ago

Yes, I will give credit, and describe more what is going on.

I like assertthat because it gives nice error messages for the user, but I can take it out.

sckott commented 10 years ago

Changes in commit 2745a6c55c3c9eae033ce6f97ff823c9450d2fdc

sckott commented 10 years ago

this is done, closing