ropensci / taxa

taxonomic classes for R
https://docs.ropensci.org/taxa

Phylogeny in a taxmap? #184

Open sckott opened 6 years ago

sckott commented 6 years ago

@zachary-foster Curious if you’ve ever tried to handle a phylogenetic tree inside of a taxmap? I'm working on https://github.com/ropensci/phylodiv and thinking of using taxa inside the pkg to help users do “phylogenetic queries”; taxonomic queries, at least, are already possible with taxmap objects.

One problem I've run into in testing out phylogenies in a taxmap is that a phylogeny converted to a data.frame of nodes and edges via the tidytree package has rows for internal nodes as well as tips. There's no matching info in the taxmap object for internal nodes, only for tips of the tree. So some of the functionality doesn't work because no names in the taxmap match the internal nodes of the phylogeny.

I guess one could add all the node names of the phylogeny to the names in the taxmap - and where names aren't known just use node numbers or so?

maybe you've already played with this concept and I'm missing it?
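
A minimal sketch of the node-naming idea above, using ape directly rather than taxa or tidytree: internal nodes without a label get a placeholder built from their node number. The example tree and the `node_` prefix are assumptions for illustration only.

```r
library(ape)

# Small example tree whose internal nodes have no labels
tree <- read.tree(text = "((bat:31,cow:22):7,(elk:33,fox:12):40);")

# ape numbers tips 1..Ntip and internal nodes Ntip+1..Ntip+Nnode,
# so those numbers can serve as placeholder names
node_ids <- Ntip(tree) + seq_len(Nnode(tree))

labels <- tree$node.label
if (is.null(labels)) labels <- rep("", Nnode(tree))

# Only fill in labels that are missing or empty; keep any real names
unnamed <- is.na(labels) | labels == ""
labels[unnamed] <- paste0("node_", node_ids[unnamed])
tree$node.label <- labels

# Every name that would need a counterpart in the taxmap
all_names <- c(tree$tip.label, tree$node.label)
```

With labels on every node, the names in a nodes-and-edges data.frame would at least have something to match against.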

zachary-foster commented 6 years ago

Hi Scott,

There are two ways I can imagine doing it:

You can store the phylogeny as a taxonomy, with each node in the taxonomy being a "taxon" and store things like edge length as data associated with nodes. For example:

library(metacoder)
parse_newick(text = "(ant:17, (bat:31, cow:22):7, dog:22, (elk:33, fox:12):40);
 (dog:20, (elephant:30, horse:60):20):50;")
<Taxmap>
  14 taxa: b. node_10, e. node_11, i. node_12 ... l. node_7, n. node_8, o. node_9
  14 edges: NA->b, NA->c, b->d, b->e, e->f, e->g ... i->j, i->k, c->l, c->m, m->n, m->o
  1 data sets:
    tax_data:
      # A tibble: 14 x 3
        taxon_id edge_length tip_label
        <chr>          <dbl> <chr>    
      1 b                NA  NA       
      2 c                NA  NA       
      3 d                17. ant      
      # ... with 11 more rows
  0 functions:

or

library(ape)
library(metacoder)
data(bird.orders) # a phylo object
parse_phylo(bird.orders)
<Taxmap>
  45 taxa: b. node_24, c. node_25, d. node_26 ... br. node_21, bs. node_22, bt. node_23
  45 edges: NA->b, b->c, c->d, d->e, d->f ... bo->bp, bo->bq, bq->br, bq->bs, bn->bt
  1 data sets:
    tax_data:
      # A tibble: 45 x 3
        taxon_id edge_length tip_label
        <chr>          <dbl> <chr>    
      1 b              NA    NA       
      2 c               2.10 NA       
      3 d               4.10 NA       
      # ... with 42 more rows
  0 functions:
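
To make the first approach concrete, here is a rough sketch of the same idea done by hand with ape: every node of the phylogeny becomes one row ("taxon"), with its parent, edge length, and tip label stored alongside. This is not how metacoder implements parse_phylo; the column names mirror the output above, and `parent_id` plus the `node_` IDs are assumptions for illustration.

```r
library(ape)

tree <- read.tree(text = "((bat:31,cow:22):7,(elk:33,fox:12):40);")

n_tip <- Ntip(tree)
n_all <- n_tip + Nnode(tree)

# One ID per node, tips and internal nodes alike
node_id <- paste0("node_", seq_len(n_all))

# Row of the edge matrix that ends at each node (NA for the root)
edge_row <- match(seq_len(n_all), tree$edge[, 2])
parent <- tree$edge[edge_row, 1]

node_table <- data.frame(
  taxon_id    = node_id,
  parent_id   = node_id[parent],            # NA for the root
  edge_length = tree$edge.length[edge_row], # length of the edge leading to the node
  tip_label   = c(tree$tip.label, rep(NA_character_, Nnode(tree))),
  stringsAsFactors = FALSE
)
```

The `taxon_id`/`parent_id` pairs play the role of the taxmap edge list, and the remaining columns are the per-node data.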

The other way would be to use the taxonomy normally, put a phylogeny inside the data list of the taxmap object, and check that subsetting works. If the object used to store the phylogeny can be subset like a list, it might work already; otherwise I could add some functionality to taxa to subset it correctly. You would need to have the tips (or potentially nodes) named by taxon IDs, which is not ideal, since it is not really taxa that are on the phylogeny but samples/observations, and you might need the tip labels for sample names. I do something similar here, but without subsetting working:

library(metacoder)

# Install phyloseq to get example data
# install.packages('BiocManager')
# BiocManager::install('phyloseq')

# Parse example dataset
library(phyloseq)
data(GlobalPatterns)
x <- parse_phyloseq(GlobalPatterns)
x
<Taxmap>
  2692 taxa: aab. Archaea, aac. Bacteria ... hkk. Jonquetellaanthropi
  2692 edges: NA->aab, NA->aac, aab->aad, aab->aae ... dzf->hjy, dzn->hkg, dzr->hkk
  4 data sets:
    otu_table:
      # A tibble: 19,216 x 28
        taxon_id otu_id   CL3   CC1   SV1 M31Fcsw M11Fcsw M31Plmr M11Plmr F21Plmr M31Tong
        <chr>    <chr>  <dbl> <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
      1 acs      549322    0.    0.    0.      0.      0.      0.      0.      0.      0.
      2 acs      522457    0.    0.    0.      0.      0.      0.      0.      0.      0.
      3 dzw      951       0.    0.    0.      0.      0.      0.      1.      0.      0.
      # ... with 1.921e+04 more rows, and 17 more variables: M11Tong <dbl>,
      #   LMEpi24M <dbl>, SLEpi20M <dbl>, AQC1cm <dbl>, AQC4cm <dbl>, AQC7cm <dbl>,
      #   NP2 <dbl>, NP3 <dbl>, NP5 <dbl>, TRRsed1 <dbl>, …
    tax_data:
      # A tibble: 19,216 x 8
        taxon_id Kingdom Phylum        Class        Order        Family  Genus  Species  
        <chr>    <chr>   <chr>         <chr>        <chr>        <chr>   <chr>  <chr>    
      1 acs      Archaea Crenarchaeota Thermoprotei NA           NA      NA     NA       
      2 acs      Archaea Crenarchaeota Thermoprotei NA           NA      NA     NA       
      3 dzw      Archaea Crenarchaeota Thermoprotei Sulfolobales Sulfol… Sulfo… Sulfolob…
      # ... with 1.921e+04 more rows
    sample_data:
      # A tibble: 26 x 8
        sample_id X.SampleID Primer  Final_Barcode Barcode_truncated_p… Barcode_full_len…
        <chr>     <chr>      <chr>   <chr>         <chr>                <chr>            
      1 CL3       CL3        ILBC_01 AACGCA        TGCGTT               CTAGCGTGCGT      
      2 CC1       CC1        ILBC_02 AACTCG        CGAGTT               CATCGACGAGT      
      3 SV1       SV1        ILBC_03 AACTGT        ACAGTT               GTACGCACAGT      
      # ... with 23 more rows, and 2 more variables: SampleType <chr>, Description <chr>
    phy_tree:

      Phylogenetic tree with 19216 tips and 19215 internal nodes.

      Tip labels:
        549322, 522457, 951, 244423, 586076, 246140, ...
      Node labels:
        , 0.858.4, 1.000.154, 0.764.3, 0.995.2, 1.000.2, ...

      Rooted; includes branch lengths.
  0 functions:
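
As a sketch of what "subsetting working" could mean for a tree stored this way: given a lookup from tip labels to taxon IDs, the tree is pruned to the tips whose taxon is still present. This uses ape only and is not existing taxa functionality; the tip labels, the `tip_taxon` lookup, and the `kept_taxa` vector are all hypothetical.

```r
library(ape)

# Example tree whose tips are observations (e.g. OTUs), not taxa
tree <- read.tree(text = "((otu1:1,otu2:1):1,(otu3:1,otu4:1):1);")

# Hypothetical lookup: tip label -> taxon ID of that observation
tip_taxon <- c(otu1 = "acs", otu2 = "acs", otu3 = "dzw", otu4 = "dzw")

# Taxon IDs that remain after some filtering step on the taxonomy
kept_taxa <- "acs"

# Keep only the tips that belong to a retained taxon
tips_to_keep <- names(tip_taxon)[tip_taxon %in% kept_taxa]
sub_tree <- keep.tip(tree, tips_to_keep)
```

Keeping the lookup separate from the tip labels avoids renaming tips to taxon IDs, so the labels stay available as observation names.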

sckott commented 6 years ago

thanks @zachary-foster - I'll investigate over the next few days and report back

sckott commented 6 years ago

@zachary-foster so we can do metacoder::parse_phylo, but can you reconstruct the phylogeny after making it a taxmap object? something like as_phylo()?

sckott commented 6 years ago

is there a way to add data after creating a taxmap? e.g., if I use metacoder::parse_phylo, I want to include higher taxonomy data, but I'm not sure how to add a dataset to the resulting taxmap object. any thoughts?

zachary-foster commented 6 years ago

@zachary-foster so we can do metacoder::parse_phylo, but can you reconstruct the phylogeny after making it a taxmap object? something like as_phylo()?

Yeah, although I don't have a function for it. Each split in the phylogeny is a "node" in the taxmap, and the branch lengths are in the edge_length column of the tax_data table in data.
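
A rough sketch of what such a function might look like, assuming the nodes have first been pulled out into a plain table with one row per node. `as_phylo_sketch` and the `parent_id` column are hypothetical and not part of taxa or metacoder; the other column names follow the tax_data output shown earlier. It handles a single rooted tree only.

```r
library(ape)

# Hypothetical helper: rebuild a "phylo" object from a node table with columns
# taxon_id, parent_id (NA for the root), edge_length, and tip_label
as_phylo_sketch <- function(nodes) {
  is_tip <- !(nodes$taxon_id %in% nodes$parent_id)
  root <- nodes$taxon_id[is.na(nodes$parent_id)]
  stopifnot(length(root) == 1)  # a single tree with a single root

  # ape convention: tips are 1..n, the root is n + 1, other internal nodes follow
  internal <- c(root, setdiff(nodes$taxon_id[!is_tip], root))
  index <- c(nodes$taxon_id[is_tip], internal)

  has_parent <- !is.na(nodes$parent_id)
  tree <- list(
    edge = cbind(match(nodes$parent_id[has_parent], index),
                 match(nodes$taxon_id[has_parent], index)),
    edge.length = nodes$edge_length[has_parent],
    tip.label = nodes$tip_label[is_tip],
    Nnode = length(internal)
  )
  class(tree) <- "phylo"
  reorder(tree)  # sort edges into ape's default "cladewise" order
}

# Example node table shaped like the tax_data output above
nodes <- data.frame(
  taxon_id    = c("b", "c", "d", "e", "f", "g", "h"),
  parent_id   = c(NA, "b", "b", "c", "c", "d", "d"),
  edge_length = c(NA, 7, 40, 31, 22, 33, 12),
  tip_label   = c(NA, NA, NA, "bat", "cow", "elk", "fox"),
  stringsAsFactors = FALSE
)
tree <- as_phylo_sketch(nodes)
```

The result can then be passed to the usual ape functions such as `plot()` or `write.tree()`.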

is there a way to add data after creating a taxmap? e.g., if I use metacoder::parse_phylo, I want to include higher taxonomy data, but I'm not sure how to add a dataset to the resulting taxmap object. any thoughts?

It is easy to add datasets with mutate_obs, but I think you are talking about adding taxa? Like adding a "Life" taxon above a "Bacteria" taxon? That is a goal, but it is a bit hard to do now. See

https://github.com/ropensci/taxa/issues/172