grunwaldlab / metacoder

Parsing, Manipulation, and Visualization of Metabarcoding/Taxonomic data
http://grunwaldlab.github.io/metacoder_documentation

How to get access to the Taxmap data frame? #358

Open mariabernard opened 8 months ago

mariabernard commented 8 months ago

Hi everyone,

I am trying to use metacoder to visualise differentially abundant taxa that I have identified outside metacoder.

I created a metacoder object from the full taxonomic abundance table. Now, to reduce it, I am trying to keep only the taxa that I identified as differentially abundant.

I generated a table that contains:

 #   tax_rank taxon_names      input_index
 #   <chr>    <chr>                    <int>
 # 1 p        Firmicutes_A                 4
 # 2 p        Firmicutes_C                 6
 # 3 p        Firmicutes_B               527
 # 4 p        Firmicutes_G               646
 # 5 p        Bacillota_A                 21

My idea was to merge it with what I assume is the Taxmap table shown when printing the metacoder object, something like:

taxon_id  input_index  tax_rank  tax_name          regex_match
<chr>     <int>        <chr>     <chr>             <chr>
aat       53           p         Actinobacteriota  p__Actinobacteriota
aat       53           c         Actinomycetia     c__Actinomycetia
aat       53           o         Actinomycetales   o__Actinomycetales

Then I would be able to extract the taxon_id values and filter the metacoder object. The idea is to exclude taxa such as s__unclassified_species, which can appear multiple times, and keep only the taxa that appear at the specified input_index (is that clear?).
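A minimal base-R sketch of the merge described above, assuming the classification table has the columns shown (taxon_id, input_index, tax_rank, tax_name). The targets rows copy the table posted above plus one hypothetical unclassified_species row; every row and taxon ID in the class_data stand-in is made up for illustration. Joining on input_index and tax_rank as well as the name is what should stop a repeated name such as unclassified_species from matching at every index.

```r
# Hypothetical sketch in base R. `targets` copies the five rows posted above
# plus one made-up unclassified_species row; all rows and taxon IDs in
# `class_data` are invented to mimic the shape of a Taxmap class_data table.
targets <- data.frame(
  tax_rank    = c("p", "p", "p", "p", "p", "s"),
  taxon_names = c("Firmicutes_A", "Firmicutes_C", "Firmicutes_B",
                  "Firmicutes_G", "Bacillota_A", "unclassified_species"),
  input_index = c(4L, 6L, 527L, 646L, 21L, 4L),
  stringsAsFactors = FALSE
)

class_data <- data.frame(
  taxon_id    = c("aa", "ab", "ac", "ad", "ae", "af", "ag"),
  input_index = c(4L, 6L, 527L, 646L, 21L, 4L, 9L),
  tax_rank    = c("p", "p", "p", "p", "p", "s", "s"),
  tax_name    = c("Firmicutes_A", "Firmicutes_C", "Firmicutes_B",
                  "Firmicutes_G", "Bacillota_A",
                  "unclassified_species", "unclassified_species"),
  stringsAsFactors = FALSE
)

# Inner join on input_index, rank and name: the unclassified_species row at
# input_index 9 ("ag") has no counterpart in `targets`, so it is dropped
matched <- merge(class_data, targets,
                 by.x = c("input_index", "tax_rank", "tax_name"),
                 by.y = c("input_index", "tax_rank", "taxon_names"))

# Unique taxon IDs to keep
keep_ids <- unique(matched$taxon_id)
keep_ids
```

The resulting IDs could then be used in a filter_taxa() condition such as taxon_ids %in% keep_ids.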

I tried get_data_frame(), but it returned an error: Error in obj$get_data_frame(...) : variables not of equal length
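The individual tables are stored in the data list of the Taxmap object (the reply below refers to my_taxmap$data), so the classification table can presumably be read directly, without get_data_frame(). This sketch assumes the object is built with the same parse_tax_data() call used in the reply below, whose printed output includes a class_data table; the object name x is only for illustration.

```r
library(metacoder)

# Same example data and call as in the reply below (hmp_otus ships with metacoder)
x <- parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                    class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                    class_regex = "^(.+)__(.+)$")

# Each table lives in the `data` list and can be accessed directly
class_tab <- x$data$class_data
head(class_tab)

# The tables have different numbers of rows, which is presumably why
# combining everything into one data frame fails with "not of equal length"
nrow(x$data$tax_data)
nrow(class_tab)
```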

Is there a solution?

Kind regards

Maria

zachary-foster commented 8 months ago

Just to clarify, your goal is to filter out taxa based on their name in another table? If so, this should be easy enough as long as the taxon names match:

library(metacoder)
#> This is metacoder version 0.3.6 (stable)

x = parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                   class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                   class_regex = "^(.+)__(.+)$")

the_best_taxa_ever <- c("Actinobacteria", "Collinsella", "Geodermatophilus")

filter_taxa(x, taxon_names %in% the_best_taxa_ever)
#> <Taxmap>
#>   4 taxa: ae. Actinobacteria ... gq. Collinsella
#>   4 edges: NA->ae, ae->am, am->gp, am->gq
#>   2 data sets:
#>     tax_data:
#>       # A tibble: 185 × 53
#>         taxon_id otu_id    lineage `700035949` `700097855` `700100489`
#>         <chr>    <chr>     <chr>         <int>       <int>       <int>
#>       1 am       OTU_97.4… r__Roo…           8          36          10
#>       2 am       OTU_97.4… r__Roo…          42         277          16
#>       3 am       OTU_97.3… r__Roo…          11           2           0
#>       # ℹ 182 more rows
#>       # ℹ 47 more variables: `700111314` <int>, `700033744` <int>,
#>       #   `700109581` <int>, `700111044` <int>, `700101365` <int>,
#>       #   `700100431` <int>, `700016050` <int>, `700032425` <int>,
#>       #   `700024855` <int>, `700103488` <int>, …
#>     class_data:
#>       # A tibble: 924 × 5
#>         taxon_id input_index tax_rank tax_name        regex_match     
#>         <chr>          <int> <chr>    <chr>           <chr>           
#>       1 ae                 4 p        Actinobacteria  p__Actinobacter…
#>       2 am                 4 c        Actinobacteria  c__Actinobacter…
#>       3 am                 4 o        Actinomycetales o__Actinomyceta…
#>       # ℹ 921 more rows
#>   0 functions:

Created on 2023-11-02 with reprex v2.0.2

Note that filtering the tables in my_taxmap$data will not remove taxa, only the information associated with them (e.g., rows in tables). The only easy way to remove taxa (and possibly the data assigned to them, depending on the options used) is to use filter_taxa().
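To illustrate this note, the sketch below reuses the hmp_otus example and compares subsetting a table in x$data with calling filter_taxa() on taxon IDs looked up in class_data. The one-row targets table is hypothetical, copied from the class_data output printed above. It assumes, as in the filter_taxa() call above, that taxon-level values such as taxon_ids can be referenced by name inside the filtering condition, and that Taxmap objects behave like other R6 objects (hence the explicit clone()).

```r
library(metacoder)

x <- parse_tax_data(hmp_otus, class_cols = "lineage", class_sep = ";",
                    class_key = c(tax_rank = "taxon_rank", tax_name = "taxon_name"),
                    class_regex = "^(.+)__(.+)$")
n_before <- length(x$taxon_ids())

# Subsetting a table only drops rows; the taxonomy is untouched.
# clone() avoids modifying `x` through R6 reference semantics.
y <- x$clone(deep = TRUE)
y$data$tax_data <- y$data$tax_data[1:10, ]
length(y$taxon_ids()) == n_before

# Hypothetical target table: one row copied from the class_data output above
targets <- data.frame(tax_rank = "p", tax_name = "Actinobacteria",
                      input_index = 4L, stringsAsFactors = FALSE)

# Look up the matching taxon IDs, then keep only those taxa
matched  <- merge(as.data.frame(x$data$class_data), targets,
                  by = c("input_index", "tax_rank", "tax_name"))
keep_ids <- unique(matched$taxon_id)
filtered <- filter_taxa(x, taxon_ids %in% keep_ids)
length(filtered$taxon_ids())
```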