Closed grabear closed 5 years ago
@sdhutchins Anything to add here?
@grabear I think you're covering it perfectly.
Should be noted this is using the GitHub version of metacoder. I can recreate the error as well with the same object.
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2 rstudioapi_0.8 ggtree_1.14.0 ggthemes_4.0.1 ggpubr_0.1.8 magrittr_1.5
[7] knitr_1.20 vegan_2.5-3 lattice_0.20-35 permute_0.9-4 microbiome_1.4.0 extrafont_0.17
[13] RColorBrewer_1.1-2 forcats_0.3.0 stringr_1.3.1 dplyr_0.7.7 purrr_0.2.5 readr_1.1.1
[19] tidyr_0.8.2 tibble_1.4.2 tidyverse_1.2.1 plyr_1.8.4 plotly_4.8.0 ampvis2_2.4.1
[25] ggplot2_3.1.0 modes_0.7.0 diptest_0.75-7 metacoder_0.3.0.9001 taxa_0.3.1.9001 phyloseq_1.26.0
[31] workflowr_1.1.1
I'm noticing several things, after a bunch of testing:
taxa::filter_taxa
do not properly filter any of the tables in taxmap_obj$data
a. Doing filter_taxa
first removes taxa from the classification
, but keeps it in the observation
data. This is what's causing the error after trying to use filter_taxa %>% filter_obs
. Essentially, the length(observation_data)
> length(taxa_data)
:
Error: All logical filtering criteria must be the same length as the data sets filtered.'
metacoder_obj %>% filter_taxa(n_supertaxa < 2) <Taxmap> 14 taxa: ab. Bacteria, ad. Firmicutes, ae. Proteobacteria ... ap. Planctomycetes, aq. Tenericutes 14 edges: NA->ab, ab->ad, ab->ae, ab->af, ab->ag, ab->ah, ab->ai ... ab->ak, ab->al, ab->am, ab->an, ab->ap, ab->aq 6 data sets: diff_table: # A tibble: 400 x 14 taxon_id treatment_1 treatment_2 log2_median_rat~ log2_mean_ratio median_diff mean_diff mean_treat1 mean_treat2 ... # skip most datasets for saving space. The key here is that we went from 400 taxa to 14 taxa in our object # The diff_table was not filtered properly (at 400 taxa), but the taxonomy was (at 14 taxa)
```r
metacoder_obj %>% filter_obs(c("diff_table", "tax_table"), n_supertaxa < 2)
<Taxmap>
400 taxa: ab. Bacteria, ad. Firmicutes, ae. Proteobacteria ... rs. uncultured bacterium, rt. uncultured organism
400 edges: NA->ab, ab->ad, ab->ae, ab->af, ab->ag, ab->ah, ab->ai ... jq->rl, jr->ro, jr->rp, js->rr, jt->rs, hf->rt
6 data sets:
tax_table:
# A tibble: 14 x 49
taxon_id Sample_1 Sample_2 Sample_3 Sample_6 Sample_9 Sample_10 Sample_13 Sample_14 Sample_17 Sample_21 Sample_22
...
diff_table:
# A tibble: 14 x 14
taxon_id treatment_1 treatment_2 log2_median_rat~ log2_mean_ratio median_diff mean_diff mean_treat1 mean_treat2
...
# Observation data was filtered properly. Taxa data cannot be manipulated appropriately this way
...
# Piping in %>% filter_taxa(n_supertaxa < 2), to the previous code block gives:
<Taxmap>
14 taxa: ab. Bacteria, ad. Firmicutes, ae. Proteobacteria ... ap. Planctomycetes, aq. Tenericutes
14 edges: NA->ab, ab->ad, ab->ae, ab->af, ab->ag, ab->ah, ab->ai ... ab->ak, ab->al, ab->am, ab->an, ab->ap, ab->aq
6 data sets:
tax_table:
# A tibble: 14 x 49
taxon_id Sample_1 Sample_2 Sample_3 Sample_6 Sample_9 Sample_10 Sample_13 Sample_14 Sample_17 Sample_21 Sample_22
diff_table:
# A tibble: 14 x 14
taxon_id treatment_1 treatment_2 log2_median_rat~ log2_mean_ratio median_diff mean_diff mean_treat1 mean_treat2
...
# Using filter_obs with the original data (which cosists of >1600 taxa vs 400 taxa) will throw the same error we've been getting
> metacoder_object %>% filter_obs(c("otu_table"), n_supertaxa < 2)
Error: All logical filtering criteria must be the same length as the data sets filtered.
...
# This is why I think we need to swith how we are piping our functions.
filter_obs %>% filter_taxa
instead. That way we can filter the observation data using the full taxonomy heirarchy, and then filter the taxa down afterward to the same level of the observation data.I feel like this is some kind of bug. It may be of note that we used @zachary-foster's metacoder package along with metacoder::parse_phyloseq' to get our data into a taxmap object. Not sure if this is relevant or not.. Either way I'm going to continue our analysis, using
filter_obs %>% filter_taxa` until further help is given here @sdhutchins.
This is also true in other filtering situations (235 taxa vs 408 taxa):
> filter_ambiguous_taxa(xx)
<Taxmap>
235 taxa: ab. Bacteria, ac. Archaea ... qq. Campylobacter fetus subsp. fetus, qs. Helicobacter macacae
235 edges: NA->ab, NA->ac, ab->ad, ab->ae, ab->af, ab->ag, ab->ah ... ib->os, id->oy, id->oz, il->pk, jd->qq, je->qs
6 data sets:
otu_table:
# A tibble: 1,665 x 50
taxon_id otu_id Sample_1 Sample_2 Sample_3 Sample_6 Sample_9 Sample_10 Sample_13 Sample_14 Sample_17 Sample_21
tax_data:
# A tibble: 1,665 x 8
taxon_id Kingdom Phylum Class Order Family Genus Species
tax_table:
# A tibble: 408 x 49
taxon_id Sample_1 Sample_2 Sample_3 Sample_6 Sample_9 Sample_10 Sample_13 Sample_14 Sample_17 Sample_21 Sample_22.
diff_table:
# A tibble: 408 x 14
taxon_id treatment_1 treatment_2 log2_median_rat~ log2_mean_ratio median_diff mean_diff mean_treat1 mean_treat2
0 functions:
Warning messages:
1: There is no "taxon_id" column in the data set "3", so there are no taxon IDs.
2: The data set "4" is named, but not named by taxon ids.
This means I have to do something with a for loop to remove ambiguous taxonomies, which is nowhere near as thorough as taxa::filter_amb_taxa
.
for (amb_term in ambiguous_filter) {
filtered_meta <- filtered_meta %>%
taxa::filter_obs(c("diff_table", "tax_table"), !stringr::str_detect(taxon_names, amb_term)) %>%
taxa::filter_taxa(!stringr::str_detect(taxon_names, amb_term))
}
...
# Using filter_obs with the original data (which cosists of 1665 taxa vs 400 taxa) will throw the same error we've been getting
> metacoder_object %>% filter_obs(c("otu_table"), n_supertaxa < 2)
Error: All logical filtering criteria must be the same length as the data sets filtered.
...
# This is why I think we need to swith how we are piping our functions.
Below I've included a situation where the above :point_up: statement isn't true. When using a different filtering condition, all of the tables are filtered ("otu_table", "tax_data" are normally at 1665 taxa after filtering, but here they are filtered to 1662 taxa).
> xx
<Taxmap>
408 taxa: ab. Bacteria, ac. Archaea, ad. Firmicutes ... rs. uncultured bacterium, rt. uncultured organism
408 edges: NA->ab, NA->ac, ab->ad, ab->ae, ab->af, ab->ag, ab->ah ... jq->rl, jr->ro, jr->rp, js->rr, jt->rs, hf->rt
6 data sets:
otu_table:
# A tibble: 1,665 x 50
taxon_id otu_id Sample_1 Sample_2 Sample_3 Sample_6 Sample_9 Sample_10 Sample_13 Sample_14 Sample_17 Sample_21
tax_data:
# A tibble: 1,665 x 8
taxon_id Kingdom Phylum Class Order Family Genus Species
tax_table:
# A tibble: 408 x 49
taxon_id Sample_1 Sample_2 Sample_3 Sample_6 Sample_9 Sample_10 Sample_13 Sample_14 Sample_17 Sample_21 Sample_22
diff_table:
# A tibble: 408 x 14
taxon_id treatment_1 treatment_2 log2_median_rat~ log2_mean_ratio median_diff mean_diff mean_treat1 mean_treat2
> filter_taxa(xx, taxon_names == "Archaea", subtaxa = TRUE, invert = TRUE)
<Taxmap>
400 taxa: ab. Bacteria, ad. Firmicutes, ae. Proteobacteria ... rs. uncultured bacterium, rt. uncultured organism
400 edges: NA->ab, ab->ad, ab->ae, ab->af, ab->ag, ab->ah, ab->ai ... jq->rl, jr->ro, jr->rp, js->rr, jt->rs, hf->rt
6 data sets:
otu_table:
# A tibble: 1,662 x 50
taxon_id otu_id Sample_1 Sample_2 Sample_3 Sample_6 Sample_9 Sample_10 Sample_13 Sample_14 Sample_17 Sample_21
tax_data:
# A tibble: 1,662 x 8
taxon_id Kingdom Phylum Class Order Family Genus Species
tax_table:
# A tibble: 400 x 49
taxon_id Sample_1 Sample_2 Sample_3 Sample_6 Sample_9 Sample_10 Sample_13 Sample_14 Sample_17 Sample_21 Sample_22
diff_table:
# A tibble: 400 x 14
taxon_id treatment_1 treatment_2 log2_median_rat~ log2_mean_ratio median_diff mean_diff mean_treat1 mean_treat2
Warning messages:
1: There is no "taxon_id" column in the data set "3", so there are no taxon IDs.
2: The data set "4" is named, but not named by taxon ids.
So there's something going on that makes filtering inconsistent, when using _taxonid vs ??regex??. I know it must have something to do with the taxon_id, but I can't put my finger on it.
Maybe this is an otu_id
vs taxon_id
thing?
Here's another weird case:
> metac
<Taxmap>
237 taxa: ab. Bacteria, ad. Firmicutes ... qs. Helicobacter macacae
237 edges: NA->ab, ab->ad, ab->ae, ab->af ... id->oz, il->pk, jd->qq, je->qs
6 data sets:
otu_table:
# A tibble: 1,662 x 50
taxon_id otu_id Sample_1 Sample_2 Sample_3 Sample_6 Sample_9 Sample_10 Sample_13
tax_data:
# A tibble: 1,662 x 8
taxon_id Kingdom Phylum Class Order Family Genus Species
tax_table:
# A tibble: 237 x 49
taxon_id Sample_1 Sample_2 Sample_3 Sample_6 Sample_9 Sample_10 Sample_13
diff_table:
# A tibble: 237 x 14
taxon_id treatment_1 treatment_2 log2_median_rat~ log2_mean_ratio median_diff
0 functions:
> metac %>% filter_taxa(taxon_ranks == "Class", reassign_obs = FALSE)
<Taxmap>
21 taxa: ar. Negativicutes, as. Clostridia ... bm. Mollicutes
21 edges: NA->ar, NA->as, NA->at, NA->au, NA->av ... NA->bi, NA->bj, NA->bl, NA->bm
6 data sets:
otu_table:
# A tibble: 3 x 50
taxon_id otu_id Sample_1 Sample_2 Sample_3 Sample_6 Sample_9 Sample_10 Sample_13
tax_data:
# A tibble: 3 x 8
taxon_id Kingdom Phylum Class Order Family Genus Species
tax_table:
# A tibble: 21 x 49
taxon_id Sample_1 Sample_2 Sample_3 Sample_6 Sample_9 Sample_10 Sample_13
diff_table:
# A tibble: 21 x 14
taxon_id treatment_1 treatment_2 log2_median_rat~ log2_mean_ratio median_diff
0 functions:
For my analysis, I used obj$data$tax_table <- metacoder::calc_taxon_abund(obj, "otu_table", ...)
and obj$data$diff_table <- metacoder::compare_groups(obj, "tax_table", ...)
. @zachary-foster I know that the tables mentioned use intermediate taxa, and obj$data$otu_table
and obj$data$tax_data
both use leaf taxa. But shouldn't all obj$data
reflect any change made with the taxa::filter
functions?
I know this comment contains a lot of the metacoder
stuff, and I think it might belong there. But for now i'm going to keep it going in this thread.
This may be related to https://github.com/grunwaldlab/metacoder/issues/188
Hello @grabear and @sdhutchins. Thanks for all the testing! I am looking at this now.
filter_obs("diff_table", n_supertaxa < 2)
This will only make sense if "diff_table" has a one row per taxon in the same order as the taxa. You would get this if you used compare_groups
with only one comparison (two groups), but otherwise, I would expect the length difference error. The two warnings you get can probably be ignored unless you thought those table should have been filtered. Tables without a "taxon_ids" column will not be filtered.
. Doing filter_taxa first removes taxa from the classification, but keeps it in the observation data.
filter_taxa
only removes observation data (ie user-defined) if it cant find a way to preserve it and usually there is a way to preserve it.
filter_taxa
can be used to filter observation data by taxon, but it takes some non-default parameters and it gets harder to understand what is going on when multiple tables are in the object.
For example, when trying to do filter_taxa( n_supertaxa < 2)
, I would not expect the "diff_table" to loose any rows, because all of the rows would be able to be reassigned to a supertaxon. I expect the taxon ids changed, but not row count.
If the table has a one-to-one relationship with the taxa, then you can make filter_taxa
filter that table the same way it filters taxa like so:
> obj
<Taxmap>
174 taxa: ab. Root, ac. Proteobacteria, ad. Bacteroidetes ... gq. Collinsella, gr. Blautia, gs. Clostridium
174 edges: NA->ab, ab->ac, ab->ad, ab->ae, ab->af, ab->ag, ab->ah ... di->gm, dj->gn, bu->go, dk->gp, cm->gq, cf->gr, cw->gs
4 data sets:
tax_data:
# A tibble: 1,000 x 53
taxon_id otu_id lineage `700035949` `700097855` `700100489` `700111314` `700033744` `700109581` `700111044` `700101365`
<chr> <chr> <chr> <int> <int> <int> <int> <int> <int> <int> <int>
1 dm OTU_9… r__Roo… 0 2 1 0 0 0 0 0
2 dn OTU_9… r__Roo… 0 0 0 0 0 0 0 0
3 do OTU_9… r__Roo… 0 1 0 0 0 0 0 0
# ... with 997 more rows, and 42 more variables: `700100431` <int>, `700016050` <int>, `700032425` <int>,
# `700024855` <int>, `700103488` <int>, `700096869` <int>, `700107379` <int>, `700096422` <int>, `700102417` <int>,
# `700114168` <int>, …
class_data:
# A tibble: 5,922 x 5
taxon_id input_index tax_rank tax_name regex_match
<chr> <int> <chr> <chr> <chr>
1 ab 1 r Root r__Root
2 ac 1 p Proteobacteria p__Proteobacteria
3 aj 1 c Gammaproteobacteria c__Gammaproteobacteria
# ... with 5,919 more rows
tax_abund:
# A tibble: 174 x 51
taxon_id `700035949` `700097855` `700100489` `700111314` `700033744` `700109581` `700111044` `700101365` `700100431`
* <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ab 955 4330 887 598 6303 3379 1219 829 1880
2 ac 193 144 17 132 1388 30 0 21 168
3 ad 5 43 6 21 20 15 7 5 43
# ... with 171 more rows, and 41 more variables: `700016050` <dbl>, `700032425` <dbl>, `700024855` <dbl>,
# `700103488` <dbl>, `700096869` <dbl>, `700107379` <dbl>, `700096422` <dbl>, `700102417` <dbl>, `700114168` <dbl>,
# `700037540` <dbl>, …
diff_table:
# A tibble: 174 x 7
taxon_id treatment_1 treatment_2 log2_median_ratio median_diff mean_diff wilcox_p_value
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 ab female male 0.138 183 -11 0.648
2 ac female male 0.482 48 96.5 0.461
3 ad female male 0.211 31 180. 0.634
# ... with 171 more rows
0 functions:
> obj %>% filter_taxa(n_supertaxa < 2, drop_obs = c(diff_table = TRUE), reassign_obs = c(diff_table = FALSE))
<Taxmap>
8 taxa: ab. Root, ac. Proteobacteria, ad. Bacteroidetes ... ag. Fusobacteria, ah. Tenericutes, ai. Spirochaetes
8 edges: NA->ab, ab->ac, ab->ad, ab->ae, ab->af, ab->ag, ab->ah, ab->ai
4 data sets:
tax_data:
# A tibble: 1,000 x 53
taxon_id otu_id lineage `700035949` `700097855` `700100489` `700111314` `700033744` `700109581` `700111044` `700101365`
<chr> <chr> <chr> <int> <int> <int> <int> <int> <int> <int> <int>
1 ac OTU_9… r__Roo… 0 2 1 0 0 0 0 0
2 ad OTU_9… r__Roo… 0 0 0 0 0 0 0 0
3 ad OTU_9… r__Roo… 0 1 0 0 0 0 0 0
# ... with 997 more rows, and 42 more variables: `700100431` <int>, `700016050` <int>, `700032425` <int>,
# `700024855` <int>, `700103488` <int>, `700096869` <int>, `700107379` <int>, `700096422` <int>, `700102417` <int>,
# `700114168` <int>, …
class_data:
# A tibble: 5,922 x 5
taxon_id input_index tax_rank tax_name regex_match
<chr> <int> <chr> <chr> <chr>
1 ab 1 r Root r__Root
2 ac 1 p Proteobacteria p__Proteobacteria
3 ac 1 c Gammaproteobacteria c__Gammaproteobacteria
# ... with 5,919 more rows
tax_abund:
# A tibble: 174 x 51
taxon_id `700035949` `700097855` `700100489` `700111314` `700033744` `700109581` `700111044` `700101365` `700100431`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ab 955 4330 887 598 6303 3379 1219 829 1880
2 ac 193 144 17 132 1388 30 0 21 168
3 ad 5 43 6 21 20 15 7 5 43
# ... with 171 more rows, and 41 more variables: `700016050` <dbl>, `700032425` <dbl>, `700024855` <dbl>,
# `700103488` <dbl>, `700096869` <dbl>, `700107379` <dbl>, `700096422` <dbl>, `700102417` <dbl>, `700114168` <dbl>,
# `700037540` <dbl>, …
diff_table:
# A tibble: 8 x 7
taxon_id treatment_1 treatment_2 log2_median_ratio median_diff mean_diff wilcox_p_value
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 ab female male 0.138 183 -11 0.648
2 ac female male 0.482 48 96.5 0.461
3 ad female male 0.211 31 180. 0.634
# ... with 5 more rows
0 functions:
To do the same thing (mostly) with filter_obs
:
> obj %>% filter_obs("diff_table", n_supertaxa[obj$data$diff_table$taxon_id] < 2)
<Taxmap>
174 taxa: ab. Root, ac. Proteobacteria, ad. Bacteroidetes ... gq. Collinsella, gr. Blautia, gs. Clostridium
174 edges: NA->ab, ab->ac, ab->ad, ab->ae, ab->af, ab->ag, ab->ah ... di->gm, dj->gn, bu->go, dk->gp, cm->gq, cf->gr, cw->gs
4 data sets:
tax_data:
# A tibble: 1,000 x 53
taxon_id otu_id lineage `700035949` `700097855` `700100489` `700111314` `700033744` `700109581` `700111044` `700101365`
<chr> <chr> <chr> <int> <int> <int> <int> <int> <int> <int> <int>
1 dm OTU_9… r__Roo… 0 2 1 0 0 0 0 0
2 dn OTU_9… r__Roo… 0 0 0 0 0 0 0 0
3 do OTU_9… r__Roo… 0 1 0 0 0 0 0 0
# ... with 997 more rows, and 42 more variables: `700100431` <int>, `700016050` <int>, `700032425` <int>,
# `700024855` <int>, `700103488` <int>, `700096869` <int>, `700107379` <int>, `700096422` <int>, `700102417` <int>,
# `700114168` <int>, …
class_data:
# A tibble: 5,922 x 5
taxon_id input_index tax_rank tax_name regex_match
<chr> <int> <chr> <chr> <chr>
1 ab 1 r Root r__Root
2 ac 1 p Proteobacteria p__Proteobacteria
3 aj 1 c Gammaproteobacteria c__Gammaproteobacteria
# ... with 5,919 more rows
tax_abund:
# A tibble: 174 x 51
taxon_id `700035949` `700097855` `700100489` `700111314` `700033744` `700109581` `700111044` `700101365` `700100431`
* <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ab 955 4330 887 598 6303 3379 1219 829 1880
2 ac 193 144 17 132 1388 30 0 21 168
3 ad 5 43 6 21 20 15 7 5 43
# ... with 171 more rows, and 41 more variables: `700016050` <dbl>, `700032425` <dbl>, `700024855` <dbl>,
# `700103488` <dbl>, `700096869` <dbl>, `700107379` <dbl>, `700096422` <dbl>, `700102417` <dbl>, `700114168` <dbl>,
# `700037540` <dbl>, …
diff_table:
# A tibble: 8 x 7
taxon_id treatment_1 treatment_2 log2_median_ratio median_diff mean_diff wilcox_p_value
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 ab female male 0.138 183 -11 0.648
2 ac female male 0.482 48 96.5 0.461
3 ad female male 0.211 31 180. 0.634
# ... with 5 more rows
0 functions:
This will work even if there is not a 1-to-1 relationship with taxa.
But shouldn't all obj$data reflect any change made with the taxa::filter functions?
The short answer is that the taxa::filter functions main goal is to keep the two in sync. filter_taxa
does all is can to not filter observations by default and filter_obs
does all it can to not filter taxa by default.
filter_taxa
, by default, will only remove a row if it cant reassign it to a supertaxon that did not get filtered out (n_supertaxa > 2 will always preserve a supertaxon).
filter_obs
will only remove a taxon if it or a subtaxon no longer exist in any table.
I am not sure I understand every question brought up in this issue, but I have a feeling that there is not a bug. Things just get tricky when there are multiple tables and each table has different characteristics. If you still think there is a bug or have questions after going through this answer, then repost an example and I will try to address that example directly.
@zachary-foster Thanks for your response! I know it's a lot, but I kept running into things I didn't understand. Your comments have been very helpful. As you know, I've been having these issues all along :rofl:
I think that we mainly would like to produce observation data tables in obj$data$diff
and obj$data$tax_abund
and use the taxon_ids
to taxa::map_data
. That way we can produce other plots like the correlation plots here that accurately reflect the heat_tree
visualizations.
If the table has a one-to-one relationship with the taxa, then you can make filter_taxa filter that table the same way it filters taxa like so:
Based on your comments I believe we can achieve this, but only with a a 1:1 relationship with the taxa/obs data. If we have more than two groups, then what should we do? Split the diff table? Or split our data into multiple taxmap objects?
No problem! Filtering with multiple different tables gets confusing. I have often thought I found bugs myself until realizing that I forgot some detail of how my own code works. I try to make it the default usages of the functions as intuitive as possible, but understanding how all the options interact can be difficult.
I think I see the confusion. The output of calc_taxon_abund
and compare_groups
return per-taxon information, so you can filter them with taxon attributes, but once you filter them or the taxa, they are no longer per-taxon, so it gets more complicated. Also, if you dont use drop_*
and reassing_obs = FALSE
they will not be filtered out at all, just reassigned taxon IDs.
If we have more than two groups, then what should we do?
If you want to filter data sets that are not 1-to-1 with taxa, using taxon characteristics (n_supertaxa, ranks, taxon names, etc), then you need to map the taxon characteristic to the data before filtering. Typically this involves using the helper functions like n_supertaxa(obj)
directly and subsetting their output. Since the result of these functions is always named by taxon ID, you can use the taxon_id column of the data set of interested to "subset" them like so:
obj %>% filter_obs("diff_table", n_supertaxa[obj$data$diff_table$taxon_id] < 2)
Another way:
obj$data$diff_table <- obj$data$diff_table[n_supertaxa(obj)[obj$data$diff_table$taxon_id] < 2, ]
Another way:
obj %>%
mutate_obs(diff_n_supertaxa = n_supertaxa[obj$data$diff_table$taxon_id]) %>%
filter_obs("diff_table", diff_n_supertaxa < 2) %>% ... more filtering using diff_n_supertaxa.
Think of this like adding a column to "diff_table" that has the number of supertaxa for each entry, based on the taxon id, and then filtering on that. You could actually do that if it makes it easier to understand.
All that being said, there could be bugs in the code, so if things dont make sense, feel free to let me know.
Here is one of my solutions to get a diff table with the data that I need.
...
# Custom functions
...
## Import data as phyloseq
> phy <- get_phyloseq_obj(".RDATA")
## Preprocess data using phyloseq package
> proc_phy <- preprocess_phyloseq(phy)
## Convert phyloseq object and analyze data using metacoder
> obj <- get_metacoder_obj(proc_phy)
...
# Diff Table Solution
...
> x1 <- obj %>% filter_taxa(n_supertaxa < 2, reassign_obs = FALSE)
> dplyr::right_join(x1$taxonomy_table(subset = taxon_ids(x1), add_id_col = TRUE), x1$data$diff_table, by = c("taxon_ids" = "taxon_id"))
# A tibble: 16 x 16
taxon_ids Kingdom Phylum treatment_1 treatment_2 log2_median_rat~ log2_mean_ratio median_diff mean_diff mean_treat1 mean_treat2 wilcox_p_value hartigan_dip_tr~ hartigan_dip_tr~ bimodality_coef~
<chr> <fct> <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ab Bacter~ NA Stressed Control -0.000165 -0.00122 -0.000114 -8.44e-4 0.999 1.000 0.0170 0.996 0.554 0.879
2 ac Archaea NA Stressed Control 3.48 2.53 0.000114 8.44e-4 0.00102 0.000176 0.0200 0.996 0.554 0.879
3 ad Bacter~ Firmi~ Stressed Control 0.266 0.357 0.0737 1.01e-1 0.460 0.359 0.000170 0.501 0.169 0.473
4 ae Bacter~ Prote~ Stressed Control -0.901 -0.298 -0.00748 -6.14e-3 0.0267 0.0329 0.141 0.847 0.656 0.912
5 af Bacter~ Spiro~ Stressed Control 0.830 0.607 0.00885 1.21e-2 0.0352 0.0231 0.135 0.222 0.961 0.646
6 ag Bacter~ Verru~ Stressed Control -2.05 0.416 -0.000824 5.13e-4 0.00205 0.00154 0.00852 0.814 0.797 0.946
7 ah Bacter~ Bacte~ Stressed Control -0.180 -0.303 -0.0660 -1.09e-1 0.468 0.577 0.00355 0.377 0.699 0.598
8 ai Bacter~ Fibro~ Stressed Control 1.15 1.65 0.000466 1.65e-3 0.00242 0.000773 0.0728 0.871 0.929 0.796
9 aj Bacter~ Actin~ Stressed Control -2.67 -2.26 -0.000591 -6.42e-4 0.000169 0.000811 0.0000153 0.918 0.954 0.832
10 ak Bacter~ Cyano~ Stressed Control 2.10 1.53 0.000473 7.57e-4 0.00116 0.000402 0.0279 0.577 0.906 0.783
11 al Bacter~ Elusi~ Stressed Control 0.706 -1.71 0.000105 -2.33e-3 0.00103 0.00336 0.584 0.822 0.756 0.853
12 am Bacter~ Lenti~ Stressed Control -0.420 0.450 -0.0000118 3.67e-5 0.000137 0.000100 0.868 0.890 0.604 0.891
13 an Bacter~ Sacch~ Stressed Control 3.60 2.74 0.000736 1.27e-3 0.00149 0.000224 0.000227 0.564 0.662 0.830
14 ao Archaea Eurya~ Stressed Control 3.48 2.53 0.000114 8.44e-4 0.00102 0.000176 0.0200 0.996 0.554 0.879
15 ap Bacter~ Planc~ Stressed Control 0 2.82 0.000178 5.50e-4 0.000641 0.0000911 0.00206 0.991 0.917 0.780
16 aq Bacter~ Tener~ Stressed Control -0.0916 -0.169 -0.00000927 -5.17e-5 0.000416 0.000468 0.638 0.855 0.0624 0.804
# ... with 1 more variable: bimodality_coeff_treat2 <dbl>
Closing due to inactivity. Feel free to reopen if needed
Hey @zachary-foster !
I'm currently trying to resolve a weird issue...
I am trying to filter a taxmap object, and I keep getting this error at the bottom.
However, if I switch the order that I filter (
filter_obs %>% filter_taxa
) it works out fine.filter_taxa
on it's own also works fine. So there is some hangup withfilter_obs
when it comes afterfilter_taxa
for this use case.Do you know what's happening here??
@sdhutchins Keep up with this issue :+1: