Open mawa86 opened 2 years ago
Hi @mawa86
What you suggests is doable, but I don't think this will get added officially into the phyloseq
package (even if people find this useful) as phyloseq
is not actively maintained by the developer(s) anymore.
Thank you for your reply @ycl6 !
with your admirable R skills, would you be able to show me how to achieve something like that?
martin
You might be interested in the tax_transform functionality from microViz, which works with phyloseq objects and can solve your problem (disclaimer, I'm the maintainer)
library(microViz)
#> microViz version 0.10.3 - Copyright (C) 2022 David Barnett
#> ! Website: https://david-barnett.github.io/microViz
#> ✔ Useful? For citation details, run: `citation("microViz")`
#> ✖ Silence? `suppressPackageStartupMessages(library(microViz))`
# get some example phyloseq data
data("shao19")
# subset samples for a smaller dataset
psq <- shao19 %>% ps_filter(age == 38)
# inside we see the count data
otu_get(psq, taxa = 1:5, samples = 1:5)
#> OTU Table: [5 taxa and 5 samples]
#> taxa are columns
#> Escherichia coli Bacteroides caccae Bacteroides stercoris
#> B01089_mo 7371972 595532 746435
#> B01471_mo 59606 0 1986747
#> B01716_mo 0 0 0
#> B01771_mo 0 576325 3137262
#> B02098_mo 723595 21251 20985
#> Ruminococcus bromii [Eubacterium] rectale
#> B01089_mo 0 133105
#> B01471_mo 1201752 281075
#> B01716_mo 0 78437
#> B01771_mo 988615 220861
#> B02098_mo 308852 222002
# we can transform the counts in various ways, e.g. to proportions
compositional <- psq %>% tax_transform("compositional")
otu_get(compositional, taxa = 1:5, samples = 1:5)
#> OTU Table: [5 taxa and 5 samples]
#> taxa are columns
#> Escherichia coli Bacteroides caccae Bacteroides stercoris
#> B01089_mo 0.416323960 0.033632011 0.042154090
#> B01471_mo 0.002955227 0.000000000 0.098501647
#> B01716_mo 0.000000000 0.000000000 0.000000000
#> B01771_mo 0.000000000 0.029512358 0.160652408
#> B02098_mo 0.034895678 0.001024839 0.001012011
#> Ruminococcus bromii [Eubacterium] rectale
#> B01089_mo 0.00000000 0.007516958
#> B01471_mo 0.05958210 0.013935519
#> B01716_mo 0.00000000 0.003611266
#> B01771_mo 0.05062484 0.011309815
#> B02098_mo 0.01489452 0.010706141
# or to presence/absence binary indicators
presenceAbsence <- psq %>% tax_transform("binary")
otu_get(presenceAbsence, taxa = 1:5, samples = 1:5)
#> OTU Table: [5 taxa and 5 samples]
#> taxa are columns
#> Escherichia coli Bacteroides caccae Bacteroides stercoris
#> B01089_mo 1 1 1
#> B01471_mo 1 0 1
#> B01716_mo 0 0 0
#> B01771_mo 0 1 1
#> B02098_mo 1 1 1
#> Ruminococcus bromii [Eubacterium] rectale
#> B01089_mo 0 1
#> B01471_mo 1 1
#> B01716_mo 0 1
#> B01771_mo 1 1
#> B02098_mo 1 1
# for your example request, slightly more complex but possible:
presenceAbsenceGenus_0.02 <- psq %>%
tax_transform("compositional", rank = "genus") %>%
tax_transform("binary", undetected = 0.02, chain = TRUE)
otu_get(presenceAbsenceGenus_0.02, taxa = 1:5, samples = 1:5)
#> OTU Table: [5 taxa and 5 samples]
#> taxa are columns
#> Escherichia Bacteroides Ruminococcus Clostridia class Bifidobacterium
#> B01089_mo 1 1 0 0 1
#> B01471_mo 0 1 1 0 1
#> B01716_mo 0 1 0 0 1
#> B01771_mo 0 1 1 1 1
#> B02098_mo 1 1 0 0 1
# phyloseq then provides a (confusingly named) function to get the sample
# abundance vector for a named taxon, e.g. "Streptococcus")
strepPresent <- phyloseq::get_sample(presenceAbsenceGenus_0.02, "Streptococcus")
# you could add this back on to your original sample data as a T/F vector like so
psq <- psq %>% ps_mutate(Strep0.02 = as.logical(strepPresent))
phyloseq::sample_data(psq)[1:5, "Strep0.02"]
#> Strep0.02
#> B01089_mo FALSE
#> B01471_mo TRUE
#> B01716_mo FALSE
#> B01771_mo FALSE
#> B02098_mo FALSE
# for fun let's see some composition heatmaps to illustrate the transformations
psq %>% comp_heatmap(taxa = 1:25)
#> Registered S3 method overwritten by 'seriation':
#> method from
#> reorder.hclust vegan
compositional %>% comp_heatmap(taxa = 1:25)
presenceAbsence %>% comp_heatmap(taxa = 1:25, show_heatmap_legend = FALSE)
presenceAbsenceGenus_0.02 %>% comp_heatmap(taxa = 1:25, show_heatmap_legend = FALSE)
Created on 2022-11-18 with reprex v2.0.2
Hi @mawa86
Hope the solution offered by @david-barnett answers your question.
hi @david-barnett and @ycl6
Thanks very much for this tool David, looks neat!
thanks to both for the time
/Martin
Hey Joey, thanks for this incredible tool! I have to admit it might sound like a stupid question as I am not that experienced with R, but ill give it a go:
Im looking for an easy way to automatically create a simple yes/no (or positive/negative) variable in sample_data for a given taxon, which would assign a value according to a user-defined threshold value in the otu_table. for example:
estimate_presence(PhyloseqObject, variable= "Strep", rank= genus, tax= "Streptococcus", threshold= 0.02)
...returning a variable ("Strep") in sample_data where for each sample a positive/negative entry is assigned according to whether more or less than 2% of reads are assigned to the OTU corresponding to the genus Streptococcus.
Could it be of interest? and does anyone want to point me in the right direction on how to achieve something like that in R at the moment? maybe @ycl6 can help me?
Any help very appreciated! thanks Martin