Closed schifferl closed 3 years ago
If you are up for it, can you investigate @sdgamboa?
@schifferl, sure, I'll investigate about it.
@schifferl, I found a Bioconductor package named mia that has a function to convert from TreeSummarizedExperiment to phyloseq, although I think the package is still in devel version. I leave an example below.
# BiocManager::install("microbiome/mia")
packages <- c("mia", "TreeSummarizedExperiment", "phyloseq", "HMP16SData")
for (pkg in packages) suppressMessages(library(pkg, character.only = TRUE))
# A SummarziedExperiment-class object
se <- suppressMessages(V35())
se
#> class: SummarizedExperiment
#> dim: 45383 4743
#> metadata(2): experimentData phylogeneticTree
#> assays(1): 16SrRNA
#> rownames(45383): OTU_97.1 OTU_97.10 ... OTU_97.9998 OTU_97.9999
#> rowData names(7): CONSENSUS_LINEAGE SUPERKINGDOM ... FAMILY GENUS
#> colnames(4743): 700013549 700014386 ... 700114717 700114750
#> colData names(7): RSID VISITNO ... HMP_BODY_SUBSITE SRS_SAMPLE_ID
# A TreeSummarizedExperiment-class object
tse <- TreeSummarizedExperiment(
assays = list(count = assays(se)$`16SrRNA`),
rowData = rowData(se),
rowTree = metadata(se)$phylogeneticTree,
colData = colData(se),
)
#> Warning: 47 row(s) couldn't be matched to the tree and are/is removed.
tse
#> class: TreeSummarizedExperiment
#> dim: 45336 4743
#> metadata(0):
#> assays(1): count
#> rownames(45336): OTU_97.1 OTU_97.10 ... OTU_97.9998 OTU_97.9999
#> rowData names(7): CONSENSUS_LINEAGE SUPERKINGDOM ... FAMILY GENUS
#> colnames(4743): 700013549 700014386 ... 700114717 700114750
#> colData names(7): RSID VISITNO ... HMP_BODY_SUBSITE SRS_SAMPLE_ID
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: a LinkDataFrame (45336 rows)
#> rowTree: 1 phylo tree(s) (45364 leaves)
#> colLinks: NULL
#> colTree: NULL
# A phyloseq-class object
ps <- mia::makePhyloseqFromTreeSummarizedExperiment(tse, abund_values = "count")
ps
#> phyloseq-class experiment-level object
#> otu_table() OTU Table: [ 45336 taxa and 4743 samples ]
#> sample_data() Sample Data: [ 4743 samples by 7 sample variables ]
#> tax_table() Taxonomy Table: [ 45336 taxa by 5 taxonomic ranks ]
#> phy_tree() Phylogenetic Tree: [ 45336 tips and 45099 internal nodes ]
# Note: Columns that do not match "kingdom", "class", "order", "family", "genus",
# or "species" are removed from tax_table
colnames(rowData(tse))
#> [1] "CONSENSUS_LINEAGE" "SUPERKINGDOM" "PHYLUM"
#> [4] "CLASS" "ORDER" "FAMILY"
#> [7] "GENUS"
colnames(tax_table(ps))
#> [1] "PHYLUM" "CLASS" "ORDER" "FAMILY" "GENUS"
Created on 2021-05-03 by the reprex package (v2.0.0)
Nice find! I’d say if mia is being released too, you could just document this in the vignette, and mention it in the ExpressionSet2phyloseq defunct error.--
Levi Waldron
Associate Professor
Department of Epidemiology and Biostatistics
CUNY Graduate School of Public Health and Health Policy
Institute for Implementation Science in Population Health
55 W 125th St, New York NY 10035
Check that this works e.g. on the Asnicar TreeSummarizedExperiment.
@lwaldron , @schifferl , the mia::makePhyloseqFromTreeSummarizedExperiment
function seems to work for cMD even without rowData. I leave an example and the definition of the method below (the output is a bit long).
packages <- c("curatedMetagenomicData", "mia", "TreeSummarizedExperiment",
"phyloseq")
for (pkg in packages) suppressMessages(library(pkg, character.only = TRUE))
# Import data from cMD
tse <- curatedMetagenomicData("AsnicarF_2021.relative_abundance", dryrun = FALSE)
#> snapshotDate(): 2021-05-05
#> Warning: 380 row(s) couldn't be matched to the tree and are/is removed.
tse
#> $`2021-03-31.AsnicarF_2021.relative_abundance`
#> class: TreeSummarizedExperiment
#> dim: 639 1098
#> metadata(0):
#> assays(1): relative_abundance
#> rownames(639):
#> k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Bacteroidaceae|g__Bacteroides|s__Bacteroides_vulgatus
#> k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Bacteroidaceae|g__Bacteroides|s__Bacteroides_stercoris
#> ...
#> k__Bacteria|p__Synergistetes|c__Synergistia|o__Synergistales|f__Synergistaceae|g__Pyramidobacter|s__Pyramidobacter_sp_C12_8
#> k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Micrococcales|f__Brevibacteriaceae|g__Brevibacterium|s__Brevibacterium_aurantiacum
#> rowData names(0):
#> colnames(1098): SAMEA7041133 SAMEA7041134 ... SAMEA7045952 SAMEA7045953
#> colData names(20): studyName subjectID ... curator NCBI_accession
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> rowLinks: a LinkDataFrame (639 rows)
#> rowTree: 1 phylo tree(s) (10430 leaves)
#> colLinks: NULL
#> colTree: NULL
# Convert to phyloseq-class object
ps <- makePhyloseqFromTreeSummarizedExperiment(tse[[1]], abund_value = "relative_abundance")
ps
#> phyloseq-class experiment-level object
#> otu_table() OTU Table: [ 639 taxa and 1098 samples ]
#> sample_data() Sample Data: [ 1098 samples by 20 sample variables ]
#> phy_tree() Phylogenetic Tree: [ 639 tips and 638 internal nodes ]
# How data looks like
otu_table(ps) |>
rownames() |>
head()
#> [1] "k__Archaea|p__Euryarchaeota|c__Thermoplasmata|o__Methanomassiliicoccales|f__Methanomassiliicoccaceae|g__Methanomassiliicoccus|s__Candidatus_Methanomassiliicoccus_intestinalis"
#> [2] "k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanosphaera|s__Methanosphaera_cuniculi"
#> [3] "k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanosphaera|s__Methanosphaera_stadtmanae"
#> [4] "k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter|s__Methanobrevibacter_arboriphilus"
#> [5] "k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter|s__Methanobrevibacter_oralis"
#> [6] "k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter|s__Methanobrevibacter_smithii"
otu_table(ps) |>
colnames() |>
head()
#> [1] "SAMEA7041133" "SAMEA7041134" "SAMEA7041135" "SAMEA7041136" "SAMEA7041137"
#> [6] "SAMEA7041138"
sample_data(ps) |>
lapply(\(y) if (is.character(y)) {as.factor(y)} else {y}) |>
as.data.frame() |>
summary()
#> studyName subjectID body_site antibiotics_current_use
#> AsnicarF_2021:1098 s10091 : 1 stool:1098 no:1098
#> s10092 : 1
#> s10181 : 1
#> s10182 : 1
#> s10332 : 1
#> s10691 : 1
#> (Other):1092
#> study_condition disease age age_category gender
#> control:1098 healthy:1098 Min. :18.00 adult :1096 female:792
#> 1st Qu.:36.00 schoolage: 2 male :306
#> Median :46.00
#> Mean :44.81
#> 3rd Qu.:54.00
#> Max. :65.00
#>
#> country non_westernized sequencing_platform DNA_extraction_kit
#> GBR:1001 no:1098 IlluminaNovaSeq:1098 PowerSoilPro:1098
#> USA: 97
#>
#>
#>
#>
#>
#> PMID number_reads number_bases minimum_read_length
#> 33432175:1098 Min. : 5992767 Min. :9.049e+08 Min. :150
#> 1st Qu.:25336668 1st Qu.:3.826e+09 1st Qu.:151
#> Median :27425643 Median :4.141e+09 Median :151
#> Mean :29807717 Mean :4.501e+09 Mean :151
#> 3rd Qu.:31851070 3rd Qu.:4.808e+09 3rd Qu.:151
#> Max. :93097010 Max. :1.406e+10 Max. :151
#>
#> median_read_length curator NCBI_accession
#> Min. :151 Francesco_Asnicar;Paolo_Manghi:1098 ERR4330026: 1
#> 1st Qu.:151 ERR4330027: 1
#> Median :151 ERR4330028: 1
#> Mean :151 ERR4330029: 1
#> 3rd Qu.:151 ERR4330030: 1
#> Max. :151 ERR4330031: 1
#> (Other) :1092
phy_tree(ps)
#>
#> Phylogenetic tree with 639 tips and 638 internal nodes.
#>
#> Tip labels:
#> k__Archaea|p__Euryarchaeota|c__Thermoplasmata|o__Methanomassiliicoccales|f__Methanomassiliicoccaceae|g__Methanomassiliicoccus|s__Candidatus_Methanomassiliicoccus_intestinalis, k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanosphaera|s__Methanosphaera_cuniculi, k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanosphaera|s__Methanosphaera_stadtmanae, k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter|s__Methanobrevibacter_arboriphilus, k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter|s__Methanobrevibacter_oralis, k__Archaea|p__Euryarchaeota|c__Methanobacteria|o__Methanobacteriales|f__Methanobacteriaceae|g__Methanobrevibacter|s__Methanobrevibacter_smithii, ...
#>
#> Rooted; includes branch lengths.
# Method definition
findMethods(makePhyloseqFromTreeSummarizedExperiment)
#> An object of class "listOfMethods"
#> $SummarizedExperiment
#> Method Definition:
#>
#> function (x, ...)
#> {
#> .local <- function (x, abund_values = "counts")
#> {
#> .require_package("phyloseq")
#> if (!all(dim(x) > 0)) {
#> stop("'x' contains zero rows. 'x' can not be converted\n to a phyloseq object.",
#> call. = FALSE)
#> }
#> .check_assay_present(abund_values, x)
#> args = list()
#> otu_table <- as.matrix(assay(x, abund_values))
#> otu_table <- phyloseq::otu_table(otu_table, taxa_are_rows = TRUE)
#> args[["otu_table"]] <- otu_table
#> if (!(length(rowData(x)[, taxonomyRanks(x)]) == 0 ||
#> is.null((rowData(x)[, taxonomyRanks(x)])))) {
#> tax_table <- as.matrix(rowData(x)[, taxonomyRanks(x)])
#> tax_table <- phyloseq::tax_table(tax_table)
#> args[["tax_table"]] <- tax_table
#> }
#> if (!(length(colData(x)) == 0 || is.null(ncol(colData(x))))) {
#> sample_data <- as.data.frame(colData(x))
#> sample_data <- phyloseq::sample_data(sample_data)
#> args[["sample_data"]] <- sample_data
#> }
#> phyloseq <- do.call(phyloseq::phyloseq, args)
#> return(phyloseq)
#> }
#> .local(x, ...)
#> }
#> <bytecode: 0x55bcdbab5c50>
#> <environment: namespace:mia>
#>
#> Signatures:
#> x
#> target "SummarizedExperiment"
#> defined "SummarizedExperiment"
#>
#> $TreeSummarizedExperiment
#> Method Definition:
#>
#> function (x, ...)
#> {
#> obj <- callNextMethod()
#> args = list()
#> if (!is(obj, "phyloseq")) {
#> args[["otu_table"]] <- obj
#> }
#> if (!(length(rowTree(x)) == 0 || is.null(rowTree(x)))) {
#> phy_tree <- rowTree(x)
#> phy_tree <- phyloseq::phy_tree(phy_tree)
#> if (is(obj, "phyloseq")) {
#> phyloseq::phy_tree(obj) <- phy_tree
#> }
#> else {
#> args[["phy_tree"]] <- phy_tree
#> }
#> }
#> if (!(length(referenceSeq(x)) == 0 || is.null(referenceSeq(x)))) {
#> refseq <- referenceSeq(x)
#> refseq <- phyloseq::refseq(refseq)
#> if (is(obj, "phyloseq")) {
#> obj <- phyloseq::merge_phyloseq(obj, refseq)
#> }
#> else {
#> args[["refseq"]] <- refseq
#> }
#> }
#> if (!is(obj, "phyloseq")) {
#> phyloseq <- do.call(phyloseq::phyloseq, args)
#> }
#> else {
#> phyloseq <- obj
#> }
#> phyloseq
#> }
#> <bytecode: 0x55bcdbac5ef0>
#> <environment: namespace:mia>
#>
#> Signatures:
#> x
#> target "TreeSummarizedExperiment"
#> defined "TreeSummarizedExperiment"
#>
#> Slot arguments:
#> [1] "x"
#> Slot signatures:
#> [[1]]
#> [1] "SummarizedExperiment"
#>
#> [[2]]
#> [1] "TreeSummarizedExperiment"
#>
#> Slot generic:
#> standardGeneric for "makePhyloseqFromTreeSummarizedExperiment" defined from package "mia"
#>
#> function (x, ...)
#> standardGeneric("makePhyloseqFromTreeSummarizedExperiment")
#> <bytecode: 0x55bcdbab1bf8>
#> <environment: 0x55bcdba7c5d8>
#> Methods may be defined for arguments: x
#> Use showMethods(makePhyloseqFromTreeSummarizedExperiment) for currently available ones.
Created on 2021-05-05 by the reprex package (v2.0.0)
@lwaldron, @schifferl, I just added the phyloseq information to the curatedMetagenomicData.Rmd vignette. I tried to reproduce the figures of the phyloseq section of the currently published vignette. I added mia, phyloseq, and ggplot2 (I added this to improve the visualization of the heatmap figure) to the DESCRIPTION file (Suggests).
Is your feature request related to a problem? Please describe. The phyloseq package is the de facto standard for analysis of metagenomics data in R/Bioconductor and conversion to their
phyloseq
class should be provided in curatedMetagenomicData.Describe the solution you'd like If there is another well-maintained package that already has this functionality, we could use their method. I think that's unlikely, but we should at least look first before writing the method from scratch. Otherwise this method would be implemented by: 1) spliting the rownames of the
TreeSummarizedExperiment
intorowData
to be used as thetax_table
of thephyloseq
object, 2) extracting thecolData
as thesample_data
of thephyloseq
object, 3) extracting therelative_abundance
(already converted to counts) matrix as theotu_table
of thephyloseq
object, and 4) inserting the phylogenetic tree from theTreeSummarizedExperiment
in the right place.Describe alternatives you've considered See above, the method would be borrowed from another package if possible and we would simply advise users of availability. Ideally, the method should be in the phyloseq package itself, but I don't think it is.
Additional context The method should be named
asPhyloseq
(no dot because methods dispatch is not really being called). This issue is closely related to issues #214 (my recent comments) and #224. It replaces and closes #190.