joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:

Make phyloseq object from "BCI", "BCI.env" to use ordinate, plot_ordination? #543

Closed CarlyMuletzWolz closed 8 years ago

CarlyMuletzWolz commented 9 years ago


Can you bring in a site by species matrix as an 'otu table' and environmental data as sample meta-data and merge them into a phyloseq object?

For instance, import the BCI data and BCI.env and merge them to use some of the great plotting tools developed in phyloseq.

I am putting together a script tutorial on analyzing and visualizing beta-diversity, and I love the ggplot2 plotting style in phyloseq, but I want the tutorial to be usable by a wide variety of community ecologists.

spholmes commented 9 years ago

Dear Carly, indeed this is something we do all the time. The only really important things to remember are: 1) The OTU matrix is a matrix whose row names are OTUs and whose column names are samples (as when taxa_are_rows=TRUE). 2) These sample names must match the sample names (the row names of the sample_data) in the covariate information, and the number of samples has to match, of course. 3) You don't have to have a tree or other information.
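A minimal base-R sketch of points 1 and 2, using made-up plot and species names: transpose a site-by-species table so taxa are rows, then confirm that the sample names agree with the row names of the covariate table before handing both to phyloseq. All object names and values here are illustrative assumptions.

```r
# Toy site x species counts (rows = samples, columns = species); values are made up
site_species <- matrix(
  c(5, 0, 2,
    1, 3, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("plotA", "plotB"), c("sp1", "sp2", "sp3"))
)

# Toy covariates: one row per sample, row names are the sample names
env <- data.frame(habitat = c("old", "young"), row.names = c("plotA", "plotB"))

# Point 1: taxa as rows, samples as columns
species_site <- t(site_species)

# Point 2: the same samples, with the same names, in both tables
names_match <- setequal(colnames(species_site), rownames(env))
names_match
```

If names_match comes back FALSE, setdiff(colnames(species_site), rownames(env)) lists the samples that are missing from the covariate table.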

Best of luck, Susan


Susan Holmes Professor, Statistics and BioX John Henry Fellow in Undergraduate Education Sequoia Hall, 390 Serra Mall Stanford, CA 94305

CarlyMuletzWolz commented 8 years ago

Great, thank you for the information. Here is the code in case anyone else ever wants to do this.

library(phyloseq)

# this is the site x species matrix; row.names = 1 is needed so the first column becomes the row names
site_species <- read.csv("GP_site_species.csv", header = TRUE, row.names = 1)

# transpose to get a species x site matrix
species_site_transposed <- t(site_species)

# NEVER use numbers as column labels if you can avoid it: read.csv() puts an X in front of them
# get rid of the Xs that were inserted in front of the numeric names (now the row names)
rownames(species_site_transposed) <- sub("^X", "", rownames(species_site_transposed))
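As a possible alternative to stripping the prefix afterwards: the X comes from read.csv(), which by default passes column names through make.names() (check.names = TRUE). A small base-R sketch with made-up inline data (the OTU IDs and sample names are placeholders, not taken from the files above) suggests that check.names = FALSE keeps numeric column names untouched:

```r
# Inline CSV standing in for a site x species file with numeric OTU IDs as column names
csv_text <- "sample,549322,522457\nCL3,0,2\nCC1,1,0"

# Default behaviour: column names are made syntactically valid, so an X is prepended
with_x <- read.csv(text = csv_text, row.names = 1)

# check.names = FALSE leaves the column names exactly as they appear in the file
no_x <- read.csv(text = csv_text, row.names = 1, check.names = FALSE)

colnames(with_x)  # "X549322" "X522457"
colnames(no_x)    # "549322"  "522457"
```

Reading the file this way would make the prefix-stripping step unnecessary, at the cost of column names that need backticks if they are used in formulas or with `$`.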

# this needs to be a matrix; t() already returns one

# make compatible with the phyloseq format
species_site_final <- otu_table(species_site_transposed, taxa_are_rows = TRUE)

# read taxonomy info in
taxonomy <- read.csv("GP_taxonomy.csv", row.names = 1)

# needs to be a matrix
class(taxonomy)
taxonomy <- as.matrix(taxonomy)

# make compatible with phyloseq
taxonomy_final <- tax_table(taxonomy)

# read the sample meta-data in
meta_data <- read.csv("GP_meta.csv", header = TRUE, row.names = 1)

# a data frame is expected for sample_data; read.csv() already returns one

# make compatible with phyloseq
meta_final <- sample_data(meta_data)

# good to go
# you can also add a phylogenetic tree here, if you have one
# merge it all together
Our_GP_data <- merge_phyloseq(species_site_final, taxonomy_final, meta_final)
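For the BCI data from the original question, the same recipe might look like the sketch below. It assumes the vegan and phyloseq packages are installed and that BCI and BCI.env list the 50 plots in the same order; the sample names plot1 to plot50 are invented here so that both tables carry identical, non-numeric labels. Because BCI is already site by species, the sketch passes taxa_are_rows = FALSE instead of transposing, and it uses PCoA on Bray-Curtis distances as one example ordination.

```r
library(phyloseq)
library(vegan)

data(BCI)      # site x species tree counts (50 plots x 225 species)
data(BCI.env)  # environmental covariates, one row per plot

# Assumption: rows of BCI and BCI.env are in the same plot order.
# Give both tables the same explicit sample names (names are made up).
plot_ids <- paste0("plot", seq_len(nrow(BCI)))
rownames(BCI) <- plot_ids
rownames(BCI.env) <- plot_ids

# Samples are rows here, so tell phyloseq that taxa are NOT rows
otu <- otu_table(as.matrix(BCI), taxa_are_rows = FALSE)
env <- sample_data(BCI.env)

# No taxonomy table or tree is required
bci_ps <- phyloseq(otu, env)

# Ordinate and plot the samples
ord <- ordinate(bci_ps, method = "PCoA", distance = "bray")
p <- plot_ordination(bci_ps, ord, type = "samples")
p
```

A column of BCI.env could then be mapped to colour through the color argument of plot_ordination(); which columns are available depends on the installed vegan version.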


######### I used the GlobalPatterns dataset and decomposed it into its separate parts to show how to bring in data that is not in 'microbiome' format

# I decomposed the GlobalPatterns dataset from the phyloseq package
# here for bookkeeping on how I did this

# get dataset
library(phyloseq)
data(GlobalPatterns)

# info about the data (e.g. ?GlobalPatterns)

# too big, reduce to work with
GP2 <- prune_taxa(taxa_sums(GlobalPatterns) > 1000, GlobalPatterns)

# get the top 10 genera
top10genus <- sort(tapply(taxa_sums(GP2), tax_table(GP2)[, "Genus"], sum), decreasing = TRUE)[1:10]
GP3 <- subset_taxa(GP2, Genus %in% names(top10genus))
GP3
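The sort(tapply(...)) idiom above sums the counts per genus and keeps the largest totals. A toy base-R sketch with made-up counts and genus labels (nothing here comes from GlobalPatterns) may make the mechanics easier to follow:

```r
# Made-up total counts for five OTUs and the genus each one belongs to
counts <- c(otu1 = 50, otu2 = 10, otu3 = 40, otu4 = 5, otu5 = 20)
genus  <- c("Bacteroides", "Prevotella", "Bacteroides", "Prevotella", "Faecalibacterium")

# Sum counts within each genus, then sort the totals from largest to smallest
genus_totals <- sort(tapply(counts, genus, sum), decreasing = TRUE)

# Keep the names of the two most abundant genera
top2 <- names(genus_totals)[1:2]

# Logical filter: which OTUs belong to one of those genera
keep <- genus %in% top2
names(counts)[keep]
```

In the script above, subset_taxa() plays the role of the logical filter, applied to the Genus column of the taxonomy table.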

# write files to use
# phyloseq stores this matrix as species x site; most people make theirs as site x species
species_site <- as(otu_table(GP3), "matrix")
site_species <- t(species_site)

# taxonomy table and sample meta-data
tax <- as(tax_table(GP3), "matrix")
meta <- as(sample_data(GP3), "data.frame")

getwd()
setwd("/Users/Carly/Desktop/Biodiversity_class/beta-diversity/")
write.csv(site_species, "GP_site_species.csv")
write.csv(tax, "GP_taxonomy.csv")
write.csv(meta, "GP_meta.csv")