joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

Make phyloseq object from "BCI", "BCI.env" to use ordinate, plot_ordination? #543

Closed CarlyMuletzWolz closed 8 years ago

CarlyMuletzWolz commented 9 years ago

Hello,

Can you bring in a site by species matrix as an 'otu table' and environmental data as sample meta-data and merge them into a phyloseq object?

For instance, import the BCI data and BCI.env and merge them to use some of the great plotting tools developed in phyloseq.

I am putting together a script tutorial on analyzing and visualizing beta-diversity, and love the use of ggplot2 's plotting style in Phyloseq, but I want it to be useable for a wide variety of community ecologists.

spholmes commented 9 years ago

Dear Carly, Indeed, this is something we do all the time. The only really important things to remember are:

1) The OTU matrix is a matrix whose row names are OTUs and whose column names are samples (that is the layout when taxa_are_rows=TRUE).
2) These sample names must match the sample names (the row names of the sample_data) in the covariate information, and the number of samples has to match, of course.
3) You don't have to have a tree or other information.

Best of luck, Susan
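A minimal sketch of those three points, assuming phyloseq is installed. The OTU names, site names, counts, and the `habitat` column are made up for illustration; only `otu_table()`, `sample_data()`, and `phyloseq()` come from the package.

```r
library(phyloseq)

# Toy counts: 3 OTUs (rows) x 4 samples (columns); all names are hypothetical.
otu_mat <- matrix(
  c(10, 0, 3, 7,
     2, 5, 0, 1,
     0, 8, 4, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:3), paste0("site", 1:4))
)

# Covariates: one row per sample, with row names equal to the sample names.
meta <- data.frame(
  habitat = c("forest", "forest", "stream", "stream"),
  row.names = paste0("site", 1:4)
)

# Point 2: the sample names must agree between the two tables.
stopifnot(setequal(colnames(otu_mat), rownames(meta)))

# Point 3: no tree or taxonomy table is needed to build the object.
toy_ps <- phyloseq(
  otu_table(otu_mat, taxa_are_rows = TRUE),
  sample_data(meta)
)
toy_ps
```

If the matrix is stored the other way round (samples as rows), passing `taxa_are_rows = FALSE` should work without transposing.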


Susan Holmes Professor, Statistics and BioX John Henry Fellow in Undergraduate Education Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

CarlyMuletzWolz commented 8 years ago

Great, thank you for the information. Here is the code in case anyone else ever wants to do it.

# This is a site-by-species matrix; row.names = 1 uses the first column as row names so phyloseq can read them

site_species <- read.csv("GP_site_species.csv", header = TRUE, row.names = 1)

# Transpose to get a species x site matrix

species_site_transposed <- t(site_species)

# Avoid numbers as column or row labels in R: read.csv() puts an X in front of column names that start with a digit
# Remove only the leading X that was inserted in front of the numeric names (now the row names)

rownames(species_site_transposed) <- gsub("^X", "", rownames(species_site_transposed))
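The X prefix comes from `read.csv()`'s default `check.names = TRUE`, which rewrites column names that are not syntactically valid R names. As a sketch of an alternative to stripping the prefix afterwards, the names can be kept as written by turning that check off. The CSV text and names below are made up for illustration.

```r
# Hypothetical site-by-species table whose species IDs are numbers.
csv_text <- "site,101,102,103
plotA,4,0,2
plotB,1,3,0"

# Default behaviour: numeric column names get an X prefix.
default_read <- read.csv(text = csv_text, row.names = 1)
colnames(default_read)   # "X101" "X102" "X103"

# check.names = FALSE keeps the names exactly as they appear in the file.
kept_read <- read.csv(text = csv_text, row.names = 1, check.names = FALSE)
colnames(kept_read)      # "101" "102" "103"

# After transposing, the untouched IDs become the row (taxa) names.
species_by_site <- t(kept_read)
rownames(species_by_site)

# If the prefix is removed afterwards instead, anchoring the pattern matters:
gsub("X", "", "X12.Xanthomonas")   # "12.anthomonas" (every X is removed)
sub("^X", "", "X12.Xanthomonas")   # "12.Xanthomonas" (only the leading X)
```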

# otu_table() needs a matrix; check the class

class(species_site_transposed)

# Make it compatible with the phyloseq format

species_site_final <- otu_table(species_site_transposed, taxa_are_rows = TRUE)

# Read the taxonomy info in

taxonomy <- read.csv("GP_taxonomy.csv", row.names = 1)

# tax_table() needs a matrix

class(taxonomy)
taxonomy <- as.matrix(taxonomy)

# Make it compatible with phyloseq

taxonomy_final <- tax_table(taxonomy)

# Read the sample meta-data in

meta_data <- read.csv("GP_meta.csv", header = TRUE, row.names = 1)

# sample_data() expects a data frame

class(meta_data)

# Make it compatible with phyloseq

meta_final <- sample_data(meta_data)

# Good to go
# You can also add a phylogenetic tree here, if you have one
# Merge it all together

Our_GP_data <- merge_phyloseq(species_site_final, taxonomy_final, meta_final)

Our_GP_data
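For the BCI case in the original question, the same steps might look like the sketch below. It assumes the vegan package (which ships `BCI` and `BCI.env`) is installed, that the two tables list the plots in the same order, and that `BCI.env` has a `Habitat` column, as it does in recent vegan releases. The `plot1`, `plot2`, ... IDs are made up here so that both tables share explicit, non-numeric sample names.

```r
library(vegan)
library(phyloseq)

data(BCI)      # site-by-species counts: plots are rows, species are columns
data(BCI.env)  # one row of environmental data per plot

# Hypothetical sample IDs, applied to both tables (assumes matching row order).
plot_ids <- paste0("plot", seq_len(nrow(BCI)))
rownames(BCI) <- plot_ids
rownames(BCI.env) <- plot_ids

# Sites are rows here, so taxa_are_rows = FALSE and no transpose is needed.
bci_otu <- otu_table(as.matrix(BCI), taxa_are_rows = FALSE)
bci_env <- sample_data(BCI.env)

bci_ps <- phyloseq(bci_otu, bci_env)
bci_ps

# Bray-Curtis PCoA, coloured by an environmental variable.
bci_ord <- ordinate(bci_ps, method = "PCoA", distance = "bray")
bci_plot <- plot_ordination(bci_ps, bci_ord, color = "Habitat")
bci_plot
```

There is no taxonomy table or tree in this object, so functions that need one (for example UniFrac distances) will not be available.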

######### I used the Global Patterns dataset and decomposed it into its separate parts to show how to bring in data that is not in 'microbiome' format

# I decomposed the Global Patterns dataset from the phyloseq package
# Kept here for bookkeeping on how I did this

library("phyloseq")

# Get the dataset

data(GlobalPatterns)

# Info about the data

?GlobalPatterns

# Too big, so reduce it to something easier to work with

GP2 <- prune_taxa(taxa_sums(GlobalPatterns) > 1000, GlobalPatterns)

# Get the top 10 genera

top10genus <- sort(tapply(taxa_sums(GP2), tax_table(GP2)[, "Genus"], sum), decreasing = TRUE)[1:10]
GP3 <- subset_taxa(GP2, Genus %in% names(top10genus))
GP3

# Write files to use
# phyloseq stores the matrix as species x site; most people make theirs as site x species

species_site <- as(otu_table(GP3), "matrix")
site_species <- t(species_site)

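# Taxon table and sample meta-data
# (the meta-data is stored as `meta` so the variable does not share a name with the sample_data() function)

tax <- as(tax_table(GP3), "matrix")
meta <- as(sample_data(GP3), "data.frame")

# Set the working directory and write the three files

getwd()
setwd("/Users/Carly/Desktop/Biodiversity_class/beta-diversity/")
write.csv(site_species, "GP_site_species.csv")
write.csv(tax, "GP_taxonomy.csv")
write.csv(meta, "GP_meta.csv")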
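Because the `setwd()` path above only exists on one machine, a tutorial script may be easier to share if it builds paths with `file.path()` instead. The sketch below writes a toy site-by-species matrix to a temporary folder and reads it back to confirm the round trip. The folder name, file name, and counts are made up for illustration.

```r
# Hypothetical output folder inside the session's temporary directory.
out_dir <- file.path(tempdir(), "phyloseq_export")
dir.create(out_dir, showWarnings = FALSE)

# Toy site-by-species counts with non-numeric row and column names.
site_species_toy <- matrix(
  c(4, 1, 0, 3),
  nrow = 2,
  dimnames = list(c("plotA", "plotB"), c("sp1", "sp2"))
)

out_file <- file.path(out_dir, "toy_site_species.csv")
write.csv(site_species_toy, out_file)

# row.names = 1 restores the site names that write.csv() put in the first column.
read_back <- as.matrix(read.csv(out_file, row.names = 1))

# The counts and the labels should survive the round trip.
stopifnot(all(read_back == site_species_toy))
stopifnot(identical(rownames(read_back), rownames(site_species_toy)))
stopifnot(identical(colnames(read_back), colnames(site_species_toy)))
```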