joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/
582 stars 187 forks source link

tax_glom question for finding genus foun in all samples of a given group #957

Open AlexandreThibodeauUdM opened 6 years ago

AlexandreThibodeauUdM commented 6 years ago

Yet another question for a really great great tool!

tax_glom

after tax_glom on a given phyloseq object, you obtain another phyloseq object.

in this object, you can have a the OTU table, sample data and tax table.

my understanding is that the otu table match the tax table

but my question is: what is the OTU name being given in the OTU table because this name is now composed of all names that shared the same taxonomy that got put together after tax_glom, so how is this name chosen?

I am trying to get the genus found in all samples from a sample group so I am doing this and I wonder if I am doing ok. Thanks again!


phyto_j7_genresglom <- tax_glom(phyto_j7_filter, taxrank = "Genus") phyto_j7_genresglom phyto_j7_genresglomT_neg <-subset_samples(phyto_j7_genresglom, Traitement == "T_neg") phyto_j7_genresglomT_neg phyto_j7_genresglomT_neg_filter <-prune_taxa(taxa_sums(phyto_j7_genresglomT_neg ) >= 1, phyto_j7_genresglomT_neg ) phyto_j7_genresglomT_neg_filter j7_table_genreT_neg2 <- as.data.frame(otu_table(phyto_j7_genresglomT_neg_filter)) j7_table_genreT_neg2 dim(j7_table_genreT_neg2) j7_table_genreT_neg2[j7_table_genreT_neg2>=1]<-1 dim(j7_table_genreT_neg2) genre_OTU_j7T_neg_transfo_echant_pos <-j7_table_genreT_neg2[, colSums(j7_table_genreT_neg2 != 0) > 15] #I have 16 samples genre_OTU_j7T_neg_transfo_echant_pos dim(genre_OTU_j7T_neg_transfo_echant_pos) j7_T_neg_listeOTU_100pour <- colnames(genre_OTU_j7T_neg_transfo_echant_pos) j7_T_neg_listeOTU_100pour length(j7_T_neg_listeOTU_100pour) phyloseq_genre_100pour_Tneg <-prune_taxa(j7_T_neg_listeOTU_100pour,phyto_j7_genresglomT_pos_filter) phyloseq_genre_100pour_Tneg table_genres_100pourj7T_neg <-as.data.frame(tax_table(phyloseq_genre_100pour_Tneg)) liste_genres_100pourj7T_neg <-table_genres_100pourj7T_neg$Genus liste_genres_100pourj7T_neg

MSMortensen commented 6 years ago

Hi,

First of all the taxa_names used after agglomeration is the taxa_name of the first OTU for each taxa (or the most abundant, depending on your chosen setting).

Your commands will give you the aswer that you are looking for, but it can be done a lot shorter;

Agglomerate

phyto_j7_genresglom <- tax_glom(phyto_j7_filter, taxrank = "Genus")

Use genus as taxa names

taxa_names(phyto_j7_genresglom) <- tax_table(phyto_j7_genresglom)[,6]

Filter Traitement == "T_neg")

phyto_j7_genresglomT_neg <-subset_samples(phyto_j7_genresglom, Traitement == "T_neg")

Export OTU table

j7_table_genreT_neg2 <- as.data.frame(otu_table(phyto_j7_genresglomT_neg_filter))

Make binary

j7_table_genreT_neg2[j7_table_genreT_neg2>=1]<-1

remove any not present in all

genre_OTU_j7T_neg_transfo_echant_pos <-j7_table_genreT_neg2[, colSums(j7_table_genreT_neg2 != 0) > 15] #I have 16 samples

Extract relevant taxa

j7_T_neg_listeOTU_100pour <- colnames(genre_OTU_j7T_neg_transfo_echant_pos)

snowcastle1 commented 6 years ago

I'm also trying to find all Genera shared by all samples. I tried the following code but got stuck on using the genus as taxa names. `> top20genus_GI <- tax_glom(mapped_BIOM, taxrank = "Rank6")

taxa_names(top20genus_GI) <- tax_table(top20genus_GI)[,6] Error in taxa_names<-(*tmp*, value = new("taxonomyTable", .Data = c("Ambiguous_taxa", : taxa_names<-: You are attempting to assign duplicated taxa_names`

Is there another way of doing this? I'm pretty new at this.

MSMortensen commented 6 years ago

Hi, Tax_glom only merges taxa which have an identical classification in your chosen taxrank and all higher taxranks. You run into trouble if you have two OTUs with different higher classification and missing genus classification. I normally clean bad info from my tax table (unidentified ..., metagenome, ... ) and then I run the following script to avoid empty fields:

Export tax table

tax <- data.frame(tax_table(psdata), stringsAsFactors=FALSE)

This step is depending on the origin of your taxa, this suits qiime2 classified against SILVA

tax.clean <- data.frame(row.names = row.names(tax), Kingdom = str_replace(tax[,1], "D_0",""), Phylum = str_replace(tax[,2], "D_1",""), Class = str_replace(tax[,3], "D_2",""), Order = str_replace(tax[,4], "D_3",""), Family = str_replace(tax[,5], "D_4",""), Genus = str_replace(tax[,6], "D_5",""), Species = str_replace(tax[,7], "D_6__",""), stringsAsFactors = FALSE) tax.clean[is.na(tax.clean)] <- ""

Ensure that all columns are characters

for (i in 1:7){ tax.clean[,i] <- as.character(tax.clean[,i])}

Fille holes in the tax table

for (i in 1:nrow(tax.clean)){

Fill in missing taxonomy

if (tax.clean[i,2] == ""){ kingdom <- paste("Kingdom", tax.clean[i,1], sep = "") tax.clean[i, 2:7] <- kingdom } else if (tax.clean[i,3] == ""){ phylum <- paste("Phylum", tax.clean[i,2], sep = "") tax.clean[i, 3:7] <- phylum } else if (tax.clean[i,4] == ""){ class <- paste("Class", tax.clean[i,3], sep = "") tax.clean[i, 4:7] <- class } else if (tax.clean[i,5] == ""){ order <- paste("Order", tax.clean[i,4], sep = "") tax.clean[i, 5:7] <- order } else if (tax.clean[i,6] == ""){ family <- paste("Family", tax.clean[i,5], sep = "") tax.clean[i, 6:7] <- family } else if (tax.clean[i,7] == ""){ tax.clean$Species[i] <- paste("Genus",tax.clean$Genus[i], sep = "") } }

AlexandreThibodeauUdM commented 6 years ago

Thank you so much for your answer! Sorry, it took me so long to respond!

grabear commented 6 years ago

BTW if you use the Silva database with qiime then you can utilize this parsing function:

854 @MSMortensen