Closed brookeweigel closed 6 years ago
Nevermind... I figured it out! I had some unassigned (blank) taxonomy cells... once I renamed those to "unknown", the gaps went away! Maybe this will be useful to somebody as well...
# Transform to relative abundance
AID_norm <- transform_sample_counts(AID_R, function(x) 100 * x/sum(x))
# Compile taxa by Order (filtering out low abundance taxa)
AID_Orders <- AID_norm %>%
tax_glom(taxrank = "Order") %>% # agglomerate taxa at order level
psmelt() %>% # Melt phyloseq object to long format for producing graphics with ggplot2
filter(Abundance > 1.0) %>% # Filter out orders below 1% in each sample
arrange(desc(Phylum),desc(Class),desc(Order))
# Sum remaining taxa with a relative abundance < 1% and make a new dataframe
Remainders <- (AID_Orders) %>%
group_by(SampleID,Mouse,Sex,Diet,Week,Description) %>%
summarise(Abundance = (100-sum(Abundance))) %>%
as.data.frame()
Remainders$Order<-"Orders < 1%"
Remainders$Phylum<-"_Orders < 1%"
Remainders$Class<-"_Orders < 1%"
Remainders$Kingdom<-"Bacteria"
# Join dataframes
AID_barchart <- full_join(AID_Orders,Remainders)
# reorder based on phylogeny
AID_barchart <- AID_barchart[with(AID_barchart, order(Phylum,Class,Order)),]
# lock in Order level
AID_barchart$Order <- factor(AID_barchart$Order, levels = AID_barchart$Order)
I figured it out. Just in case anybody needs to know. You just need to transform the phyloseq object using the psmelt() function and then you can generate the graph using ggplot() and command position = "fill" argument inside geom_bar() function. For example: ggplot(data = psmelt(phyloseq_object), mapping = aes_string(x = "variable of your data frame",y = "Abundance")) + geom_bar(aes(color=taxon name, fill=taxon name), stat="identity", position="fill")
Even I am facing the same problem. I tried by removing unassigned taxa manually but still, the output is same. @brookeweigel how did you remove the blank taxonomic cells?
Thanks.
@brookeweigel I would also like to know which code you used to remove the unassigned taxa, I am having the same issue. Thanks for sharing!
Hello @rahulnccs and @Nat211! It has been a little while since I looked at this, but here is my recent code that worked! It starts with a phyloseq object and ends with a nice looking stacked bar graph, using ggplots in R with the phyloseq package.
library(ggplot2) library(phyloseq) library(tidyverse)
physeq3 = transform_sample_counts(physeq2, function(x) x / sum(x) ) physeq3
glom <- tax_glom(physeq3, taxrank = 'class') glom # should list # taxa as # phyla data_glom<- psmelt(glom) # create dataframe from phyloseq object data_glom$class <- as.character(data_glom$class) #convert to character
data_glom$class[data_glom$Abundance < 0.01] <- "< 1% abund."
Count = length(unique(data_glom$class)) Count
unique(data_glom$class)
data_glom$class <- factor(data_glom$class, levels = c("Alphaproteobacteria", "Betaproteobacteria", "Deltaproteobacteria", "Gammaproteobacteria", "Flavobacteriia", "Verrucomicrobiae", "Planctomycetia", "Saprospirae", "Sphingobacteriia", "BD1-5", "Cytophagia", "Acidobacteriia", "OM190", "TA18", "Unknown", "< 1% abund."))
spatial_plot <- ggplot(data=data_glom, aes(x=Sample, y=Abundance, fill=class)) + facet_grid(~Location, scales = "free") spatial_plot + geom_bar(aes(), stat="identity", position="stack") + scale_fill_manual(values = c("royalblue4", "deepskyblue", "blue", "cyan2", "darkorchid", "gold1", "forestgreen", "firebrick", "mediumspringgreen", "darkorange1", "saddlebrown", "deeppink", "slategray2", "seagreen", "snow4", "black")) + theme(legend.position="bottom") + guides(fill=guide_legend(nrow=5))
Hi @brookeweigel,
This code is great thank you! It was exactly what I'm looking for.
However, I'm still having the issue of the bar graph not reaching 100%.
I've checked the Abundance values, there were a few that were 0 in the dataframe so I removed these rows, but I still seem to be having the same issues.
This is my code:
phy.squeeze = subset_samples(phy.3, Section == "Squeeze") phy.squeeze <- prune_taxa(taxa_sums(phy.squeeze) > 0, phy.squeeze)
phy.squeeze = transform_sample_counts(phy.squeeze, function(x) x / sum(x))
glom <- tax_glom(phy.squeeze, taxrank = "Phylum") glom # should list # taxa as # phyla data_glom <- psmelt(glom) # create dataframe from phyloseq object data_glom$Phylum <- as.character(data_glom$Phylum) #convert to character
data_glom$Phylum[data_glom$Abundance < 0.01] <- "< 1% abund."
data_glom[data_glom==0] <- NA
data_glom <- data_glom[complete.cases(data_glom),]
Count = length(unique(data_glom$Phylum)) Count
unique(data_glom$Phylum)
data_glom$Phylum <- factor(data_glom$Phylum, levels = c("Proteobacteria", "Firmicutes", "Tenericutes", "Bacteroidetes", "Actinobacteria", "Fusobacteria", "Patescibacteria", "Planctomycetes", "Verrucomicrobia", "Epsilonbacteraeota", "< 1% abund."))
spatial_plot <- ggplot(data=data_glom, aes(x=Scores, y=Abundance, fill=Phylum)) + facet_grid(~Season, scales = "free") spatial_plot + geom_bar(aes(), stat="identity", position="stack") + theme(legend.position="bottom") + guides(fill=guide_legend(nrow=5))
Hi @Slamazon4, it looks like your abundance values are not between 0-1, (they range from 0-20+), so the transform_sample_counts must not have worked properly. I can't find the exact reason for your error, but sometimes with phyloseq I have to clear my workspace environment in R and start over, because maybe you re-ran a previous code that wrote over the transformed sample count data. I'm sorry that I can't offer any better advice, but I think that the transform sample count step is the source of your problem (why the bar graph doesn't reach 100%). I hope you get it to work!
Thanks for getting back to me so quick @brookeweigel !
I did start fresh and run the code again, but the same thing happened. I checked the abundance values and they are all between 0 and 1 so I don't think that's it unfortunately :(
Do you think there's a chance it might be because the transformation to relative abundance is done prior to creating the dataframe, and for some reason when I agglomerate at Phylum level there are a few ASVs/features that come up with 0 values (which I'm removing) but they were included in the calculation for a 100%?
And perhaps also using facet_grid on the data for the "Season" variable splits the data again so it further distributes the 100% calculation?
I'm not sure if I'm making any sense, haha, but if you have any insight let me know.
Hi @brookeweigel! I followed you script step by step but... I'm still having the issue of the bar graph not reaching 100%. I didn't found where I have to change the unassigned (blank) taxonomy cells. It's a nice barplot but with some problem!
The script that I used was: `> #Turn all OTUs into class (or phylum or order level) counts
glom <- tax_glom(tot2.ra, taxrank = 'Class') glom # should list # taxa as # phyla phyloseq-class experiment-level object otu_table() OTU Table: [ 132 taxa and 14 samples ] sample_data() Sample Data: [ 14 samples by 12 sample variables ] tax_table() Taxonomy Table: [ 132 taxa by 8 taxonomic ranks ] data_glom<- psmelt(glom) # create dataframe from phyloseq object data_glom$Class <- as.character(data_glom$Class) #convert to character
simple way to rename phyla with < 1% abundance
data_glom$Class[data_glom$Abundance < 1] <- "< 1% abund."
Count # phyla to set color palette
Count = length(unique(data_glom$Class)) Count [1] 41
To change order of bacterial classes, use factor() to change order of levels (for example,
maybe you want to put the most abundant taxa first)... Use unique(data_glom$class) to see exact spelling of each class
unique(data_glom$Class) [1] "cClostridia" "cGammaproteobacteria" "cAlphaproteobacteria" "cDesulfotomaculia" "cSynergistia"
[6] "cNegativicutes" "cBacilli" "cAnaerolineae" "cParcubacteria" "cPlanctomycetes"
[11] "cVerrucomicrobiae" "cThermoanaerobacteria" "cSyntrophomonadia" "cVicinamibacteria" "cThermotogae"
[16] "cSyntrophia" "cSpirochaetia" "cLeptospirae" "cKiritimatiellae" "cSyntrophorhabdia"
[21] "cFermentibacteria" "cActinobacteria" "cAminicenantia" "cPhycisphaerae" "cBacteroidia"
[26] "cSubgroup_18" "cChlamydiae" "cBRH-c20a" "cSaccharimonadia" "cLentisphaeria"
[31] "c__Incertae_Sedis" "cDadabacteriia" "cABY1" "cDehalococcoidia" "cChloroflexia"
[36] "cMoorellia" "cuncultured" "cHydrogenedentia" "cPolyangia" "c__MVP-15"
[41] "< 1% abund."plot with condensed phyla into "unknown" category
spatial_plot <- ggplot(data=data_glom, aes(x=Exp, y=Abundance, fill=Class)) + #facet_grid(~otros, scales = "free")
spatial_plot +
- geom_bar(aes(), stat="identity", position="stack") +
scale_fill_manual(values = c("royalblue4", "deepskyblue", "blue", "cyan2", "darkorchid",
"gold1", "forestgreen", "firebrick", "mediumspringgreen", "darkorange1",
"saddlebrown", "deeppink", "slategray2", "seagreen", "snow4", "black")) +
- scale_fill_manual(values = c("#70dc85", "#78ac83", "#006602", "#7e5b00", "#5d7fff", "#cb88a6", "#a152da", "#0079b9", "#81d33d",.......................")) +
- theme(legend.position="bottom") + guides(fill=guide_legend(nrow=5)) spatial_plot2 <- spatial_plot + scale_x_discrete(limits = newSTorder) spatial_plot2`
@brookeweigel @PatoUru
I figured out what my issue was. I needed to plot 'samples' on the x axis for the script to work. I was trying to plot one of my categorical variables instead.
So instead of this:
spatial_plot <- ggplot(data=data_glom, aes(x=Scores, y=Abundance, fill=Phylum)) + facet_grid(~Season, scales = "free") spatial_plot + geom_bar(aes(), stat="identity", position="stack") + theme(legend.position="bottom") + guides(fill=guide_legend(nrow=5))
I did this:
spatial_plot <- ggplot(data=data_glom, aes(x=samples, y=Abundance, fill=Phylum)) + face_grid(~Season, scales = "free") spatial_plot + geom_bar(aes(), stat="identity", position="stack") + theme(legend.position="bottom") + guides(fill=guide_legend(nrow=5))
If I wanted to plot my variables along the x axis I had to transform my data x2. For example, if I wanted to plot by season:
Transform sample counts phy.squeeze = transform_sample_counts(phy.squeeze, function(x) x / sum(x))
Turn all OTUs into class (or phylum or order level) counts glom.phylum <- tax_glom(phy.squeeze, taxrank = "Phylum")
And then merge samples for whatever variable I want to plot along the x axis glom.season <- merge_samples(glom.phylum, "Season"
Transform counts again glom.season.phylum = transform_sample_counts(glom.season function(x) x / sum(x)) data_glom <- psmelt(glom.season.phylum) # create dataframe from phyloseq object
Now I can plot my desired variable along the x axis season <- ggplot(data=data_glom, aes(x=Season, y=Abundance, fill=Phylum)) + geom_bar(aes(),stat="identity", position="stack") + scale_fill_manual(values = c("royalblue3","darkgoldenrod1", "cyan2", "darkorchid4", "black")) + theme(legend.position="bottom")
@PatoUru
I ran the tax_glom line with my raw counts phyloseq object and then converted that output to relative abundance rather than using relative abundance to select for one division of taxonomy. This resolved my issue with having the plot come up short of 1.0, hope this helps.
Hi, i need help. When i plot my otu, my graph plot showed the same group sample apart. How can I arrange them, so that they come next to each other? e.g I have LF1, LF2, LF3 AND HF1,HF2,HF3.. the bar graph comes out like HF1, LF2,LF3,HF2,HF3.
Thanks for the help! I had a similar problem
Hello! I tried to follow the logic from the following issue (https://github.com/joey711/phyloseq/issues/494)... using tips from @audy and @Nick243. But for some reason, when I try creating a new data frame with the phyla condensed into low abundance (< 1%) taxa, I end up with some samples that do not reach 100% on the bar graph... and I cannot figure out how some of the values went "missing" so that relative abundance does not total 100%. See the two plots compared below, 1) for the condensed phyla bar plot, and 2) for the normal relative abundance bar plot (which reaches 100% on the y axis for all samples). Any help would be very useful! Thanks!!
merge into one phyloseq object
physeq = phyloseq(OTU, TAX, META, phy_tree) physeq
Transform into relative abundances + filter to a mean threshold
physeq2 = filter_taxa(physeq, function(x) mean(x) > 0.1, TRUE) physeq2 physeq3 = transform_sample_counts(physeq2, function(x) x / sum(x) ) physeq3
choose samples (only stomach)
stomach <- subset_samples(physeq3, Sample_type=="Stomach") stomach
create dataframe from phyloseq object for stomach samples... to graph samples without condensed phyla
stom <- psmelt(stomach)
Condense low abundance taxa into an "Other" category for the barplot
Turn all OTUs into phylum counts
glom <- tax_glom(stomach, taxrank = 'phylum') glom # should list # taxa as # phyla data <- psmelt(glom) # create dataframe from phyloseq object data$phylum <- as.character(data$phylum) #convert to character
simple way to rename phyla with < 1% abundance
data$phylum[data$Abundance < 0.01] <- "< 1% abund."
I also tried using this method, based on median abundance threshold... either way, I still end up with sample that do not reach 100% on the bar graph
group dataframe by Phylum, calculate median rel. abundance
medians <- ddply(data, ~phylum, function(x) c(median=median(x$Abundance)))
find Phyla whose rel. abund. is less than 1%
remainder <- medians[medians$median <= 0.01,]$phylum
list of low abundance phyla
remainder
change their name to "Phyla < 1% abund."
data[data$phylum %in% remainder,]$Phylum <- "Phyla < 1% abund."
rename phyla with < 1% relative abundance
data$phylum[data$Abundance < 0.01] <- "Phyla < 1% abund."
plot with condensed phyla into "< 1% abund" category
p <- ggplot(data=data, aes(x=Sample, y=Abundance, fill=phylum)) p + geom_bar(aes(), stat="identity", position="stack") + scale_fill_manual(values = c("darkblue", "darkgoldenrod1", "darkseagreen", "darkorchid", "darkolivegreen1", "lightskyblue", "darkgreen", "deeppink", "khaki2", "firebrick", "brown1", "darkorange1", "cyan1", "royalblue4", "darksalmon", "darkblue", "royalblue4", "dodgerblue3", "steelblue1", "lightskyblue", "darkseagreen", "darkgoldenrod1", "darkseagreen", "darkorchid", "darkolivegreen1", "brown1", "darkorange1", "cyan1", "darkgrey")) + theme(legend.position="bottom") + guides(fill=guide_legend(nrow=5))
compare to plot of same data, without condensed phyla
p <- ggplot(data=stom, aes(x=Sample, y=Abundance, fill=phylum)) p + geom_bar(aes(), stat="identity", position="stack") + scale_fill_manual(values = c("darkblue", "darkgoldenrod1", "darkseagreen", "darkorchid", "darkolivegreen1", "lightskyblue", "darkgreen", "deeppink", "khaki2", "firebrick", "brown1", "darkorange1", "cyan1", "royalblue4", "darksalmon", "darkblue", "royalblue4", "dodgerblue3", "steelblue1", "lightskyblue", "darkseagreen", "darkgoldenrod1", "darkseagreen", "darkorchid", "darkolivegreen1", "brown1", "darkorange1", "cyan1", "darkgrey", "darkblue", "darkgoldenrod1", "darkseagreen", "darkorchid", "darkolivegreen1", "lightskyblue", "darkgreen", "deeppink", "khaki2", "firebrick", "brown1", "darkorange1", "cyan1", "royalblue4", "darksalmon", "darkblue", "royalblue4", "dodgerblue3", "steelblue1", "lightskyblue", "darkseagreen", "darkgoldenrod1", "darkseagreen", "darkorchid", "darkolivegreen1", "brown1", "darkorange1", "cyan1", "darkgrey")) + theme(legend.position="bottom") + guides(fill=guide_legend(nrow=5))