joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

plot_ordination() visualisation query? #1618

Closed marwa38 closed 1 year ago

marwa38 commented 2 years ago

Hi guys, do you know why the dots sit in between the boxplots and not on each one? (image attached) Thanks, M

ycl6 commented 2 years ago

Hi @marwa38 It's a ggplot2 thing, check this post
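For context, here is a minimal sketch of the usual cause, assuming the points are colored by a grouping variable. geom_boxplot() places the groups side by side by default, while geom_point() defaults to position = "identity", so every point lands at the center of its x category, in between the boxes. The built-in ToothGrowth data stands in for the real samples, and the names `df`, `p_between` and `p_aligned` are only illustrative.

```r
library(ggplot2)

# ToothGrowth (built-in) is a stand-in: dose on the x axis, supp as the color group
df <- transform(ToothGrowth, dose = factor(dose))

# Points are not dodged: they sit at the category center, between the two boxes
p_between <- ggplot(df, aes(dose, len, color = supp)) +
  geom_boxplot(outlier.shape = NA) +
  geom_point()

# Dodging the points by the boxplot width (0.75 by default) lines them up with the boxes
p_aligned <- ggplot(df, aes(dose, len, color = supp)) +
  geom_boxplot(outlier.shape = NA) +
  geom_point(position = position_dodge(width = 0.75))
```

Printing `p_between` and `p_aligned` one after the other should show the difference.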

marwa38 commented 2 years ago

Hi @ycl6

Thanks for your reply, this is helpful. I will ask my question there then: issue posted on Stack Overflow. Regarding my figure, I tried many changes to the code (in plot_richness()) to delete the word "Shannon" that appears above the plot in the figure attached to my post (a title/subtitle/caption, I don't know what to call it). Do you know how I can get rid of it?

marwa38 commented 2 years ago

Hi @ycl6, here is the code if you would like to have a quick look:

plot_richness(ps.prev.intesParts.f, x = "part", measures = "Shannon", 
              color = "Samples") +
  geom_boxplot() +
  theme_classic() +
  theme(text = element_text(size = 20)) +
  theme(strip.background = element_blank(), axis.text.x.bottom = element_text(angle = 90)) +
  labs(x = "Intestinal Parts", y = "Shannon Index") +
  theme(legend.title = element_blank())

I think the issue in my figure is not the same as the one in the Stack Overflow post you kindly shared, and that it might originate from the output of plot_richness() itself. What do you think? In addition, I haven't seen this issue before when using ggplot.

ycl6 commented 2 years ago

Hi @marwa38 Like you suspected, the plot_richness() function is fairly restrictive about what you can do with its output, in this case dodging the points. You can instead create the plot directly with ggplot2, e.g.

library(phyloseq)
library(ggplot2)

data(GlobalPatterns)

# Randomly add another sample feature to use to color the samples
set.seed(1)
sample_data(GlobalPatterns)$Rand = sample(rep(LETTERS[1:2], nsamples(GlobalPatterns)/2))

# Calculate Shannon
ad = estimate_richness(GlobalPatterns, measures = "Shannon")
ad = merge(data.frame(sample_data(GlobalPatterns)), ad, by = "row.names")

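# Draw the boxes first so their default white fill does not hide the points,
# then dodge the points by the same width as the boxes
ggplot(ad, aes(SampleType, Shannon, color = Rand)) + theme_classic() +
        geom_boxplot(outlier.shape = NA) + geom_point(position = position_dodge(0.75)) +
        theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))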
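On the earlier question about the leftover "Shannon" label: plot_richness() facets its panels by measure, so that label is most likely a facet strip rather than a title or caption. Here is a minimal sketch under that assumption, using plain ggplot2 and a hypothetical toy data frame (`toy` and its columns are made up for illustration); it hides the strip text with theme(strip.text = element_blank()).

```r
library(ggplot2)

# Hypothetical toy data mimicking a single faceted richness measure
toy <- data.frame(
  part     = rep(c("ileum", "colon"), each = 4),
  value    = c(3.1, 3.4, 2.9, 3.3, 4.0, 4.2, 3.8, 4.1),
  variable = "Shannon"
)

p <- ggplot(toy, aes(part, value)) +
  geom_boxplot() +
  facet_wrap(~variable) +               # this draws the "Shannon" strip above the panel
  theme_classic() +
  theme(strip.background = element_blank(),
        strip.text = element_blank())   # removes the strip label itself
```

If the assumption holds, adding `+ theme(strip.text = element_blank())` after theme_classic() in the plot_richness() call should have the same effect. This has not been checked against the original data.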

(image: plot_richness)

> head(ad)
  Row.names X.SampleID  Primer Final_Barcode Barcode_truncated_plus_T Barcode_full_length
1    AQC1cm     AQC1cm ILBC_16        ACAGCA                   TGCTGT         GACCACTGCTG
2    AQC4cm     AQC4cm ILBC_17        ACAGCT                   AGCTGT         CAAGCTAGCTG
3    AQC7cm     AQC7cm ILBC_18        ACAGTG                   CACTGT         ATGAAGCACTG
4       CC1        CC1 ILBC_02        AACTCG                   CGAGTT         CATCGACGAGT
5       CL3        CL3 ILBC_01        AACGCA                   TGCGTT         CTAGCGTGCGT
6     Even1      Even1 ILBC_27        ACCGCA                   TGCGGT         TGACTCTGCGG
          SampleType                              Description Rand  Shannon
1 Freshwater (creek)             Allequash Creek, 0-1cm depth    A 3.552736
2 Freshwater (creek)            Allequash Creek, 3-4 cm depth    B 3.372495
3 Freshwater (creek)            Allequash Creek, 6-7 cm depth    A 4.027716
4               Soil Cedar Creek Minnesota, grassland, pH 6.1    B 6.776603
5               Soil Calhoun South Carolina Pine soil, pH 4.9    A 6.576517
6               Mock                                    Even1    B 4.083665