joey711 / phyloseq

phyloseq is a set of classes, wrappers, and tools (in R) to make it easier to import, store, and analyze phylogenetic sequencing data; and to reproducibly share that data and analysis with others. See the phyloseq front page:
http://joey711.github.io/phyloseq/

plot_ordination() double open symbol #250

Closed jstearns closed 10 years ago

jstearns commented 10 years ago

I get double circles whenever I use plot_ordination() and set shape values manually. This is only apparent when I use open symbols and increase the size of the points with geom_point(size)

wUF1 <- plot_ordination(PS2400, wUF_ordu, type = "samples", color = "NP_OP", shape = "A_C")
wUF1 + 
scale_shape_manual(values = c(19, 1)) + 
scale_color_manual(values = c("#1805F0", "#F005A5")) + 
geom_point(size = 4)


joey711 commented 10 years ago

Hey Jen (@jstearns)!

Thanks for the feedback. This is an interesting issue with the way ggplot2 does layering. I think I've posted an example solution to this somewhere, but I don't remember where, now, so I'll write one out fully here. I'll show some code, but first, the explanation:

In some cases it might appear that adding a geom_point layer replaces the previous point layer, but this only happens if points are opaque and larger, therefore covering-up the original layer. In fact, ggplot2 is literally just adding a second layer, and rendering both when the ggplot-object gets a request to be plotted. In your example, you've added a layer with larger but-still-hollow points. Because they're hollow, the previous points can still be seen.

Here is code to fully reproduce another example of your exact issue with the GlobalPatterns dataset:

library("phyloseq")
library("ggplot2")
theme_set(theme_bw())
data("GlobalPatterns")
human = get_variable(GlobalPatterns, "SampleType") %in% c("Feces", "Mock", "Skin", 
    "Tongue")
sample_data(GlobalPatterns)$human <- factor(human)

That was all just loading the phyloseq package and setting up the GlobalPatterns dataset for the example. Here is the part that plots the ordination.

GPbraymds = ordinate(GlobalPatterns, "MDS", "bray")
p = plot_ordination(GlobalPatterns, GPbraymds, color = "SampleType", shape = "human")
p


Look at $layers for the object.

p$layers
## [[1]]
## geom_point: na.rm = TRUE 
## stat_identity:  
## position_identity: (width = NULL, height = NULL)

Now add the extra layer and scale elements that you wanted.

p = p + geom_point(size = 5) + scale_shape_manual(values = c(19, 1))
p


p$layers
## [[1]]
## geom_point: na.rm = TRUE 
## stat_identity:  
## position_identity: (width = NULL, height = NULL)
## 
## [[2]]
## geom_point: na.rm = FALSE, size = 5 
## stat_identity:  
## position_identity: (width = NULL, height = NULL)

As you can see, it reproduced your same problem, and there are two geom_point layers, but you really only want the second one.

Here is how you can remove the original geom_point layer that was added by plot_ordination, just like you would pop something out of a list.

p$layers <- p$layers[-1]
p


And as you can see, that last line of code that tweaked p performed both list subsetting and replacement to remove the first point layer, and this solved your problem.

Also, note that what I just showed you is not a standard ggplot2 procedure that I have seen in any documentation, and the official ggplot2 recommendation would probably be to just not make that original layer in the first place. However, it is a perfectly valid thing to do in R, even if it is a bit of a shortcut and you should be cautious about what layers you remove when you do it.
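If positional removal feels too fragile, one option is to pick the layer by its geom class instead of by index. The following is only a sketch under assumptions: it uses the built-in mtcars data as a stand-in for an ordination plot, and it assumes a ggplot2 version in which each layer exposes its geom as `l$geom` (a ggproto object such as `GeomPoint`).

```r
library(ggplot2)

# Stand-in plot (mtcars is illustrative, not data from this thread):
# a small point layer, a text layer, and a larger point layer.
p <- ggplot(mtcars, aes(wt, mpg, label = cyl)) +
  geom_point(shape = 1) +
  geom_text(vjust = 1.5) +
  geom_point(shape = 1, size = 5)

# Flag the point layers by geom class rather than by position.
is_point <- vapply(p$layers, function(l) inherits(l$geom, "GeomPoint"),
                   logical(1), USE.NAMES = FALSE)

# Drop only the first point layer; the text layer and larger points remain.
p$layers <- p$layers[-which(is_point)[1]]

# Record which geoms are left, in drawing order.
remaining <- vapply(p$layers, function(l) class(l$geom)[1],
                    character(1), USE.NAMES = FALSE)
remaining
```

Checking `remaining` before plotting is a cheap way to confirm that only the intended layer was removed.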

It is also possible to grab the data, in this case stored in p$data, and then rebuild a new ggplot2 object "from scratch" using standard ggplot2 commands. The data in p$data is already very well organized for this purpose, so it is not a bad option if you had a different example with multiple layers you wanted to remove/replace.
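As a rough sketch of that rebuild approach, the example below uses a small made-up data frame in place of the real output of plot_ordination, so it runs without phyloseq. The column names `Axis1`, `Axis2`, `SampleType`, and `human` are assumptions for illustration; with a real object, inspect `names(p$data)` first and substitute the actual column names.

```r
library(ggplot2)

# Stand-in for the object returned by plot_ordination(); the data and
# column names here are hypothetical.
set.seed(1)
ord_df <- data.frame(
  Axis1 = rnorm(8),
  Axis2 = rnorm(8),
  SampleType = rep(c("Feces", "Soil"), each = 4),
  human = factor(rep(c(TRUE, FALSE), each = 4))
)
p <- ggplot(ord_df, aes(Axis1, Axis2, color = SampleType, shape = human)) +
  geom_point(na.rm = TRUE)

# Rebuild from the stored data so the plot has exactly one point layer.
p_new <- ggplot(p$data, aes(Axis1, Axis2, color = SampleType, shape = human)) +
  geom_point(size = 5) +
  scale_shape_manual(values = c(19, 1))

length(p_new$layers)
```

Because the new object never contains the original point layer, there is nothing to remove afterwards.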

I will close this issue for now, but let me know if you have any follow-up issues related to this.

jstearns commented 10 years ago

Thanks a bunch, that works. Also thanks for the prompt response, much appreciated.

insectnate commented 10 years ago

I have a question related to this post. I have used the plot_ordination function to make a split plot similar to that in part 4 of the plot_ord tutorial:

p4 = plot_ordination(GP1, GP.ord, type = "split", color = "Phylum", shape = "human", label = "SampleType", title = "split")

I altered the colors to make them more appealing, but I would like to make the sample points and text on the left side of the split plot a bit larger to make the figure more legible. I have tried a number of calls such as scale_size_manual(values = c(6, 6, 2)), but this has no effect on the graphical output. I can alter the shape using commands as in the above example, but cannot seem to alter the size of the sample half of the ggplot object. I am sure this is a simple procedure, but I am stuck. Any help greatly appreciated.

p5_1$layers
[[1]]
geom_point: na.rm = TRUE
stat_identity:
position_identity: (width = NULL, height = NULL)

[[2]]
mapping: x = NMDS1, y = NMDS2, label = Origin, na.rm = TRUE
geom_text: parse = FALSE, na.rm = TRUE, size = 2, vjust = 1.5
stat_identity:
position_identity: (width = NULL, height = NULL)

joey711 commented 10 years ago

@insectnate Can you post the code with your figure alterations, and describe what you're trying to do? You can use GP1 if you don't want to post your data. That would let me give you a specific solution.

One problem to note is that scale_size_manual has no effect when the size has been specified as a fixed value in the geom_ call. For example, for points, geom_point(size=5) will use the default data and aesthetic mapping, but will also set the size of ALL points it plots to 5, even if size was mapped to some variable in the inherited aes() mapping. In the biplot, for example, I believe I've set the sample sizes to some value, like 5, so no additional scale_size calls will change that.

A general work-around that should always work (but may not be the most efficient code-wise, depending on what you're doing) is to start a brand-new ggplot2 object definition, grabbing just the data from the one created by plot_ordination. Here is what that would look like:

p4 = plot_ordination(GP1, GP.ord, type = "split", color = "Phylum", shape = "human",  label = "SampleType", title = "split")
ggplot(p4$data, aes(...)) + geom_point() + ...
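To make the fixed-versus-mapped distinction concrete, here is a minimal sketch on a made-up data frame. The columns `x`, `y`, and `id.type` are stand-ins chosen for illustration and are not guaranteed to match the columns of `p4$data`.

```r
library(ggplot2)

# Hypothetical stand-in for p4$data.
df <- data.frame(
  x = c(1, 2, 3, 4),
  y = c(4, 3, 2, 1),
  id.type = factor(c("samples", "samples", "taxa", "taxa"))
)
sizes <- scale_size_manual(values = c(samples = 5, taxa = 2))

# Fixed size in the geom call: the constant overrides the mapping,
# so the manual scale has no visible effect.
p_fixed <- ggplot(df, aes(x, y, size = id.type)) + geom_point(size = 5) + sizes

# Size left to the aesthetic mapping: the manual scale applies.
p_mapped <- ggplot(df, aes(x, y, size = id.type)) + geom_point() + sizes

# Inspect the sizes ggplot2 computes for each point layer.
fixed_sizes <- ggplot_build(p_fixed)$data[[1]]$size
mapped_sizes <- ggplot_build(p_mapped)$data[[1]]$size
```

Comparing `fixed_sizes` with `mapped_sizes` shows whether a scale_size call can influence a given layer at all.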

insectnate commented 10 years ago

Thanks so much for the help.

I followed the tutorial for split plot here

p4 = plot_ordination(GP1, GP.ord, type = "split", color = "Phylum", shape = "human", label = "SampleType", title = "split")

Then used the provided code

gg_color_hue <- function(n) {
  hues = seq(15, 375, length = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}
color.names <- levels(p4$data$Phylum)
p4cols <- gg_color_hue(length(color.names))
names(p4cols) <- color.names
p4cols["samples"] <- "black"
p4_1 = p4 + scale_color_manual(values = p4cols)

to alter the colour of the left side of the plot to black. The sample point/text size in the example is too small to read.

I then tried p4_1 + scale_size_manual(values, c(6,2,1))

Can I use something like this

# Adjust size so that samples are bigger than taxa by default.
p <- p + scale_size_manual("type", values=c(samples=5, taxa=2))

from the biplot code of plot_ordination to make my sample points bigger by default in a type = "split" plot? I see that the plot_ord split code contains:

# biplot, id.type must map to color or size. Only color if none specified.
if( is.null(color) ){
 ord_map <- aes_string(x=x, y=y, color="id.type",
    shape=shape, na.rm=TRUE)
} else {
  ord_map <- aes_string(x=x, y=y, size="id.type",
    color=color, shape=shape, na.rm=TRUE)
}

}