tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org

In facet_wrap, multiple geom_text layers not commutative under new R #1397

Closed arthur-e closed 8 years ago

arthur-e commented 8 years ago

I recently re-ran an old script that uses ggplot2 to create boxplots and found that, without having changed anything in the code, the output is different. Unfortunately, I can't recall the initial version of R or ggplot2 that I had working, but I do have a clear, reproducible example of it not working:

library(ggplot2)
library(grid)
library(reshape2)

options(stringsAsFactors=FALSE)

# Generating synthetic data here
tpl <- c('1st', '2nd', '3rd', '4th', '5th')
dat <- data.frame(foo=as.factor(sample(tpl, 1000, replace=TRUE)),
  bar=as.factor(sample(tpl, 1000, replace=TRUE)),
  effect=runif(1000, 0.1, 0.7))

# Just doing a cross-tabulation
ctab <- melt(table(subset(dat, select=c('foo', 'bar'))), id.vars='foo')
ctab$y <- rep(0.8, dim(ctab)[1])

# Just conducting ANOVA tests here
tests <- c()
for (q in levels(dat$bar)) {
  test <- aov(effect ~ foo, data=subset(dat, bar == q))
  tests <- c(tests, sprintf('p-value: ~%.4f', summary(test)[[1]][['Pr(>F)']][[1]]))
}
tests <- data.frame(p.value=tests, bar=levels(dat$bar),
                    x=rep(1, 5), y=rep(0, 5))

ggplot(dat, mapping=aes(y=effect)) +
  geom_boxplot(mapping=aes(x=foo)) +
  geom_text(data=tests, aes(x=x, y=y, label=p.value), hjust=0.1, vjust=0.1) +
  geom_text(data=ctab, aes(x=foo, y=y, label=value), vjust=0.7) +
  xlab('2000 Census White Pop. Proportion Quintile') +
  ylab('Vegetation Cover Proportion') +
  labs(title='Vegetation Cover by 2000 Census Tract, Pop. Density Quintiles') +
  facet_wrap(~ bar) +
  theme_bw() +
  theme(text=element_text(size=16),
        plot.margin=unit(c(0.5, 0.2, 0.5, 0), 'cm'),
        panel.grid.major.y=element_line(color='gray'),
        panel.grid.major.x=element_blank())

The problem is in the ggplot geom_text layers. As it is (above), I get 10 facets instead of 5; there are only 5 levels to the faceting variable (bar) so I don't know why I see 10 facets, particularly as the extraneous facets are completely empty (see image).

[Image: rplot, the faceted boxplot output described above]

If I transpose (reorder) the two geom_text layers I get the correct result: 5 facets, not 10. This switch is seen in the code below:

ggplot(dat, mapping=aes(y=effect)) +
  geom_boxplot(mapping=aes(x=foo)) +
  geom_text(data=ctab, aes(x=foo, y=y, label=value), vjust=0.7) +
  geom_text(data=tests, aes(x=x, y=y, label=p.value), hjust=0.1, vjust=0.1) +
  xlab('2000 Census White Pop. Proportion Quintile') +
  ylab('Vegetation Cover Proportion') +
  labs(title='Vegetation Cover by 2000 Census Tract, Pop. Density Quintiles') +
  facet_wrap(~ bar) +
  theme_bw() +
  theme(text=element_text(size=16),
        plot.margin=unit(c(0.5, 0.2, 0.5, 0), 'cm'),
        panel.grid.major.y=element_line(color='gray'),
        panel.grid.major.x=element_blank())

This problem arises in R version 3.2.2 (2015-08-14), Platform: x86_64-pc-linux-gnu (64-bit), Running under: Ubuntu 14.04.3 LTS with ggplot2 version 1.0.1.

This problem does NOT arise in R version 3.2.0 (2015-04-16), Platform: x86_64-apple-darwin13.4.0 (64-bit), Running under: OS X 10.10.5 (Yosemite), also with ggplot2 version 1.0.1. I also did not have this problem on Ubuntu Linux before upgrading R (and ggplot2), so I can't imagine it's specific to the platform.

So, I guess the operative question is what changed in R since 3.2.0 to cause these layers to no longer commute?

As context, I am using ggplot2 to produce multiple boxplots--quintiles of a Census variable within quintiles of another Census variable compared to the mean proportion of vegetation cover in a Census tract. The geom_text layers display the counts in each boxplot and the p-values.

hadley commented 8 years ago

This sort of problem is usually caused by a mismatch in factor levels - can you check ctests$bar and data$bar?
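A minimal sketch of the check suggested above, in base R only. The objects in the script are named dat, ctab, and tests, so those names are used here. The data frames are small synthetic stand-ins, not the real ones. In particular, making ctab$bar a factor and tests$bar a character column is an assumption for illustration: what melt() returns depends on the installed reshape2 version. Whether harmonizing the levels removes the extra facets in ggplot2 1.0.1 is not confirmed in this thread.

```r
# Synthetic stand-ins for the three layer data frames (assumption: the
# column types below are illustrative, not taken from the real objects)
tpl <- c('1st', '2nd', '3rd', '4th', '5th')
dat <- data.frame(bar = factor(sample(tpl, 100, replace = TRUE), levels = tpl))
ctab <- data.frame(bar = factor(rep(tpl, each = 5)))
tests <- data.frame(bar = tpl, stringsAsFactors = FALSE)

layers <- list(dat = dat, ctab = ctab, tests = tests)

# Class of the faceting variable in each layer's data
facet_classes <- sapply(layers, function(d) class(d$bar))
print(facet_classes)

# Levels for factors, sorted unique values otherwise
facet_values <- lapply(layers, function(d) {
  if (is.factor(d$bar)) levels(d$bar) else sort(unique(d$bar))
})
print(facet_values)

# Coerce the faceting variable to a factor with one shared set of levels
ref_levels <- levels(dat$bar)
harmonized <- lapply(layers, function(d) {
  d$bar <- factor(as.character(d$bar), levels = ref_levels)
  d
})
```

If the classes or levels differ between layers, the harmonized data frames (harmonized$ctab, harmonized$tests) could be passed to the geom_text layers in place of the originals to see whether the layer order still matters.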

hadley commented 8 years ago

Seems to be resolved in dev version.