tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org
Other
6.54k stars 2.03k forks source link

geom_smooth throws error with diamonds dataset #2621

Closed thomasp85 closed 6 years ago

thomasp85 commented 6 years ago

This is a weird error I ran into:

ggplot(diamonds, aes(x = carat, y = price)) + geom_smooth()
#> `geom_smooth()` using method = 'gam' and formula 'y ~ mgcv::s(x, bs = "cs")'
#> Warning message:
#> Computation failed in `stat_smooth()`:
#> invalid type (list) for variable 'mgcv::s(x, bs = "cs")' 

At first I thought it was because diamonds were a nibble but converting to a data.frame preserves the error...

This works on the CRAN version so it is a regression introduced somewhere. Strangely enough, it is tied to something in the diamonds dataset as geom_smooth works fine with e.g. mtcars

clauswilke commented 6 years ago

The problem is not tied to diamonds, and it arises specifically when using splines with the GAM model. It must be a recent regression, because this definitely worked 1-2 months ago.

library(ggplot2)
ggplot(mtcars, aes(x = hp, y = mpg)) + geom_smooth(method = "loess") # loess is default for small datasets

ggplot(mtcars, aes(x = hp, y = mpg)) + geom_smooth(method = "gam", formula = y ~ x)

ggplot(mtcars, aes(x = hp, y = mpg)) + geom_smooth(method = "gam", formula = y ~ mgcv::s(x, bs = "cs"))
#> Warning: Computation failed in `stat_smooth()`:
#> invalid type (list) for variable 'mgcv::s(x, bs = "cs")'

Created on 2018-05-17 by the reprex package (v0.2.0).

clauswilke commented 6 years ago

Also, one more comment: I added a regression test for geom_smooth() but it only tests the loess method. I think we need to add a regression test for the GAM method as well. This would have been a pretty disastrous bug to have in a released ggplot2.

hadley commented 6 years ago

It seems to be the namespace that causes the problem: this is ok

ggplot(mtcars, aes(x = hp, y = mpg)) + geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"))
hadley commented 6 years ago

Minimal reprex

mgcv::gam(mpg ~ mgcv::s(wt), data = mtcars)
#> Error in model.frame.default(formula = mpg ~ mgcv::s(wt), data = mtcars, : invalid type (list) for variable 'mgcv::s(wt)'

This was introduced in #2535, cc @bfgray3.

clauswilke commented 6 years ago

Yes, correct. Reprex outside of ggplot2:

library(mgcv)
#> Loading required package: nlme
#> This is mgcv 1.8-23. For overview type 'help("mgcv-package")'.
x <- rnorm(50)
y <- x^2 + 0.3*rnorm(50)
df <- data.frame(x, y)
gam(y ~ s(x), data = df)
#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> y ~ s(x)
#> 
#> Estimated degrees of freedom:
#> 5.89  total = 6.89 
#> 
#> GCV score: 0.1011416
gam(y ~ mgcv::s(x), data = df)
#> Error in model.frame.default(formula = y ~ mgcv::s(x), data = df, drop.unused.levels = TRUE): invalid type (list) for variable 'mgcv::s(x)'

Created on 2018-05-17 by the reprex package (v0.2.0).

hadley commented 6 years ago

So we just need to revert that change including the news bullet, and add a test.

I have a dim recollection that we've done this dance before.

bfgray3 commented 6 years ago

Apologies for introducing this bug.

clauswilke commented 6 years ago

I think this raises two questions, though: 1. Why did it work when @bfgray3 proposed the change originally? 2. Is there a reliable way to fix #2535?

clauswilke commented 6 years ago

I did some more digging. gam() looks specifically for a term called s (as well as te, ti, and a few others), so it doesn't recognize mgcv::s:

# works
s <- mgcv::s
mgcv::gam(mpg ~ s(wt), data = mtcars)
#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> mpg ~ s(wt)
#> 
#> Estimated degrees of freedom:
#> 2.14  total = 3.14 
#> 
#> GCV score: 7.887675

# doesn't work
u <- mgcv::s
mgcv::gam(mpg ~ u(wt), data = mtcars)
#> Error in model.frame.default(formula = mpg ~ u(wt), data = mtcars, drop.unused.levels = TRUE): invalid type (list) for variable 'u(wt)'

# works (te is a special term gam recognizes)
te <- mgcv::s
mgcv::gam(mpg ~ te(wt), data = mtcars)
#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> mpg ~ te(wt)
#> 
#> Estimated degrees of freedom:
#> 2.14  total = 3.14 
#> 
#> GCV score: 7.887675

Created on 2018-05-17 by the reprex package (v0.2.0).

One solution might be to create a custom environment in which s <- mgcv::s and make sure the gam expression is evaluated in that environment. That exceeds my R abilities, though. The other solution would be to somehow fix this in the mgcv package.

hadley commented 6 years ago

For now I think we should forward to gam maintainer.

clauswilke commented 6 years ago

I have contacted the gam maintainer.

lock[bot] commented 6 years ago

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/