pbreheny / visreg

Visualization of regression functions
http://pbreheny.github.io/visreg/
61 stars 18 forks source link

Data extraction by setupF() fails in certain case for lm() fits #74

Closed courtiol closed 4 years ago

courtiol commented 4 years ago

Here is a RME of a problem I detected: visreg() calls setupF() to extract the data from the model fit, but under lm() (in the context of how visreg() is called below), it fails to understand that lm() does not contain such object...

library(tidyverse)
library(visreg)

## batch job with glm() is working:
iris %>%
  group_nest(Species) %>%
  mutate(fit  = map(data, ~ glm(Petal.Length ~ Sepal.Length, data = .x)),
         plot = pmap(list(fit, Species), ~ visreg(..1, gg = TRUE) + labs(title = ..2))) -> res
walk(res$plot, print)


## same with lm() is not working:
iris %>%
  group_nest(Species) %>%
  mutate(fit  = map(data, ~ lm(Petal.Length ~ Sepal.Length, data = .x)),
         plot = pmap(list(fit, Species), ~ visreg(..1, gg = TRUE) + labs(title = ..2))) -> res2
#> Error in as.data.frame.default(data): cannot coerce class '"lm"' to a data.frame
walk(res2$plot, print)
#> Error in map(.x, .f, ...): object 'res2' not found

Created on 2019-10-28 by the reprex package (v0.3.0)

pbreheny commented 4 years ago

The underlying issue was that there was a data set, .x, constructed by map() and used to fit the lm(). However, when visreg() is later called by pmap(), the symbol .x meant something completely different. visreg() needs the data frame used to fit the model, so it was looking for the object in the calling environment -- works 99 times out of 100, but failed here.

Anyway, this is fixed now -- thanks for bringing it to my attention!

pbreheny commented 4 years ago

Oops -- spoke too soon!

Actually, this is just fundamentally not going to work. lm(), as written, is not compatible with this style of functional programming. In particular, the fitted lm model object does not contain the original data set, which is necessary in order to predict the model fit at newly constructed observations (the essence of what visreg plots). To accomplish what you're trying to do, you'd have to write a wrapper to lm(), say, my_lm(), that calls lm to fit the model, then attaches the original data set, something like this:

my_lm <- function(form, data) {
  obj <- lm(form, data)
  obj$data <- data
  return(obj)
}

That's something like pseudo-code -- I haven't tested it, but it should work. Let me know if it does/doesn't.

Regardless, I'm leaving the issue closed in the sense that there's nothing I can do from within visreg to address this issue -- lm() doesn't support what you're trying to do.

pbreheny commented 4 years ago

Just to clarify, this issue was fixed by commit f2033fe, but this breaks everything else, so the commit was undone in 3529fc0.

courtiol commented 4 years ago

Hi @pbreheny, thanks for having looked at the issue. I do think it can be fixed: a lm fit does the data stored in the component called model.

pbreheny commented 4 years ago

Unfortunately, what is stored in fit$model is not the original data frame:

fit <- lm(Ozone ~ poly(Wind, 2), airquality)
head(fit$model)
#>   Ozone poly(Wind, 2).1 poly(Wind, 2).2
#> 1    41     -0.05888217     -0.01266049
#> 2    36     -0.04506826     -0.02878595
#> 3    12      0.06083839     -0.04011085
#> 4    18      0.03551289     -0.05547835
#> 6    28      0.11379171      0.02873584
#> 7    23     -0.03125435     -0.04153103

There is no way to recover the original Wind variable from this.

courtiol commented 4 years ago

There is a way (no information is when using poly), but I agree that it is a little tricky and impossible to come up with a general recipe for any function that may be applied on the data as some of them will trigger information loss...

Perhaps one possible way would be to look for the variable and if it is not found in $model then you could call an error and write a message that for this to work transformation on the data should be done before calling the fitting procedure. That way, it would still work for all fit with no transformation.

Personally, I could keep using glm() under its default setting as it is equivalent to lm() by that does contain $data and thus work. This workaround could also be what an error message suggests.

Up to you.

IMO, at minimum, there should be a more transparent error message to let the user know why visreg crashes in the case of lm() + {dplyr}.