plot add data=TRUE not making sense

abadgerw commented 4 months ago

I am working with the attached dataset: Test.csv

I am running the following model:

library(ggeffects)
library(jtools)
library(magrittr)  # provides the %>% pipe used below

df <- read.csv("Test.csv", header = TRUE, row.names = 1)

model <- lm(log(Molecule) ~ Volume + Pred1 + Pred2 + Pred3, data = df)

I then try to plot the results, which seems to work when I don't add the data points:

ggpredict(model = model, terms = "Volume", back.transform = FALSE) %>%
  plot()

[Plot: Log]

ggpredict(model = model, terms = "Volume", back.transform = TRUE) %>%
  plot()

[Plot: No Log]

However, when adding the points, I get very wonky results:

ggpredict(model = model, terms = "Volume", back.transform = FALSE) %>%
  plot(show_data = TRUE)

[Plot: Log]

ggpredict(model = model, terms = "Volume", back.transform = TRUE) %>%
  plot(show_data = TRUE)

[Plot: No Log]

I compared this with effect_plot() from the jtools package, which can overlay the data points, but only for the non-transformed data:

effect_plot(model, pred = "Volume", interval = TRUE, plot.points = TRUE, data = df)

[Plot: Effect_Plot]

Any insight into why I can't overlay the data using ggeffects on either the log or the back-transformed scale?

abadgerw commented 4 months ago

I also see the same output with the plot_predictions() function from the marginaleffects package once I try to display the data points. Any insights @vincentarelbundock?

strengejacke commented 4 months ago

There are some outliers in the data that cause this weird scaling. Try adding ggplot2::ylim(...) to change the limits of the y-axis.

strengejacke commented 4 months ago

Or pass it directly to plot(), see this example: https://strengejacke.github.io/ggeffects/articles/introduction_plotmethod.html#control-y-axis-appearance
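For illustration, here is a minimal sketch of the ylim() suggestion. It assumes simulated data in place of Test.csv (the data frame `sim`, the injected outliers, and the limits `c(0, 60)` are all made up for this example), and it assumes a ggeffects version whose plot() method accepts `show_data`, as in the code above. Note that ggplot2::ylim() drops points outside the limits, while ggplot2::coord_cartesian() only zooms.

```r
library(ggeffects)
library(ggplot2)

set.seed(1)

# Hypothetical stand-in for Test.csv: a positive response plus a few extreme values
n <- 100
sim <- data.frame(Volume = runif(n, 1, 10))
sim$Molecule <- exp(1 + 0.2 * sim$Volume + rnorm(n, sd = 0.3))
sim$Molecule[1:3] <- sim$Molecule[1:3] * 50  # inject three outliers

fit <- lm(log(Molecule) ~ Volume, data = sim)
pr <- ggpredict(fit, terms = "Volume")

# plot() returns a ggplot object, so ggplot2 layers can be added to it
p <- plot(pr, show_data = TRUE)

# Option 1: ylim() sets the scale limits; points outside are removed (with a warning)
p_clip <- p + ylim(0, 60)

# Option 2: coord_cartesian() zooms in without removing any points
p_zoom <- p + coord_cartesian(ylim = c(0, 60))
```

Printing `p`, `p_clip`, and `p_zoom` one after another should show how much the three outliers stretch the y-axis in the unrestricted plot.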

abadgerw commented 4 months ago

@strengejacke Thanks! Why do the plots with back.transform = TRUE and back.transform = FALSE look the same? I was expecting the plot with back.transform = FALSE to look like the plot created with jtools, without the weird scaling, since the log transformation compresses all data points into a similar range.

strengejacke commented 4 months ago

I'm not sure. I guess back-transforming only affects the predictions, not the raw data. I'll look into this.
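To make the scale mismatch concrete, here is a rough base-R sketch, again on simulated data (`sim`, `fit`, and `grid` are hypothetical names, not part of the original report). It only shows the general principle that predictions and raw points must be on the same scale before they are overlaid; it is not a description of what ggeffects does internally.

```r
set.seed(42)

# Hypothetical data with a log-normal response
n <- 50
sim <- data.frame(Volume = runif(n, 1, 10))
sim$Molecule <- exp(0.5 + 0.3 * sim$Volume + rnorm(n, sd = 0.4))

fit <- lm(log(Molecule) ~ Volume, data = sim)
grid <- data.frame(Volume = seq(1, 10, length.out = 25))

# predict() returns values on the scale of the modelled response, i.e. log(Molecule)
pred_log <- predict(fit, newdata = grid)
# Back-transforming the predictions puts them on the original scale
pred_orig <- exp(pred_log)

# The raw response needs the matching scale for each plot
raw_log <- log(sim$Molecule)
raw_orig <- sim$Molecule

op <- par(mfrow = c(1, 2))
plot(sim$Volume, raw_log, main = "Log scale", xlab = "Volume", ylab = "log(Molecule)")
lines(grid$Volume, pred_log)
plot(sim$Volume, raw_orig, main = "Original scale", xlab = "Volume", ylab = "Molecule")
lines(grid$Volume, pred_orig)
par(op)
```

If the untransformed points were drawn over `pred_log` instead, the axis would stretch to fit the raw values and the fitted line would look nearly flat, which resembles the behaviour described above.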

strengejacke commented 4 months ago

Current state of development:

library(ggeffects)
data(sleepstudy, package = "lme4")
model <- lme4::lmer(log(Reaction) ~ Days + (1 | Subject), data = sleepstudy)

pr <- ggpredict(model, "Days", back_transform = FALSE)
#> Model has log-transformed response. Predictions are on log-scale.
plot(pr, show_data = TRUE)
#> Data points may overlap. Use the `jitter` argument to add some amount of
#>   random variation to the location of data points and avoid overplotting.


pr <- ggpredict(model, "Days", back_transform = TRUE)
#> Model has log-transformed response. Back-transforming predictions to
#>   original response scale. Standard errors are still on the log-scale.
plot(pr, show_data = TRUE)
#> Data points may overlap. Use the `jitter` argument to add some amount of
#>   random variation to the location of data points and avoid overplotting.

Created on 2024-05-13 with reprex v2.1.0

strengejacke / ggeffects

plot add data=TRUE not making sense #522