Colors not showing correctly when specifying "show_data = T" in the plot method of ggpredict object

strengejacke / ggeffects

Estimated Marginal Means and Marginal Effects from Regression Models for ggplot2


library(ggeffects)
library(splines)
data(efc)
fit <- lm(barthtot ~ c12hour * c161sex + e42dep, data = efc)
pred <- ggpredict(fit, terms = c("c161sex", "c12hour [4,35,77,168]"))

# here the colors do not match the input colors - for second and third color
plot(pred, show_data = T, color = c("purple", "green", "blue", "red"))

# here the colors are as expected
plot(pred, show_data = F, color = c("purple", "green", "blue", "red"))

This one is tricky, indeed. If the second variable in terms is continuous, the data may contain many more values than the "grouped" predictions show. In your example, predicted values are shown for c12hour at 4, 35, 77 and 168. The raw data for c12hour, however, contains many more distinct values, so the dots get a gradient color, depending on how "close" each data value is to the requested values (4, 35, 77 and 168). Your color scale is therefore passed to ggplot2::scale_color_gradient(), which is why the colors look different from their original color codes.

If you don't show data points, no gradient color scale is needed, and the colors match exactly. The same holds when the second term is categorical: since all categories are present in the data, the colors match exactly.
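To get a feel for why a gradient changes the colors, here is a minimal base R sketch. It does not reproduce what ggeffects or ggplot2 do internally; it only assumes a simple, evenly spaced interpolation between the four supplied colors (via grDevices::colorRampPalette()) to show that values falling between the anchor points get blended colors that match none of the inputs.

```r
# Illustration only: interpolate between the four requested colors.
# This is a generic color ramp, not the ggeffects plotting code.
cols <- c("purple", "green", "blue", "red")

# Seven evenly spaced shades along the ramp: positions 1, 3, 5 and 7 fall
# on the four input colors, positions 2, 4 and 6 fall halfway between them.
shades <- grDevices::colorRampPalette(cols)(7)

at_anchors <- shades[c(1, 3, 5, 7)]
in_between <- shades[c(2, 4, 6)]

print(at_anchors)
print(in_between)  # blended colors, different from every input color
```

Data points whose c12hour value lies between two of the requested values would, under this assumption, be drawn in such blended shades rather than in one of the four colors you passed.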

library(ggeffects)
data(efc)
efc <- datawizard::to_factor(efc, c("e42dep", "c161sex"))
fit <- lm(barthtot ~ c161sex * e42dep, data = efc)
pred <- ggpredict(fit, terms = c("c161sex", "e42dep"))

plot(pred, color = c("purple", "green", "blue", "red"))

plot(pred, show_data = TRUE, jitter = TRUE, color = c("purple", "green", "blue", "red"))

Created on 2023-11-21 with reprex v2.0.2


Colors not showing correctly when specifying "show_data = T" in the plot method of ggpredict object #404