jacob-long / jtools

Tools for summarizing/visualizing regressions and other helpful stuff
https://jtools.jacob-long.com
GNU General Public License v3.0

Backtransform Outcome in Effect_Plot #151

Closed: abadgerw closed this issue 1 month ago

abadgerw commented 4 months ago

I am working with the attached dataset: Test.csv

I am running the following model:

```r
library(ggeffects)
library(jtools)

df <- read.csv("Test.csv", header = TRUE, row.names = 1)

# Linear model with a log-transformed outcome
model <- lm(log(Molecule) ~ Volume + Pred1 + Pred2 + Pred3, data = df)
```

I tried plotting the results using effect_plot:

```r
effect_plot(model, pred = "Volume", interval = TRUE, plot.points = TRUE, data = df)
```

[Image: Effect_Plot]

However, I can't get the plot to display the y-axis on the original (back-transformed) scale, only the logged values. Is there any way to do that?

I know the ggeffects package can display the back-transformed y-axis, but it seems to have trouble plotting the raw data points the way your package can (see the issue here: https://github.com/strengejacke/ggeffects/issues/522).

jacob-long commented 1 month ago

I think the best answer for now is that there's no simple way to achieve this. I worked to make sure that the function doesn't break when there are transformations in the formula, but the package isn't currently set up to let the user choose whether an outcome transformed in the formula should be plotted on the original or the transformed scale.

You're not left totally high and dry, though, since we can at least get you the raw data to feed to ggplot2 via the make_predictions() function.

```r
preds <- make_predictions(model, pred = "Volume", interval = TRUE, plot.points = TRUE, data = df)
```

Note that preds will have a column called `log(Molecule)`, which contains the model predictions, as well as another column called `Molecule` that has been automatically filled with the mean of Molecule as if it were a control variable. We'll replace that with the back-transformed model predictions:

```r
# Overwrite the placeholder column with exp() of the log-scale predictions
preds$Molecule <- exp(preds$`log(Molecule)`)
```
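The same back-transformation can be done with base R's `predict()` if you also want the confidence interval on the original scale. This is a minimal sketch on simulated data (a hypothetical stand-in for Test.csv, with Pred1, Pred2 and Pred3 left out for brevity); the names `sim`, `grid` and `smear` are illustrative only. Because `exp()` is monotonic, exponentiating the lower and upper bounds gives a valid interval on the original scale. One caveat: `exp()` of a predicted log value estimates the median of Molecule rather than its mean (assuming roughly normal errors on the log scale); Duan's smearing estimate, the mean of the exponentiated residuals, is one common multiplicative correction if you want the mean instead.

```r
# Hypothetical simulated data; NOT the attached Test.csv
set.seed(1)
n <- 100
sim <- data.frame(Volume = runif(n, 1, 10))
sim$Molecule <- exp(2 + 0.3 * sim$Volume + rnorm(n, sd = 0.5))

# Same structure as the model in the issue: log-transformed outcome
fit <- lm(log(Molecule) ~ Volume, data = sim)

# Grid of Volume values to predict over
grid <- data.frame(Volume = seq(min(sim$Volume), max(sim$Volume), length.out = 50))

# Predictions and a 95% confidence interval, still on the log scale
log_pred <- predict(fit, newdata = grid, interval = "confidence")

# Back-transform the fit and both bounds; exp() preserves their order
grid$Molecule <- exp(log_pred[, "fit"])
grid$lower    <- exp(log_pred[, "lwr"])
grid$upper    <- exp(log_pred[, "upr"])

# exp(fit) targets the median; the smearing factor rescales toward the mean
smear <- mean(exp(residuals(fit)))
grid$Molecule_mean <- grid$Molecule * smear

head(grid)
```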

With that, we can start plotting the regression line:

```r
library(ggplot2)  # ggplot(), geom_path(), etc.; theme_nice() comes from jtools

plot <- ggplot(preds, aes(x = Volume, y = Molecule)) +
  geom_path() +
  theme_nice()
```


Then we can add the original data points back:

```r
# df has Volume and Molecule columns, so the points inherit the x/y mapping
plot + geom_point(data = df)
```
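To show the uncertainty as well, the back-transformed interval can be drawn with `geom_ribbon()` underneath the line and points. This is a minimal self-contained sketch, again on hypothetical simulated data rather than Test.csv. With `make_predictions()` output you would apply `exp()` to its interval columns instead; check `names(preds)` for what your jtools version calls them rather than assuming a name.

```r
library(ggplot2)

# Hypothetical simulated data standing in for Test.csv
set.seed(1)
sim <- data.frame(Volume = runif(100, 1, 10))
sim$Molecule <- exp(2 + 0.3 * sim$Volume + rnorm(100, sd = 0.5))
fit <- lm(log(Molecule) ~ Volume, data = sim)

# Back-transformed predictions with a 95% confidence interval
grid <- data.frame(Volume = seq(1, 10, length.out = 50))
ci <- exp(predict(fit, newdata = grid, interval = "confidence"))
grid$Molecule <- ci[, "fit"]
grid$lower    <- ci[, "lwr"]
grid$upper    <- ci[, "upr"]

p <- ggplot(grid, aes(x = Volume, y = Molecule)) +
  # Confidence band on the original scale of Molecule
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2) +
  geom_path() +
  # Raw observations; they share the Volume/Molecule column names with grid
  geom_point(data = sim)
```

The ribbon is added first so that the line and points are drawn on top of it.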


One issue for the readability of the plot with your data is that you have one enormous outlier (Molecule = 5611956) and one other observation that stretches the axis a good bit (Molecule = 883766). You could choose to squeeze the axis to exclude them, although I don't know whether that would ultimately be misleading:

```r
plot + geom_point(data = df) + ylim(0, 200000)
```


The axis can be further adjusted as needed.
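One thing to watch with `ylim()`: it sets the scale limits, so observations outside the range are dropped (ggplot2 warns about removed rows) and any part of the fitted line beyond the limit is cut off too. `coord_cartesian(ylim = ...)` zooms the view without discarding data. Another option, closer to what the original question asked, is to keep the back-transformed values but use `scale_y_log10()`, so the outliers stay visible while the tick labels are in the original units of Molecule. A sketch on hypothetical simulated data, with the cutoff taken from the 90th percentile purely for illustration:

```r
library(ggplot2)

# Hypothetical simulated data standing in for Test.csv
set.seed(1)
sim <- data.frame(Volume = runif(100, 1, 10))
sim$Molecule <- exp(2 + 0.3 * sim$Volume + rnorm(100, sd = 0.5))

# Illustrative cutoff: the 90th percentile of Molecule
cap <- unname(quantile(sim$Molecule, 0.9))

base <- ggplot(sim, aes(x = Volume, y = Molecule)) + geom_point()

# ylim() sets scale limits: values above cap become NA and are
# removed (with a warning) when the plot is drawn
p_ylim <- base + ylim(0, cap)

# coord_cartesian() only zooms the view; all 100 points are kept
p_zoom <- base + coord_cartesian(ylim = c(0, cap))

# Log10 axis: nothing is dropped, and tick labels are in original units
p_log <- base + scale_y_log10()
```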