strengejacke / sjPlot

sjPlot - Data Visualization for Statistics in Social Science
https://strengejacke.github.io/sjPlot
606 stars 91 forks source link

Are signifcane asterisks reliable when using robust linear models (rlm) in combination with estimate plots (plot_model)? #942

Open eimichae85 opened 3 months ago

eimichae85 commented 3 months ago

Hello, due to some outliers I have to work with robust linear models (using the rlm() function, MASS package). I would then like to plot my model results with the plot_model function (package: sjPlot) as I find it a neat way to visualize results.

The summary of the rlm()-based model only provides t-values but not p-values (or significance asterisks). However, when I subject the rlm()-model to the plot_model function, plot_model() somehow provides significance asterisks. 1) How is this possible? 2) Are these significance levels (asterisks indicating p-value ranges) reliable? if so why would I not get them in the summary output of the rlm() model?

Thank you very much for your feedback Michael

Below is an example code and the data to run it

`library(sjPlot) library(MASS)

DATA

tab<-structure(list(Site = structure(c(2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L), levels = c("Alpthal", "Birmensdorf"), class = "factor", contrasts = "contr.sum"), Treatment = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), levels = c("control", "4000_100", "4000_30", "4000_50" ), class = "factor"), C50 = c(490139.1569, 210684.3876, 177220.7609, 144855.0927, 143589.8017, 346457.7683, 208220.2951, 763146.691, 337140.0098, 473545.559, 133118.6341, 196323.6835, 324372.0377, 660669.6405, 221500.1217, 130437.8918, 228093.4435, 64947.9752, 98456.18456, 108942.3415, 258502.6415, 38607.72435, 60942.19915, 276165.4695, 193098.7689, 215195.3215, 175787.9858, 200215.9341, 80513.00079, 120383.9266, 127098.0298, 0, 36507.1922, 63337.42105, 119012.168, 136744.0301, 243138.5785, 237136.1503, 143608.6254, 196767.0193, 72837.33976, 37430.74448, 180981.6339, 21085.30347, 62652.19874, 430516.4452, 101975.2863, 286981.5731, 174188.796, 148825.2492, 376042.6975, 0, 361023.8245, 73044.75764, 317343.18, 155962.495)), row.names = c(NA, -56L), class = "data.frame")

Define the factors

tab$Site<-as.factor(tab$Site) tab$Treatment<-as.factor(tab$Treatment)

contrasts(tab$Site) <- "contr.sum" contrasts(tab$Treatment)<- "contr.sum"

tab$Treatment <- relevel(tab$Treatment, ref="control") #only use for graphics, NOT statistics

modRob<-rlm(C50~Treatment*Site,data=tab,psi=psi.huber,method="M",maxit=50) summary(modRob)

plot_model(modRob, vline.color = "red", colors = c("black","black"), sort.est=F,show.values = TRUE, value.offset = .45,value.size = 6,title = "", dot.size=5.3, line.size =0.8 , transform = NULL, order.terms = c(2,3,1,4,6,7,5), auto.label = FALSE) `

strengejacke commented 3 months ago

See discussion here: https://stat.ethz.ch/pipermail/r-help/2006-July/108659.html

I'd say, as long as you're transparent about the approximation method (and therefore, you could try parameters::model_parameters() and look at the output, which should indicate the approximation method), it's ok to compute them. Compare to mixed models, where you have many different methods (wald, satterthwaite, kenward-rogers, ...), none of which one would say it's perfectly accurate - they're all approximations.

eimichae85 commented 3 months ago

Thanks a lot for your swift reply. I was not aware of the possibility to use parameters::model_parameters(model) It works perfectly and gives me indeed p-values. Also the approximation method is mentioned: "p-values (two-tailed) computed using a Wald t-distribution approximation."

I will mention this assuming that this is also what the plot_model() output displays.

Cheers!

strengejacke commented 3 months ago

Yes, plot_model() uses the parameters package to compute the summary tables