Open LGraz opened 4 months ago
I would not recommend `se = "robust.sem"` here, as you define new variables that are clearly nonlinear functions of the original parameters.
I also hesitate to simply replace `sd()` with `mad()` when calculating the bootstrap-based standard errors. Certainly, `sd()` is very sensitive to outliers, but if an outlier genuinely occurs every now and then, this should also be reflected in the standard errors. I would have to dive into the literature to better understand the theoretical implications of using `mad()` instead of `sd()` when computing bootstrap-based standard errors.
I would rather try to pinpoint when/why you often get outlying parameter estimates. Perhaps using bounded estimation may prevent this? (Try running this with the argument `bounds = "standard"`.)
`bounds = "standard"` does not have an effect.
To pinpoint why the parameter estimates are outlying: the model defines `or_fraction_mediated := or_mediated/or_total`, so if `or_total` is very close to 0 in one bootstrap sample, then `or_fraction_mediated` explodes and hence `sd()` breaks down.
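The denominator effect is easy to reproduce in plain R. Here is a minimal sketch with toy numbers (not the actual model or data), showing how a single near-zero `or_total` draw inflates `sd()` while `mad()` barely moves:

```r
# Simulated bootstrap replicates of the two odds ratios (toy values only).
set.seed(1)
or_mediated <- rnorm(1000, mean = 1.5, sd = 0.2)
or_total    <- rnorm(1000, mean = 1.0, sd = 0.2)
or_total[1] <- 1e-4  # a single replicate where or_total lands near zero

# The defined parameter explodes in that one replicate.
or_fraction_mediated <- or_mediated / or_total

sd(or_fraction_mediated)   # blown up by the single outlier
mad(or_fraction_mediated)  # barely affected
sd(or_fraction_mediated) / mad(or_fraction_mediated)  # ratio far above 5
```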
Simply exchanging `sd()` with `mad()` is not advisable.
What do you think about a warning if the sd and the mad are far apart? Something like:

```r
# BOOT <- lavaan:::lav_object_inspect_boot(object)
# BOOT.def <- apply(BOOT, 1, object@Model@def.function)
sd_mad_ratio <- apply(BOOT.def, 1, sd) / apply(BOOT.def, 1, mad)
if (any(sd_mad_ratio > 5)) {
  params_w_outliers <- names(sd_mad_ratio)[sd_mad_ratio > 5]
  warning(paste(
    "The following bootstrap parameters have a high ratio of",
    "standard deviation to median absolute deviation:",
    paste(params_w_outliers, collapse = ", "),
    "\nP-values and confidence intervals might not match."
  ))
}
```
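As a self-contained illustration, the same check can be run on a toy stand-in for `BOOT.def` (rows = defined parameters, columns = bootstrap replicates; the lavaan-internal extraction is replaced by simulated numbers, so names and values here are hypothetical):

```r
# Toy BOOT.def: two defined parameters across 500 bootstrap replicates.
set.seed(42)
BOOT.def <- rbind(
  clean = rnorm(500, mean = 0.4, sd = 0.1),
  blown = c(rnorm(499, mean = 0.4, sd = 0.1), 5000)  # one exploded replicate
)

# Ratio of sd to mad per parameter; a large ratio flags outlier-driven SEs.
sd_mad_ratio <- apply(BOOT.def, 1, sd) / apply(BOOT.def, 1, mad)
if (any(sd_mad_ratio > 5)) {
  params_w_outliers <- names(sd_mad_ratio)[sd_mad_ratio > 5]
  warning(paste(
    "The following bootstrap parameters have a high ratio of",
    "standard deviation to median absolute deviation:",
    paste(params_w_outliers, collapse = ", ")
  ))
}
```

Only the parameter with the exploded replicate trips the threshold; for well-behaved (roughly normal) bootstrap distributions, `sd()` and `mad()` nearly agree and the ratio stays close to 1.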
Dear Team, I appreciate this great product, THANKS!
I encountered a case where the confidence interval of `or_fraction_mediated` does not include 0, but the p-value is 0.9. I noticed that the SE estimate from the bootstrap is corrupted by outliers among the bootstrap replicates (hence the p-value might be too big). If that's so, I suggest using some more robust version (below I compare the SE with the MAD).
Here is my minimal reproducible example using the data_WV_scaled.csv:
Unrelated to this robustness issue: would you recommend using `se = "robust.sem"` instead?