Standard function - Githubissues

strengejacke / sjPlot

sjPlot - Data Visualization for Statistics in Social Science

https://strengejacke.github.io/sjPlot


Standard function #94

Closed rubenarslan closed 8 years ago

rubenarslan commented 8 years ago

I use sjPlot to make coefficient and marginal effect plots for model comparisons in my robustness analyses. So I apply it to get coefficients (not plots) from various models, including lme4, nlme, lm, glm and brms.

However, the coef output is not standardised, e.g. sjp.lm yields an object called df with columns such as "Beta" and "lower" while sjp.lmer yields an object called mydf with columns such as "OR" and "lower.CI". This makes rbinding coefficients from different approaches a bit bothersome.

A good counter-example, and the reason I'm using sjPlot rather than broom, is sjp.int. I can basically throw anything at sjp.int and get a result in the same format. I would love to have such a tidy function for fixed effects of (generalized) linear (mixed) models too. Currently I use a function that switches on the fit's class and then tidies up the resulting df.

I could probably switch to broom for this, or maybe you could build on it too?

sjPlot commented 8 years ago

Yes, this has been on my mind all along, though with lower priority. What would be a good and generic naming convention for data frame columns? If you have any suggestions, please let me know.

rubenarslan commented 8 years ago

I think for coefficients you could probably go with broom's convention, it has served me well and the package seems to be getting a lot of support.

Moreover, by using a generic function and building on broom you could also support models that you haven't considered yet (e.g. stanfit), at least for simple things like coefficient plots.
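The generic-function idea can be sketched roughly as follows. This is an illustration only: `tidy_coefs` is a hypothetical name, not an sjPlot or broom function, and the sketch does not call broom. It only borrows broom's column names (`term`, `estimate`, `std.error`, `conf.low`, `conf.high`) and uses base R, with Wald intervals as an assumed fallback for any model that provides `coef()` and `vcov()`.

```r
# Hypothetical S3 generic; dispatches on the class of the fitted model.
tidy_coefs <- function(fit, ...) UseMethod("tidy_coefs")

# Fallback: normal-approximation (Wald) intervals from coef() and vcov().
tidy_coefs.default <- function(fit, level = 0.95, ...) {
  est <- coef(fit)
  se  <- sqrt(diag(vcov(fit)))
  z   <- qnorm(1 - (1 - level) / 2)
  data.frame(
    term      = names(est),
    estimate  = unname(est),
    std.error = unname(se),
    conf.low  = unname(est - z * se),
    conf.high = unname(est + z * se),
    row.names = NULL
  )
}

# lm: same columns, but t-based intervals using the residual degrees of freedom.
tidy_coefs.lm <- function(fit, level = 0.95, ...) {
  out <- tidy_coefs.default(fit, level = level)
  tq  <- qt(1 - (1 - level) / 2, df = df.residual(fit))
  out$conf.low  <- out$estimate - tq * out$std.error
  out$conf.high <- out$estimate + tq * out$std.error
  out
}

# glm objects also inherit from "lm", so route them back to the Wald fallback.
tidy_coefs.glm <- function(fit, level = 0.95, ...) {
  tidy_coefs.default(fit, level = level)
}

fit_lm  <- lm(mpg ~ wt, data = mtcars)
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)

# Identical column names make rbind() across model types trivial.
all_coefs <- rbind(
  cbind(model = "lm",  tidy_coefs(fit_lm)),
  cbind(model = "glm", tidy_coefs(fit_glm))
)
```

Because every method returns the same columns, supporting a new model class only means adding one more method, or relying on the default if the class already has `coef()` and `vcov()`.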

I don't know so much about the internals of your marginal effect plots, but you use effects and lsmeans a lot, right?

sjPlot commented 8 years ago

Yes, and one major initial intention in writing the sjPlot package was to tidy up data or fitted models for plotting and to minimize effort with ggplot syntax: a kind of combination or integration of broom and ggplot. ;-)

sjPlot commented 8 years ago

A quick and easy way would be setting the column names just before returning the data frame from the function. This would require no internal changes and would be a safe method.
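A minimal sketch of that renaming step, in base R. The lookup table is an assumption: it maps the column names mentioned in this thread (plus a guessed `upper`) to broom-style names, and `harmonize_names` is a hypothetical helper, not part of sjPlot.

```r
# Hypothetical lookup from function-specific column names to broom-style names.
# "upper" is assumed by analogy with "lower"; extend the table as needed.
broom_names <- c(
  x = "term",
  Beta = "estimate", OR = "estimate",
  lower = "conf.low", lower.CI = "conf.low",
  upper = "conf.high", upper.CI = "conf.high",
  p = "p.value"
)

# Rename known columns just before returning; unknown columns are left as-is.
harmonize_names <- function(df, lookup = broom_names) {
  known <- names(df) %in% names(lookup)
  names(df)[known] <- unname(lookup[names(df)[known]])
  df
}

# Stand-ins for the differently named outputs described above.
lm_like   <- data.frame(Beta = 0.5, lower = 0.1, upper = 0.9)
lmer_like <- data.frame(OR = 1.2, lower.CI = 0.8, upper.CI = 1.6)

combined <- rbind(harmonize_names(lm_like), harmonize_names(lmer_like))
```

Since only the names change at the very end, nothing inside the plotting functions has to be touched.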

rubenarslan commented 8 years ago

Yeah that is how I currently quick-fix it, it's just a bit ugly.

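get_coefs = function(fit, model_name) {
  # identical() avoids the length-2 condition that class(fit) == "lm" yields for multi-class fits such as glm
  if (identical(class(fit), "lm")) {
    obj_coef = tryCatch({ sjp.lm(fit, printPlot = FALSE)$df }, error = function(e) { cat_message(e, "danger") })
    obj_coef$pv = NULL
    # rename to the column names that sjp.lmer returns
    names(obj_coef) = c("x", "OR", "lower.CI", "upper.CI", "p")
    obj_coef$grp = obj_coef$sorting = obj_coef$fade = NA
    # note: "Window" and "Predictor" are not created anywhere in this excerpt
    obj_coef = obj_coef[, c("OR", "lower.CI", "upper.CI", "p", "grp", "sorting", "x", "fade", "Window", "Predictor")]
    obj_coef
  } else {
    # everything else is treated as a mixed model; only fixed effects are extracted
    obj_coef = tryCatch({ suppressMessages(suppressWarnings(sjp.lmer(fit, type = 'fe', showIntercept = FALSE, printPlot = FALSE)$mydf)) }, error = function(e) { cat_message(e, "danger") })
    obj_coef
  }
}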

It's not urgent for me, and actually in this case (where I'm suppressing the plots anyway), I could just as well use broom. I just thought it might make sense in the long run to switch to broom for you too, so you can focus more on the automagic plots.

sjPlot commented 8 years ago

I will also harmonize the return value name to data and add a class attribute sjPlot to all functions that return a "harmonized" data frame. Column order may vary, though.
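One way to read this, sketched under assumptions: the returned list gets an element named `data`, and the data frame in it carries an extra `sjPlot` class. `new_sjplot_result` is a hypothetical constructor for illustration, not the package's actual code.

```r
# Hypothetical constructor: stores the harmonized data frame under `data`
# and prepends an "sjPlot" class so downstream code can recognise it.
new_sjplot_result <- function(df, plot = NULL) {
  class(df) <- c("sjPlot", class(df))
  list(plot = plot, data = df)
}

res <- new_sjplot_result(data.frame(term = "wt", estimate = -5.3))

# Callers can check the class instead of guessing the element name per function.
is_harmonized <- inherits(res$data, "sjPlot")
```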

sjPlot commented 8 years ago

> I don't know so much about the internals of your marginal effect plots, but you use effects and lsmeans a lot, right?

Yes, why did you ask? Is there an easy way of supporting more model types in combination with broom?

rubenarslan commented 8 years ago

I don't know. There is broom::augment(fit, newdata = newdata), but I don't know exactly how powerful it is. It has methods for lme, merMod, lm and glm at least, but not all give standard errors or CI.

But generating the newdata for marginal effects is also a task that has been daunting for me in the past.
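For simple cases, the newdata grid can be built by hand. The sketch below is base R only and rests on several assumptions: `make_newdata` is a hypothetical helper, predictors enter the formula untransformed, the focal predictor is numeric, other numeric predictors are held at their mean, and factors at their first level. It is shown for `lm`, where `predict()` can return confidence bounds directly.

```r
# Hypothetical helper: vary one focal predictor over a grid and hold the
# remaining predictors at a typical value (mean, or first factor level).
make_newdata <- function(fit, focal, n = 20) {
  mf <- model.frame(fit)
  predictors <- all.vars(delete.response(terms(fit)))
  grid <- lapply(predictors, function(v) {
    x <- mf[[v]]
    if (v == focal) {
      seq(min(x), max(x), length.out = n)
    } else if (is.numeric(x)) {
      mean(x)
    } else {
      factor(levels(x)[1], levels = levels(x))
    }
  })
  names(grid) <- predictors
  expand.grid(grid, stringsAsFactors = FALSE)
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
nd  <- make_newdata(fit, focal = "wt")

# predict.lm returns a matrix with columns fit, lwr and upr.
pr <- predict(fit, newdata = nd, interval = "confidence")
marginal <- cbind(nd, pr)
```

Interactions, transformed terms and mixed models need more care than this, which is the kind of work packages such as effects and lsmeans take on.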