strengejacke / sjPlot

sjPlot - Data Visualization for Statistics in Social Science
https://strengejacke.github.io/sjPlot

Why does sjp.glmer not plot data points? #147

Closed (phragmosis closed this issue 7 years ago)

phragmosis commented 8 years ago

Hi,

This is a brilliant package.

To make it truly perfect for my use case: I am trying to produce a publication-quality figure for a paper, and I need the data points (the equivalent of geom_point() in ggplot2) to be drawn along with the predicted line of a Poisson GLMM, using sjp.glmer(). This is a reviewer's request.

Data points are automatically included with sjp.lmer(), e.g. sjp.lmer(fit2, type = "fe.slope", vars = c("c12hour", "barthel")) from http://www.strengejacke.de/sjPlot/sjp.lmer/, but I don't understand why the same is not done with something like sjp.glmer(fit, type = "fe.slope", vars = "b"), where "fit" could be a hypothetical lme4 model: fit <- glmer(a ~ b + (1 | c), family = poisson, data = dataframe).

Is this an issue with the package, or am I doing something daft?

Rob

robert1707 commented 7 years ago

Hi,

I have exactly the same request for a paper I will be submitting shortly. I was wondering if you found a solution to this problem?

Thanks, Robert

sjPlot commented 7 years ago

The issue is one of scale. For linear mixed models, the raw data are on the same scale (y-axis) as the linear "trend line" going through the cloud of data points:

[image: test]

For generalized linear (mixed) models, the plot produces the predicted probabilities, while the raw data points either have the value 0 or 1 on the y-axis. I'm not sure if this is a problem.
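To make the scale difference concrete, here is a minimal sketch in base R. It uses simulated data and an ordinary logistic glm() rather than a mixed model, and every name in it (x, y, fit, p_hat) is made up for illustration; it is not sjPlot code.

```r
# Simulated binary outcome; all names are illustrative, not from sjPlot.
set.seed(42)
x <- runif(200, -2, 2)
y <- rbinom(200, size = 1, prob = plogis(0.5 + 1.2 * x))

# Plain logistic regression, used only to show the two scales.
fit <- glm(y ~ x, family = binomial)
p_hat <- predict(fit, type = "response")

range(y)      # raw outcomes take only the values 0 and 1
range(p_hat)  # fitted probabilities lie strictly between 0 and 1
```

So the raw points sit on two horizontal bands at 0 and 1, while the fitted curve runs between them.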

Take an example from http://strengejacke.de/sjPlot/sjp.glmer/. The following image shows the predicted probabilities:

[image: test]

Now, if I plot the raw data as well, you need either jittering or adjusted alpha to cope with overplotting (note that this is just a quick plot to demonstrate the raw data points; ignore the fitted trend line):

[image: rplot]

If you have an idea for how to implement such a feature in a way that makes sense, I'm happy to try to include it.

phragmosis commented 7 years ago

Hi Robert,

No, I didn't find a solution to this and ended up having to use ggplot2 with predicted lines.
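For anyone landing here with the same request, a ggplot2 workaround of this kind might look roughly like the sketch below. It is an assumption about the general approach, not the code actually used for the paper: the data are simulated, the variable names a, b and c follow the hypothetical model from the first post, and dat, newdat and pred are made-up names. It relies on predict() for lme4 models with re.form = NA (population-level predictions) and type = "response".

```r
library(lme4)
library(ggplot2)

# Simulated count data with a random intercept per group (illustrative only).
set.seed(1)
n_groups <- 20
n_per_group <- 15
group_effect <- rnorm(n_groups, sd = 0.3)
dat <- data.frame(
  b = runif(n_groups * n_per_group, 0, 3),
  c = factor(rep(seq_len(n_groups), each = n_per_group))
)
dat$a <- rpois(nrow(dat), exp(0.2 + 0.5 * dat$b + group_effect[as.integer(dat$c)]))

# Poisson GLMM as in the hypothetical model from the first post.
fit <- glmer(a ~ b + (1 | c), family = poisson, data = dat)

# Population-level predictions (random effects set to zero) on the count scale.
newdat <- data.frame(b = seq(min(dat$b), max(dat$b), length.out = 100))
newdat$pred <- predict(fit, newdata = newdat, re.form = NA, type = "response")

# Raw data points plus the predicted line.
p <- ggplot(dat, aes(x = b, y = a)) +
  geom_point(alpha = 0.3) +
  geom_line(data = newdat, aes(y = pred), colour = "blue")
p
```

Because the predictions are on the response scale, the line and the raw counts share the same y-axis.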

Hi sjPlot,

Sorry, I'm not sure I understand the issue. Generalised linear (mixed) models can of course use Poisson, Gamma etc. families as well as binomial, and for those the issue is surely the same as with the linear model (in terms of overplotting), where you are already using adjusted alpha, right? Either way, I'm not sure why a binomial model creates a fundamentally different problem here. I think either approach (jittering or adjusted alpha) would be great, or you could make the point size scale with the number of observations at that position to avoid overplotting.
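As a rough illustration of those options, the sketch below draws the same simulated binary data twice with plain ggplot2: once with narrow jitter plus transparency, and once with geom_count(), which scales point size by the number of observations at each position. The data and all object names (dat, newdat, p_jitter, p_count) are made up for the example, and a simple glm() stands in for the mixed model.

```r
library(ggplot2)

# Simulated binary outcome on a discrete predictor (illustrative only).
set.seed(42)
dat <- data.frame(x = sample(1:10, 300, replace = TRUE))
dat$y <- rbinom(nrow(dat), size = 1, prob = plogis(-2 + 0.4 * dat$x))

# Simple logistic fit, standing in for the mixed model.
fit <- glm(y ~ x, family = binomial, data = dat)
newdat <- data.frame(x = seq(1, 10, length.out = 100))
newdat$pred <- predict(fit, newdata = newdat, type = "response")

# Shared base layer: the predicted probability curve.
base <- ggplot(dat, aes(x = x, y = y)) +
  geom_line(data = newdat, aes(y = pred))

# Option 1: narrow jitter plus transparency.
p_jitter <- base + geom_jitter(width = 0.15, height = 0.03, alpha = 0.3)

# Option 2: point size proportional to the number of overlapping observations.
p_count <- base + geom_count(alpha = 0.5)
```

Printing p_jitter or p_count shows the two variants; which one reads better probably depends on how many distinct x values the data have.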

Many researchers in my field like to see where the data sit in relation to the predicted line, so they can judge the certainty at different levels of the plotted relationship. I really think that if you implemented something to allow data points to be plotted along with the predicted model-based relationship, your package would be hard to beat for making publication-quality figures quickly.

sjPlot commented 7 years ago

OK, I see the point, and with narrower jittering the plot even looks good and useful. I checked this feature with logistic and Poisson models, for both the slope and pred plot types. It seems to work fine. Also, when the scatter plot is used (data points added), the y-axis limits are adjusted accordingly (mostly relevant for non-binary outcomes).

robert1707 commented 7 years ago

Thank you both for your useful comments!