pbreheny / visreg

Visualization of regression functions
http://pbreheny.github.io/visreg/

visreg and quantreg #16

Closed ghost closed 7 years ago

ghost commented 8 years ago

Hello,

I'm trying to plot visreg plots for quantile regressions fitted with quantreg.

t1=rq(y1~x1+x2+x3+HII+catvar, tau=c(0.25,0.5,0.75), data=d)
visreg(fit=t1,xvar="y1")

I get this warning message:

Warning messages:
1: In Response(fit, x, trans, alpha, ...) :
  Residuals do not match data; have you changed the original data set? If so, visreg is probably not displaying the residuals for the data set that was actually used to fit the model.

The plot is obviously wrong (5 regression lines instead of 3). I don't understand what I'm doing wrong.

Thanks for any help,

Best regards

JYB

pbreheny commented 8 years ago

What version of visreg are you using (packageVersion("visreg"))? The release of 2.2-1 should have fixed the issues with quantreg, but it's possible that there are still unresolved issues.

ghost commented 8 years ago

Hi, I'm using 2.2.1. How can I help you figure out the issue?

pbreheny commented 8 years ago

Does it work if you leave off the tau option, or if you specify only a single value of tau? To my knowledge, I've only tested whether visreg works to reproduce a single quantile estimate at a time. Some additional code might be necessary to support a vector of tau values.

ghost commented 8 years ago

it actually works with a single "tau" value.

Would it be hard to add a patch that allows plotting several lines, one per tau value, on the same plot? It's still possible to make one panel per tau, but that is not as easy to read.
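The one-panel-per-tau approach mentioned above can be sketched as follows. This is only an illustration under assumptions: it uses the built-in airquality data and a made-up formula rather than the data from this thread, and it fits one single-tau model per panel, which is the case reported to work here.

```r
library(quantreg)
library(visreg)

taus <- c(0.25, 0.5, 0.75)

# One panel per quantile, side by side
op <- par(mfrow = c(1, length(taus)))
for (tau in taus) {
  # Fit a separate single-tau model for each panel
  fit <- rq(Ozone ~ Wind + Temp, tau = tau, data = airquality)
  visreg(fit, "Wind", main = paste("tau =", tau))
}
par(op)  # restore the previous graphics settings
```

Each panel gets its own y-axis range by default, which is part of why comparing quantiles across panels is harder than reading an overlay.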

pbreheny commented 8 years ago

I'm going to label this issue as an "enhancement" of visreg and leave it open. I agree that it would be nice to produce an overlay plot of multiple quantile lines, but it isn't clear to me right now how much code would be necessary for this feature. I'm working on some other things at the moment, but it's on my schedule to work on visreg next week, so I hope to address this issue at that time -- thanks for bringing it to my attention.

ghost commented 8 years ago

Great, thanks. Please let me know if you come up with a solution when you work on it in the coming weeks.

ghost commented 8 years ago

Dear Dr Breheny,

Did you by any chance find a bit of time to look into implementing multiple quantile lines for quantile regressions in visreg?

Many thanks,

Best regards

JY Barnagaud


Jean-Yves Barnagaud Vertebrate Biogeography & Ecology group EPHE - CEFE - Montpellier, Fr. +0033(0)467613326 http://sites.google.com/site/jybarnagaud

pbreheny commented 8 years ago

I was traveling for a week; I had hoped to look into this before I left, but I had too many other commitments. I'm still hoping to get to this soon, but it's looking like next week or so.

pbreheny commented 7 years ago

This issue has been fixed by commit 1a526c455c0fc13424ae97cf123a09f9c2f877c4 (https://github.com/pbreheny/visreg/commit/1a526c455c0fc13424ae97cf123a09f9c2f877c4).

Long story: This is part of a larger issue: how should visreg handle objects that return matrices as predictions? This also comes up in multiple outcome regression models ("mlm") and multinomial regression (#18). The existing infrastructure in visreg is to return a list of visreg objects in this case, one for each outcome. In the case of quantile regression, this isn't exactly what we want, since the quantiles aren't exactly separate outcomes. Specifically, we'd probably like the ability to overlay them, which means collapsing them from a list into a single visreg object, then plotting.

Short story: This works now, but you'll have to specify collapse=TRUE. See enhances-quantreg.R (https://github.com/pbreheny/visreg/blob/master/inst/tests/enhances-quantreg.R) for details and a working, reproducible example.

Additional details: As the R file illustrates, you can't combine the specification of multiple quantiles with the return of standard errors -- this is a quantreg issue, not anything to do with visreg. Still, there is a workaround, as illustrated in the file.

I'm going to close the issue, but feel free to reply to let me know how this is working for you.
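A rough sketch of what the two approaches described above might look like, for readers without the test file at hand. It is not copied from enhances-quantreg.R: the airquality data and the formula are assumptions chosen for illustration, and the exact arguments should be checked against ?visreg and ?visregList for the installed version.

```r
library(quantreg)
library(visreg)

# (a) Several quantiles in one fit, collapsed into a single visreg object
#     and overlaid (no confidence bands in this case)
fit <- rq(Ozone ~ Wind + Temp, tau = c(0.25, 0.5, 0.75), data = airquality)
v <- visreg(fit, "Wind", collapse = TRUE, plot = FALSE)
plot(v, overlay = TRUE)

# (b) Workaround for confidence bands: fit each quantile separately,
#     then combine the resulting visreg objects before plotting
taus <- c(0.25, 0.5, 0.75)
fit1 <- rq(Ozone ~ Wind + Temp, tau = taus[1], data = airquality)
fit2 <- rq(Ozone ~ Wind + Temp, tau = taus[2], data = airquality)
fit3 <- rq(Ozone ~ Wind + Temp, tau = taus[3], data = airquality)
vv <- visregList(visreg(fit1, "Wind", plot = FALSE),
                 visreg(fit2, "Wind", plot = FALSE),
                 visreg(fit3, "Wind", plot = FALSE),
                 labels = paste("tau", taus, sep = "="),
                 collapse = TRUE)
plot(vv, overlay = TRUE)
```

Variant (a) may still emit the residual-mismatch warning, since a single set of partial residuals cannot correspond to every quantile line at once.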

ghost commented 7 years ago

Dear Dr Breheny,

Thanks a lot for the hard work, really helpful!

1: I'm able to reproduce the first example, although the "Multiple quantile overlay" example comes with a warning. I guess this is due to the way you adapted visreg for this specific purpose. What does it mean for the residuals displayed on the plots: are they actually inadequate?

2: I can't reproduce the plot with confidence bands. It generates the following message: "Error: cannot find function "visregList"". Has it been correctly exported from the package? (Note: I reinstalled the visreg package this morning before trying.)

3: I've got two additional, more general and basic questions:

3a: Is one of the two "type" options recommended for plotting lines and partial residuals in mixed models (fitted by lme or lmer, for instance)? Reading the help, I'm not quite sure about their implications in terms of interpretation, especially as I've not seen changes in the relative positions of points on plots when comparing the two "types" (with lmer models).

3b: In several instances I've noted that the point cloud does not really correspond to the plotted lines. See the example below (lmer with type="contrast"; same with "conditional", except that the CIs are not plotted), the Vosges panel for instance. I'm not sure whether these regression lines can be trusted then, although the CIs of the slopes don't encompass 0. Is this a matter of the way points are plotted, or is it a statistical question related to the models themselves?

Thanks again,

Best regards

JYB


pbreheny commented 7 years ago

install_github("pbreheny/visreg")

The new version isn't on CRAN yet, so install.packages will still install the old version.

ghost commented 7 years ago

3b: here's the example

image1

pbreheny commented 7 years ago
  1. This may not be the case, but given that the line is sort of thick and the points pretty tiny, I wonder if there are any hidden points (hiding behind the fitted regression line) in the above figure. If you make the dots bigger, are there perhaps some influential observations at high/low "v4" values that might be influencing the fit?
  2. I agree with you that the cloud of partial residuals doesn't seem to match the line very well in several of the above panels. My view on this is that we should always be asking, "Can I really trust my model's predictions?" One of the purposes of visreg (and other graphical diagnostic plots) is to highlight/illustrate cases in which the model's predictions might not be matching the data very well. In this particular example, I don't know what model is being fit, what the outcome is, or what the data look like, so I can't say whether there is actually something wrong or not, but I would argue that it's healthy to be worried here and to spend some time investigating. You'll either find out that there was a reasonable explanation all along (in which case you've gained a deeper understanding of the model), or you'll find that there is a problem with the model and you need to change it. I suppose a third possibility is that visreg produces misleading plots for some types of models...I certainly hope that this isn't the case, but if it is, I would definitely like to hear more about it.
ghost commented 7 years ago

Thanks for the quick reply;

Just to go a bit deeper into 2.

The models are lmer fitted with ML (not REML) that take a log-transformed count response variable roughly normally distributed and 5 continuous variables scaled to mean=0 and SD=1 all in interaction with a categorical variable with 9 levels.

There's a random effect on the intercept and a weighting corresponding to heterogeneities in sampling effort.

There are over 2000 observations in total, so I think the model should not be overparametrized...

We've tested the need to incorporate the interaction term and the variable for which I sent you the visreg plots with AIC, and models were clearly better (improvement by over 10 AIC units). The fit is quite good: adjusted R² ranges from 12% to 20% depending on the model, which is completely standard for this kind of data, and confidence intervals don't encompass 0. Residuals look good with no apparent heteroskedasticity.

So far I don't really see what kind of supplementary checks we could do to ensure that the linear effects shown by visreg are not spurious. I'm torn between trusting the slope given all the checks we've done, and accepting that residual points are more complicated to interpret than they look; or thinking that the model yields spurious estimates that can't be detected through the checks we've made.

Well, this is a very quick answer but if you have good ideas I'd be really keen to make further checks.

Best,

JYB


pbreheny commented 7 years ago

Two ideas:

  1. What happens if you stratify the sample and fit a model to only the data in, for example, the "Vosges" group? Does the slope appear similar to how it looks in the more complex analysis?
  2. If the observations have different weights, then not all residuals affect the line equally, and might explain the visual discrepancy. You could try making observations with more weight show up bigger in the plot. Here's an example:
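library(visreg)
set.seed(1)          # make the simulated example reproducible
w <- rexp(20)        # observation weights
x <- runif(20)
y <- rnorm(20)
fit <- lm(y~x, weights=w)
# Draw each partial residual with a size proportional to its weight
visreg(fit, points.par=list(cex=w))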
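
The first idea above (stratifying) can be sketched in the same spirit. This is a minimal illustration on simulated data, with made-up region names and slopes and a plain lm rather than the weighted lmer model discussed in this thread. In an unweighted lm where every term interacts with the grouping factor, the slope for one group matches the slope from a model fitted to that group alone, so a gap between the two in a richer model would presumably come from what that model shares across groups (random effects, weights, other covariates).

```r
set.seed(1)

# Simulated data: three regions with different true slopes (hypothetical values)
region <- factor(rep(c("Alps", "Jura", "Vosges"), each = 50))
x <- rnorm(150)
slopes <- c(Alps = 0.5, Jura = 0, Vosges = -0.3)
y <- unname(slopes[as.character(region)]) * x + rnorm(150)
d <- data.frame(y, x, region)

# Full model: the slope of x is allowed to differ by region
full <- lm(y ~ x * region, data = d)
full_slope <- unname(coef(full)["x"] + coef(full)["x:regionVosges"])

# Stratified model: only the Vosges observations
strat <- lm(y ~ x, data = d, subset = region == "Vosges")
strat_slope <- unname(coef(strat)["x"])

c(full = full_slope, stratified = strat_slope)
```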
ghost commented 7 years ago

Thanks again.

  1. The slope looks similar and the residual points are spread more logically along the slope (image below). I'm not sure what to conclude, however: the effect looks sustained, but what about the validity of the visreg points? It roughly looks as if, in the full model (previous panel), the points were rotated by 90° relative to the slope.
  2. The display trick works, but in this precise situation it is not very informative about why the pattern arises. Again, it's exactly as if the residual points were rotated for some reason.

image

pbreheny commented 7 years ago

It's difficult to offer too much commentary here without understanding the data or the model you're fitting, but my impression is that you have a (barely) significant slope for, e.g., Vosges in the full model, but if you look at the Vosges data alone, the slope is in the same direction, but not significant. All this would leave me a little skeptical of whether this relationship is real -- you don't see much evidence of it with a simpler model, and the residuals perhaps don't seem quite right (although they don't seem obviously unreasonable to me). In this scenario (with the caveat that I don't know a great deal about the context), I would probably report the full model and any interesting associations that come out of it, but be careful not to overstate the amount of evidence (it appears borderline significant and perhaps not all that robust to choice of model).

pbreheny commented 7 years ago

As a visreg issue, though, I don't see any clear evidence that the software is doing anything wrong -- as far as I can tell, the plots faithfully represent what is going on in the data and model.

ghost commented 7 years ago

Thanks for the replies. Good to know that the strange patterns are not due to the software.

Well, the model with the effect is >40 AIC units below the model without the effect, and bootstrapped confidence intervals (CIs) don't encompass 0, so there's some support, although I agree with you that it's not unambiguously convincing given the residuals. We've made additional cross-validations by splitting the data into training and test data sets, and got the same results and reasonable predictive ability (at least the model is not biased).

The model fitted only on the Vosges region is also improved when including the effect (quite weakly, ~7 AIC units) and the CI of the effect does not encompass 0. Again I agree that the residuals call for a lot of caution, but I'm failing to find any strong argument to reject it.
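
For readers who want to reproduce this kind of check, here is a minimal sketch of an AIC comparison and a percentile bootstrap CI for a slope. Everything in it is an assumption made for illustration: the data are simulated, the model is a plain lm rather than the weighted lmer models described above, and the number of bootstrap replicates is arbitrary.

```r
set.seed(42)

# Simulated data with a true slope of 0.5 (hypothetical)
n <- 200
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
d <- data.frame(x, y)

# AIC comparison: model without vs. with the effect
m0 <- lm(y ~ 1, data = d)
m1 <- lm(y ~ x, data = d)
delta_aic <- AIC(m0) - AIC(m1)  # positive values favor the model with x

# Percentile bootstrap CI for the slope: resample rows, refit, collect slopes
boot_slopes <- replicate(500, {
  idx <- sample(n, replace = TRUE)
  unname(coef(lm(y ~ x, data = d[idx, ]))["x"])
})
ci <- quantile(boot_slopes, c(0.025, 0.975))

delta_aic
ci
```

Note that for a mixed model, resampling individual rows ignores the grouping structure; resampling whole groups or using a parametric bootstrap is a common alternative.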

Well, thanks for your support and useful advice.

All the best

JYB
