rmcelreath / rethinking

Statistical Rethinking course and book package
2.16k stars 605 forks source link

R code 4.57: No CI plotted with shade() #318

Open binseb opened 3 years ago

binseb commented 3 years ago

Since I updated on the latest R Version 4.1.0, I am not able to plot some of the compatibility intervals with shade() anymore e. g. in R code 4.57 (see code below). While shade() does not plot the CI for the the linear regression example in chapter 4 (R code 4.57) or the poisson regression example in chapter 11 (R code 11.47) , it does though work for the quadratic regression example (R code 4.68).

I access R through the latest version of RStudio on Windows 10. When I run the same codes in R Version 4.0.3 on my work computer, everything works fine. Thank you for any clues you can provide on this.

`## R code 4.57

plot raw data

fading out points to make line and interval more visible

plot( height ~ weight , data=d2 , col=col.alpha(rangi2,0.5) )

plot the MAP line, aka the mean mu for each weight

lines( weight.seq , mu.mean )

plot a shaded region for 89% PI

shade( mu.PI , weight.seq )`

hwallace90 commented 3 years ago

I can confirm I'm having the same issue after updating to R 4.1

hwallace90 commented 3 years ago

I looked into this a bit more, and I figured it out. The default col for shade is col.alpha("black",0.15). This doesn't seem to work in R 4.1.0. Considering you really want the shades to have some transparency to them usually, I'm not sure what the solution is.

    if ( missing(lim) ) stop( "Interval limits missing." )
    if ( missing(object) ) stop( "No density or formula object." )
    from <- lim[1]
    to <- lim[2]
    if ( class(object)[1]=="formula" ) {
        # formula input
        x1 <- eval( object[[3]] )
        y1 <- eval( object[[2]] )
        x <- x1[ x1>=from & x1<=to ]
        y <- y1[ x1>=from & x1<=to ]
    }
    if ( class(object)[1]=="density" ) {
        # density input
        x <- object$x[ object$x>=from & object$x<=to ]
        y <- object$y[ object$x>=from & object$x<=to ]
    }
    if ( class(object)[1]=="matrix" & length(dim(object))==2 ) {
        # matrix defining confidence region around a curve
        y <- c( object[1,] , object[2,][ncol(object):1] ) # reverse second row
        x <- c( lim , lim[length(lim):1] ) # lim needs to be x-axis values
    }
    # draw
    if ( class(object)[1]=="matrix" ) {
        polygon( x , y , col=col , border=border , ... )
    } else {
        polygon( c( x , to , from ) , c( y , 0 , 0 ) , col=col , border=border , ... )
    }
    # label?
    if ( !is.null(label) ) {
        lx <- mean(x)
        ly <- max(y)/2
        text( lx , ly , label )
    }
}

Example

library(rethinking)
data(Kline)
d <- Kline
d

## R code 11.37
d$P <- scale( log(d$population) )
d$contact_id <- ifelse( d$contact=="high" , 2 , 1 )

## R code 11.45
dat <- list(
  T = d$total_tools ,
  P = d$P ,
  cid = d$contact_id )

# interaction model
m11.10 <- ulam(
  alist(
    T ~ dpois( lambda ),
    log(lambda) <- a[cid] + b[cid]*P,
    a[cid] ~ dnorm( 3 , 0.5 ),
    b[cid] ~ dnorm( 0 , 0.2 )
  ), data=dat , chains=4 , log_lik=TRUE )

## R code 11.47
k <- PSIS( m11.10 , pointwise=TRUE )$k
plot( dat$P , dat$T , xlab="log population (std)" , ylab="total tools" ,
      col=rangi2 , pch=ifelse( dat$cid==1 , 1 , 16 ) , lwd=2 ,
      ylim=c(0,75) , cex=1+normalize(k) )

# set up the horizontal axis values to compute predictions at
ns <- 100
P_seq <- seq( from=-1.4 , to=3 , length.out=ns )

# predictions for cid=1 (low contact)
lambda <- link( m11.10 , data=data.frame( P=P_seq , cid=1 ) )
lmu <- apply( lambda , 2 , mean )
lci <- apply( lambda , 2 , PI )
lines( P_seq , lmu , lty=2 , lwd=1.5 )
shade( lci , P_seq , xpd=TRUE )  #does nothing

shade(lci, P_seq, xpd=TRUE, col="black") #works correctly
rmcelreath commented 3 years ago

Thanks for this. I've put off installing 4.1.0 for many reasons, but I guess I should do so and try to deal with this.

rmcelreath commented 3 years ago

I cannot reproduce on Mac OS with R 4.1.0. Maybe this is Windows specific?

hwallace90 commented 3 years ago

Interesting, could be. I am on windows 10 as well like the OP. Forgot to mention, I checked and it's not a RStudio problem, got the same thing when I ran from standard R.

FroggyZ commented 3 years ago

Working on Windows 10 and R 4.1.0, I have exactly the same issue running codes like 4.57. Tried to figure out the origin and did some tests within "colors.r" notably on col.alpha() at https://github.com/rmcelreath/rethinking/blob/master/R/colors.r with no success until now.

So, until a fix is found, here is the trick I use:

Original 4.57 code (with none of its original comments): plot( height ~ weight , data=d2 , col=col.alpha(rangi2,0.5) ) lines( weight.seq , mu.mean ) shade( mu.PI , weight.seq ) Modified 4.57 code (changing the order of commands & adding "points" to overplot data points: plot( height ~ weight , data=d2, col=col.alpha(rangi2,0.5) ) shade( mu.PI , weight.seq, col="lightgrey") # you need to add a n R color to make it works :) points(d2$weight, d2$height, col=col.alpha(rangi2,0.5)) # overplot data points lines( weight.seq , mu.mean )

As another example, let's go to code 4.61:

Original 4.61 code (with none of its original comments): plot( height ~ weight , d2 , col=col.alpha(rangi2,0.5) ) lines( weight.seq , mu.mean ) shade( mu.HPDI , weight.seq ) shade( height.PI , weight.seq ) Modified 4.61 code (changing the order of commands & adding "points" to overplot data points: plot( height ~ weight , d2 , col=col.alpha(rangi2,0.5) ) shade( height.PI , weight.seq, col="lightgreen") shade( mu.PI , weight.seq, col="lightgrey" ) points(d2$weight, d2$height, col=col.alpha(rangi2,0.5)) lines( weight.seq , mu.mean )

Hope this will help a little :)

FroggyZ commented 3 years ago

:( But this trick does not work on example 4.68... However, the original code works... Really weird :):):)

eveskew commented 3 years ago

In case it adds any clarity, I had three students today (all on Windows machines) run into the same problem. On the other hand, I didn't run into any issues on my machine (R 4.1.1 and macOS Big Sur). They could get the plot and trend line at the end of this code chunk but not the shaded interval:

library(rethinking)

data("WaffleDivorce")
d <- WaffleDivorce

d$D <- standardize(d$Divorce)
d$M <- standardize(d$Marriage)
d$A <- standardize(d$MedianAgeMarriage)

m5.2 <- quap(
  data = d,
  alist(
    D ~ dnorm(mu, sigma),
    mu <- a + bM * M,
    a ~ dnorm(0, 0.2),
    bM ~ dnorm(0, 0.5),
    sigma ~ dexp(1)
  )
)

precis(m5.2, prob = 0.9)

M.seq <- seq(from = -3, to = 3, by = 1)

mu <- link(m5.2, data = list(M = M.seq))
mu.mean <- apply(mu, 2, mean)
mu.HPDI <- apply(mu, 2, HPDI, 0.9)

plot(D ~ M, data = d, pch = 19)
lines(M.seq, mu.mean)
shade(mu.HPDI, M.seq)

The workaround strategy suggested by @FroggyZ (plotting an interval of a solid color and then overlaying the points) does work for them, so thanks for that!

It is an extremely weird issue because I believe all three students successfully plotted transparent intervals last week in the context of a different problem. If you need reproducible examples of code that does and does not work on the Windows machines, I could try to provide those.

caadams commented 2 years ago

It appears to be a bug in R 4.1.X running on Windows. Polygons can't be plotted if they extend out of the plot area.

Here's an explanation: https://stackoverflow.com/questions/70101061/transparent-polygon-not-being-drawn-in-r-4-1-2-but-is-drawn-in-r-4-0-3

It is fixed in the development version of R

NLAndreas commented 1 year ago

I have the same issue with shade() not producing output. I am using R 4.2.2 and get the following error message when trying to run R code 4.57:

shade( mu.PI , weight.seq ) Error in if (class(object) == "formula") { : the condition has length > 1

Anyone who has found a solution to the problem? Tried using the option by FroggyZ above but got the same error.

arnavsheth commented 1 year ago

Having the same issue as above. Running R 4.3.1 on Mac using VSCode. None of the workarounds FroggyZ suggested worked for me... :(

Edit: To be clear, I am running the same code as the users above, and getting the same error:

shade( mu.PI , weight.seq )
Error in if (class(object) == "formula") { : the condition has length > 1
FroggyZ commented 1 year ago

Sounds really weird. It's like shade data types arguments are not the one expected... I don't see anything more than going to shade source code, check again that aspect + recopy code in another "source program" to be run appart ...

What is your R version, package version? did you try older ones?

Also look as mentioned above by caadams: https://stackoverflow.com/questions/70101061/transparent-polygon-not-being-drawn-in-r-4-1-2-but-is-drawn-in-r-4-0-3 or https://stackoverflow.com/questions/19245468/r-polygon-plot-not-shading-to-x-axis

Is all that only xlim/ylim issue?

arnavsheth commented 1 year ago

Thanks, FroggyZ. I'm running R version 4.3.1; rstan version 2.26.23; Stan version 2.26.1; rethinking version 2.01. I did not play around with it too much. For now, I just commented out the shading. I'll try the other links you posted to see if I can make it work. Thanks again.

Addendum: I tried adding xlim and ylim (together and individually) to the plot, including going from 0 to the max-es of d2$height and d2$weight. Same error. So not an xlim/ylim issue if I'm understanding that stack overflow post correctly.

Addendum 2: Also note that I am not on Windows, but on Mac (Platform: aarch64-apple-darwin20 (64-bit) Running under: macOS Ventura 13.5.1). I also poked around and saw that this is a known Windows issue at least for R 4.1. I also found this issue. Not sure if it's related?

FroggyZ commented 1 year ago

Thanks for trying arnavsheth.

arnavsheth commented 1 year ago

One more thing. I looked at the help page for shade and there was some example code there. I ran this:

models <- seq( from=0 , to=1 , length.out=100 )
prior <- rep( 1 , 100 )
likelihood <- dbinom( 6 , size=9 , prob=models )
posterior <- likelihood * prior
posterior <- posterior / sum(posterior)
# using formula interface
plot( posterior ~ models , type="l" )
shade( posterior ~ models , c(0,0.5) )

And it worked fine. Then I ran this:

# using density interface
samples <- sample( models , size=1e4 , replace=TRUE , prob=posterior )
plot( density(samples) )
shade( density(samples) , PCI(samples) )

It also worked fine. Finally, this gave that error again:

# plotting a shaded confidence interval on a regression plot
data(cars)
m <- lm( dist ~ speed , cars )
p <- extract.samples( m )
x.seq <- seq( from=min(cars$speed)-1 , to=max(cars$speed)+1 , length.out=30 )
mu.ci <- sapply( x.seq , function(x) PI( p[,1] + p[,2]*x ) )
plot( dist ~ speed , cars )
abline( m )
shade( mu.ci , x.seq )

Anyway, apologies for the multiple messages. Just adding more info as I discover it. Thanks again!

arnavsheth commented 1 year ago

So just following up on this. I looked at the code for shade, and seemed to have discovered something. Let's take the following code example from the help page of shade:

# plotting a shaded confidence interval on a regression plot data(cars) 
m <- lm( dist ~ speed , cars ) 
p <- extract.samples( m ) 
x.seq <- seq( from=min(cars$speed)-1 , to=max(cars$speed)+1 , length.out=30 ) 
mu.ci <- sapply( x.seq , function(x) PI( p[,1] + p[,2]*x ) ) 
plot( dist ~ speed , cars ) 
abline( m ) 
shade( mu.ci , x.seq )

For this code, shade will take an object like mu.ci, and check to see if it is a formula, density, or matrix. Specifically, the first check in the shade code is :

if (class(object) == "formula") {
        x1 <- eval(object[[3]]) 
        y1 <- eval(object[[2]])  
        x <- x1[x1 >= from & x1 <= to]  
        y <- y1[x1 >= from & x1 <= to]  
    }

However, when you put an object like mu.ci into the class function, you get the following output:

> class(mu.ci)
[1] "matrix" "array" 

And this seems to be the problem. Rather than returning just one class, it returns the class as well as the inherited class which is "array". This results in the error:

Error in if (class(object) == "formula") { : the condition has length > 1

Does anyone know a way around this? @FroggyZ @rmcelreath Or am I way off? If so, apologies.

FroggyZ commented 1 year ago

Thanks for diving more into that @arnavsheth

Below (denoted as code #1) is the last shade() code (I think?) found in the repo (in order to see it better, I did not use the "code chunk" insertion). Note that shade() code is in "plotting.R" of "rethinking". As you can see, each time "class(object)" is tested, it is using "[1]": class(object)[1]=="formula" class(object)[1]=="density" class(object)[1]=="matrix" And in the last case, it will select only the 1st item = "matrix".

So maybe the best is to try shade() version that has: class(object)[1] (even if this should not create an error...)

Now on my computer (R 4.2.3 2023-03-15 ucrt, Rstudio 2023.09.0 Build 463, rstan 2.32.3, rethinking 2.40), running the code you indicated above (reproduced below and denoted as code #2) does not give any error (today?). However, I have just done a full STAN reinstall + a full rethinking reinstall with its dependencies (don't forget "cmdstanr" as it is also required to load rethinking but apparently not in dependencies).

You can either test the code below or reinstall "rethinking" fully and try again (below are the lines I used to reinstall "rethinking"):

--- 1st install "cmdstanr" (it will be loaded whan libraryu(rethinking) will be run)

install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))

--- 2nd install all other packages needed + "rethinking" package from github

install.packages(c("coda","mvtnorm","devtools","loo","dagitty","shape")) devtools::install_github("rmcelreath/rethinking", dependencies = TRUE)

Please let me know what happen :)

Code #1: shade() code from "plotting.R" (copied today )

shade <- function( object , lim , label=NULL , col=col.alpha("black",0.15) , border=NA , ... ) { if ( missing(lim) ) stop( "Interval limits missing." ) if ( missing(object) ) stop( "No density or formula object." ) from <- lim[1] to <- lim[2] if ( class(object)[1]=="formula" ) {

formula input

      x1 <- eval( object[[3]] )
      y1 <- eval( object[[2]] )
      x <- x1[ x1>=from & x1<=to ]
      y <- y1[ x1>=from & x1<=to ]
 }
 if ( **class(object)[1]**=="density" ) {
      # density input
      x <- object$x[ object$x>=from & object$x<=to ]
      y <- object$y[ object$x>=from & object$x<=to ]
 }
 if ( **class(object)[1]**=="matrix" & length(dim(object))==2 ) {
      # matrix defining confidence region around a curve
      y <- c( object[1,] , object[2,][ncol(object):1] ) # reverse second row
      x <- c( lim , lim[length(lim):1] ) # lim needs to be x-axis values
 }
 # draw
 if ( **class(object)[1]**=="matrix" ) {
      polygon( x , y , col=col , border=border , ... )
 } else {
      polygon( c( x , to , from ) , c( y , 0 , 0 ) , col=col , border=border , ... )
 }
 # label?
 if ( !is.null(label) ) {
      lx <- mean(x)
      ly <- max(y)/2
      text( lx , ly , label )
 }

}

Code #2: plotting a shaded confidence interval on a regression plot

data(cars) m <- lm( dist ~ speed , cars ) p <- extract.samples( m ) x.seq <- seq( from=min(cars$speed)-1 , to=max(cars$speed)+1 , length.out=30 ) mu.ci <- sapply( x.seq , function(x) PI( p[,1] + p[,2]*x ) ) plot( dist ~ speed , cars ) abline( m ) shade( mu.ci , x.seq )

arnavsheth commented 1 year ago

Fantastic. Thank you so much @FroggyZ! I will follow up once I have a chance to try this out. Thanks again!

arnavsheth commented 1 year ago

Update: @FroggyZ - this is resolved. Thank you again!

FroggyZ commented 1 year ago

Glad to hear that @arnavsheth, even though I remain highly perplexed... Could you just briefly explain what you tried (among the suggestions)?

arnavsheth commented 1 year ago

Ah. Sorry if I was unclear. I simply reinstalled rethinking and Stan, following your instructions. My rethinking was at v2.01. I had installed it following the instructions in the text book, about a week or so ago. I noticed that yours was at 2.40. I think that this did the trick.

Of course I also reinstalled everything else as well, per your instructions. The dependencies are not entirely clear to me, but I now have rethinking 2.40, rstan 2.26.23 (behind your version), and R 4.3.1 2023-06-16.

Does that help in answering your question?

JasonMcLachlan commented 1 month ago

Hi everyone. I'm posting a year after the last comment, as the error with shade() still exists. I'm using the latest RStudio (2024.04.4), rstan (2.26.3) and rethinking (2.01), so outdated software isn't the problem. The students in my class got the same error, so it might be general problem.

The error we get running this line from R code 4.57 shade(mu.PI , weight.seq ) is the same one that we get when we run the example code plotting a shaded CI in help(shade): shade( mu.ci , x.seq )

Error in if (class(object) == "formula") { : the condition has length > 1

I could be wrong, but this looks to me like shade() is mistaking a matrix (mu.ci) for a formula, and thus throwing an error.

For my class, I was able to hack an alternative solution using polygon(). In case that's useful for others, code like that below works in place of shade():

x <- c(x.seq, rev(x.seq)
y <- c(mu.ci[1,], rev(mu.ci[2,])
transparentish_grey<-rgb(190, 190, 190, max=255, alpha = 150)
polygon(x, y, col = transparentish_grey, lty = 0)`
JasonMcLachlan commented 1 month ago

... also wondering whether this might be a Windows-only problem. A friend with a Mac is able to run shade() without getting an error.