saudiwin / ordbetareg_pack

Repository for R package ordbetareg, used to fit continuous data with lower and upper bounds.
Other
17 stars 3 forks source link

Error using LOO with: pointwise = TRUE #11

Closed Foztarz closed 1 year ago

Foztarz commented 1 year ago

Thanks for this incredibly useful package! I'm running into issues trying to use loo() for model comparison. I have very large models (fitted on imputed data using brm_multiple()) so I'm trying to use pointwise estimates for LOO to avoid reaching maximum memory usage. loo(..., pointwise = FALSE) runs fine on (smaller) ordbetareg models, but with pointwise = TRUE I get the following error: Error in draws$dpars$mu[, i] It looks to me like the draws are not in the format that log_lik_ord_beta_reg() expects them to be in. Depending on how loo() extracts them, the specific draws for mu are probably stored in draws$dpars$mu$fe. I had a look at this in the example below (I'm also interested in effects on phi, which I think may have a similar problem). I really have no idea about the inner workings of LOO though.

    require(ordbetareg)
    # Test data ---------------------------------------------------------------
    set.seed(20230202)
    #data where both mu & phi change as a function of x
    dd <- data.frame(yy = round(plogis(
                            rnorm(n = 101, 
                                  mean = -50:50 / 5, 
                                  sd = seq(from = 5, 
                                           to = 0.1, 
                                           length.out = 101)
                                  )), 2),
                    xx = -50:50 / 5)
    # Fit ordered beta model --------------------------------------------------
    ordb_model<- ordbetareg( formula =
                  bf(yy ~ xx, phi ~ xx),
                  phi_reg = TRUE,
                  data = dd,
                  cores = parallel::detectCores()-1,
                  chains =  parallel::detectCores()-1,
                  iter = 1e3,
                  init = '0',
                  backend = 'cmdstanr'
                )

    # Inspect model predictions ----------------------------------------------
    loo_heavy <- loo(ordb_model)
    ## Computed from 3500 by 4901 log-likelihood matrix
    ## 
    ## Estimate     SE
    ## elpd_loo  -9065.8  524.2
    ## p_loo      8283.9  482.7
    ## looic     18131.7 1048.5
    ## ------
    ##   Monte Carlo SE of elpd_loo is NA.
    ## 
    ## Pareto k diagnostic values:
    ##   Count Pct.    Min. n_eff
    ## (-Inf, 0.5]   (good)     4047  82.6%   209       
    ## (0.5, 0.7]   (ok)        198   4.0%   57        
    ## (0.7, 1]   (bad)       183   3.7%   14        
    ## (1, Inf)   (very bad)  473   9.7%   1                

    loo_point <- loo(ordb_model, pointwise = TRUE)
    ##Error in draws$dpars$mu[, i] : incorrect number of dimensions
    loo_sub_point <- loo_subsample(ordb_model)
    ##Error in draws$dpars$mu[, i] : incorrect number of dimensions

    #indeed the draws do not have this shape
    sub_draws <- prepare_predictions(ordb_model, point_estimate = 'median')
    summary(sub_draws$dpars$mu)
    ## Length Class        Mode   
    ## family 18     customfamily list   
    ## ndraws  1     -none-       numeric
    ## nobs    1     -none-       numeric
    ## fe      2     -none-       list   
    ## sp      0     -none-       list   
    ## cs      0     -none-       list   
    ## sm      0     -none-       list   
    ## gp      0     -none-       list   
    ## re      0     -none-       list   
    ## ac      0     -none-       list  

    #estimates are stored in $fe$b
    head(sub_draws$dpars$mu$fe$b)

    log_lik_ord_beta_reg <-
      function(i, draws) {

        # mu <- draws$dpars$mu[,i]
        mu <- with( draws$dpars$mu$fe, b%*%X[i,]  ) #something like this to give estimate for that datapoint
        phi <- with( draws$dpars$phi$fe, b%*%X[i,] ) #something like this to give estimate for that datapoint
        mu <- plogis(mu) # expected on the response scale?
        phi <- plogis(phi) # expected on the response scale?
        y <- draws$data$Y[i]
        cutzero <- draws$dpars$cutzero
        cutone <- draws$dpars$cutone

        thresh1 <- cutzero
        thresh2 <- cutzero + exp(cutone)

        if(y==0) {
          out <- log(1 - plogis(qlogis(mu) - thresh1))
        } else if(y==1) {
          out <- log(plogis(qlogis(mu) - thresh2))
        } else {
          out <- log(plogis(qlogis(mu)-thresh1) - plogis(qlogis(mu) - thresh2)) + dbeta(y,mu*phi,(1-mu)*phi,log=T)
        }

        out

      }

    sum(log_lik_ord_beta_reg(20,sub_draws))
    ##-1.188826
saudiwin commented 1 year ago

ok cool I'll take a look at this. Admittedly I never tested this particular feature.

From a methodological perspective, in the meantime best thing would be to do full loo on random samples of your data (will likely have a similar outcome to the pointwise option).

Foztarz commented 1 year ago

Brilliant, thanks! Yes in the meantime I'll give that a try, might actually make more sense that using the whole imputed dataset anyway. On a possibly-related note posterior_predict does seem to use draws with the structure expected by log_lik_ord_beta_reg. Not sure it's even worth flagging as a separate issue, as the error Error in sample.int(length(x), size, replace, prob) : NA in probability vector I was getting there could be fixed just by treating phi the same way as mu.

plt = ordbetareg::pp_check_ordbeta(ordb_model, ndraws = 100)
## Error in sample.int(length(x), size, replace, prob) :
##   NA in probability vector

#overwrite posterior_predict to account for variable phi
posterior_predict_ord_beta_reg <- function(i, draws, ...) {
  mu <- draws$dpars$mu[, i]
  # phi <- draws$dpars$phi
  phi <- draws$dpars$phi[, i] #iterate through phi as well
  cutzero <- draws$dpars$cutzero
  cutone <- draws$dpars$cutone
  N <- length(phi)

  thresh1 <- cutzero
  thresh2 <- cutzero + exp(cutone)

  pr_y0 <- 1 - plogis(qlogis(mu) - thresh1)
  pr_y1 <- plogis(qlogis(mu) - thresh2)
  pr_cont <- plogis(qlogis(mu)-thresh1) - plogis(qlogis(mu) - thresh2)
  out_beta <- rbeta(n=N,mu*phi,(1-mu)*phi)

  # now determine which one we get for each observation
  outcomes <- sapply(1:N, function(i) {
    sample(1:3,size=1,prob=c(pr_y0[i],pr_cont[i],pr_y1[i]))
  })

  final_out <- sapply(1:length(outcomes),function(i) {
    if(outcomes[i]==1) {
      return(0)
    } else if(outcomes[i]==2) {
      return(out_beta[i])
    } else {
      return(1)
    }
  })

  final_out

}
#now it runs to completion
plt = ordbetareg::pp_check_ordbeta(ordb_model, ndraws = 100)
saudiwin commented 1 year ago

Cool, I'll take a look.

On Fri, Feb 3, 2023 at 3:09 PM Foztarz @.***> wrote:

Brilliant, thanks! Yes in the meantime I'll give that a try, might actually make more sense that using the whole imputed dataset anyway. On a possibly-related note posterior_predict does seem to use draws with the structure expected by log_lik_ord_beta_reg. Not sure it's even worth flagging as a separate issue, as the error Error in sample.int(length(x), size, replace, prob) : NA in probability vector I was getting there could be fixed just by treating phi the same way as mu.

plt = ordbetareg::pp_check_ordbeta(ordb_model, ndraws = 100)## Error in sample.int(length(x), size, replace, prob) :## NA in probability vector

overwrite posterior_predict to account for variable phiposterior_predict_ord_beta_reg <- function(i, draws, ...) {

mu <- draws$dpars$mu[, i]

phi <- draws$dpars$phi

phi <- draws$dpars$phi[, i] #iterate through phi as well cutzero <- draws$dpars$cutzero cutone <- draws$dpars$cutone N <- length(phi)

thresh1 <- cutzero thresh2 <- cutzero + exp(cutone)

pr_y0 <- 1 - plogis(qlogis(mu) - thresh1) pr_y1 <- plogis(qlogis(mu) - thresh2) pr_cont <- plogis(qlogis(mu)-thresh1) - plogis(qlogis(mu) - thresh2) out_beta <- rbeta(n=N,muphi,(1-mu)phi)

now determine which one we get for each observation

outcomes <- sapply(1:N, function(i) { sample(1:3,size=1,prob=c(pr_y0[i],pr_cont[i],pr_y1[i])) })

final_out <- sapply(1:length(outcomes),function(i) { if(outcomes[i]==1) { return(0) } else if(outcomes[i]==2) { return(out_beta[i]) } else { return(1) } })

final_out

}#now it runs to completionplt = ordbetareg::pp_check_ordbeta(ordb_model, ndraws = 100)

— Reply to this email directly, view it on GitHub https://github.com/saudiwin/ordbetareg_pack/issues/11#issuecomment-1415720801, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACTVVOLC2XR3SHE4NAJ4PJDWVTRNHANCNFSM6AAAAAAUPJEGA4 . You are receiving this because you commented.Message ID: @.***>

saudiwin commented 1 year ago

Just pushed code that fixes this issue. The following code runs fine:

require(ordbetareg)
# Test data ---------------------------------------------------------------
set.seed(20230202)
#data where both mu & phi change as a function of x
dd <- data.frame(yy = round(plogis(
  rnorm(n = 101, 
        mean = -50:50 / 5, 
        sd = seq(from = 5, 
                 to = 0.1, 
                 length.out = 101)
  )), 2),
  xx = -50:50 / 5)
# Fit ordered beta model --------------------------------------------------
ordb_model<- ordbetareg( formula =
                           bf(yy ~ xx, phi ~ xx),
                         phi_reg = TRUE,
                         data = dd,
                         cores = parallel::detectCores()-1,
                         chains =  parallel::detectCores()-1,
                         iter = 1e3,
                         init = '0',
                         backend = 'cmdstanr'
)

# Inspect model predictions ----------------------------------------------
loo_heavy <- loo(ordb_model)
## Computed from 3500 by 4901 log-likelihood matrix
## 
## Estimate     SE
## elpd_loo  -9065.8  524.2
## p_loo      8283.9  482.7
## looic     18131.7 1048.5
## ------
##   Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##   Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     4047  82.6%   209       
## (0.5, 0.7]   (ok)        198   4.0%   57        
## (0.7, 1]   (bad)       183   3.7%   14        
## (1, Inf)   (very bad)  473   9.7%   1                

loo_point <- loo(ordb_model, pointwise = TRUE)
##Error in draws$dpars$mu[, i] : incorrect number of dimensions
loo_sub_point <- loo_subsample(ordb_model,observations=9)
##Error in draws$dpars$mu[, i] : incorrect number of dimensions

#indeed the draws do not have this shape
sub_draws <- prepare_predictions(ordb_model, point_estimate = 'median')
summary(sub_draws$dpars$mu)
Foztarz commented 1 year ago

Wow, that was fast, thanks! I'll update my version to the latest branch and try it out.

saudiwin commented 1 year ago

Well, it was good timing as I am working on a new release. Happy to help!

Foztarz commented 1 year ago

I can confirm that this now allows loo(..., pointwise = TRUE) for me, which is really great for model comparison. As might be expected, a few other things throw errors after the changes to posterior_epred_ord_beta_reg and log_lik_ord_beta_reg. Shall I report these here as I come across them or would it be better to open a new thread?

saudiwin commented 1 year ago

Yes, I'm doing additional bug fixes right now and will submit a new release in a couple days, so probably wait for that for something a bit more stable.

On Thu, Feb 9, 2023 at 7:39 PM Foztarz @.***> wrote:

I can confirm that this now allows loo(..., pointwise = TRUE) for me, which is really great for model comparison. As might be expected, a few other things throw errors after the changes to posterior_epred_ord_beta_reg and log_lik_ord_beta_reg. Shall I report these here as I come across them or would it be better to open a new thread?

— Reply to this email directly, view it on GitHub https://github.com/saudiwin/ordbetareg_pack/issues/11#issuecomment-1424395167, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACTVVOOJY66MKZ7PJG4RWZDWWUFTLANCNFSM6AAAAAAUPJEGA4 . You are receiving this because you modified the open/close state.Message ID: @.***>

Foztarz commented 1 year ago

OK great

saudiwin commented 1 year ago

FYI, new version up on CRAN/github.

On Fri, Feb 10, 2023 at 12:02 AM Foztarz @.***> wrote:

OK great

— Reply to this email directly, view it on GitHub https://github.com/saudiwin/ordbetareg_pack/issues/11#issuecomment-1424739398, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACTVVOOYETS6NQLZXUJ3P53WWVEOBANCNFSM6AAAAAAUPJEGA4 . You are receiving this because you modified the open/close state.Message ID: @.***>

Foztarz commented 1 year ago

Brilliant, thanks. I can confirm that, using the CRAN version, I now get the expected behaviour with loo(..., pointwise = TRUE), log_lik(), and posterior_epred() for models with effects on phi.

saudiwin commented 1 year ago

Sounds good!

On Wed, Feb 15, 2023 at 3:42 PM Foztarz @.***> wrote:

Brilliant, thanks. I can confirm that, using the CRAN version, I now get the expected behaviour with loo(..., pointwise = TRUE), log_lik(), and posterior_epred() for models with effects on phi.

— Reply to this email directly, view it on GitHub https://github.com/saudiwin/ordbetareg_pack/issues/11#issuecomment-1431236172, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACTVVOPIBDTV7T75YMDPG6LWXS6KJANCNFSM6AAAAAAUPJEGA4 . You are receiving this because you modified the open/close state.Message ID: @.***>