kenkellner / ubms

Fit models to data from unmarked animals using Stan. Uses a similar interface to the R package 'unmarked', while providing the advantages of Bayesian inference and allowing estimation of random effects.
https://kenkellner.github.io/ubms/
GNU General Public License v3.0
35 stars 8 forks source link

bayes R2 for occupancy models #58

Open JustinCally opened 2 years ago

JustinCally commented 2 years ago

Hi Ken,

I'm not sure whether the use of a bayes R2 would be broadly suitable for occupancy models given the limitations with its application in logistic regression and often low values. However I had a crack at getting a bayes R2 for the state, detection submodels (and their combined value) using the residual method (https://avehtari.github.io/bayes_R2/bayes_R2.html). A bit of a messy function and example below.

# R2 example for logistic occupancy model
    library(ubms)
#> Loading required package: unmarked
#> Loading required package: lattice
#> 
#> Attaching package: 'ubms'
#> The following objects are masked from 'package:unmarked':
#> 
#>     fitList, projected
#> The following object is masked from 'package:base':
#> 
#>     gamma

    # R2 function
    bayes_R2_res_ubms <- function(fit, re.form = NULL, draws = draws) {

        # Get the observed occupancy at each site for each observation period
        y <- ubms::getY(fit)

        # Get the observed occupancy at each site
        y_mod <- matrix(apply(y, 1, FUN = function(x) max(x, na.rm = TRUE), simplify = TRUE), ncol = 1)

        # Get the linear predictor for the 'state'
        ypred <- ubms::posterior_linpred(fit, transform = TRUE, submodel = "state", re.form = re.form, draws = draws)
        yp <- ubms::posterior_linpred(fit, transform = TRUE, submodel = "det", re.form = re.form, draws = draws)

        # Assume perfect detection (no effect of detection submodel)
        yp1 <- yp
        yp1[!is.na(yp1)] <- 1

        # For each draw obtain the probability for each site that a detection will occur (0-1).
        # Probably a less messy way to do this
        ys <- list("state" = yp,
                   "detection" = yp1)

        ypred_mod <- list()
        r2 <- list()

        for(j in 1:length(ys)) {

            # detection * state
            ypred_prob <- ys[[j]]

            # draws x sites x obs
            ypred_cond <- array(data = NA, dim = c(draws, dim(ypred)[2], fit@response@max_obs))
            ypred_mod[[j]] <- ypred

            # loop over draws
            for(i in 1:draws) {
                # repeat the latent state by observation periods
                ypred_vec <- rep(ypred[i,], each = fit@response@max_obs)
                # det * state
                ypred_prob[i,] <- ypred_prob[i,]*ypred_vec
                # force into matrix with nsites X nobs
                ypred_cond[i,,] <- matrix(ypred_prob[i,], ncol = fit@response@max_obs, byrow = TRUE)
                # 1 minus the probability that all observations are zero = at least 1 detection
                ypred_mod[[j]][i,] <- 1-apply(1-ypred_cond[i,,], 1, function(x) {
                    prod(x, na.rm = TRUE)
                })

            }

            if (fit@response@z_dist == "binomial" && NCOL(y_mod) == 2) {
                trials <- rowSums(y_mod)
                y_mod <- y_mod[, 1]
                ypred_mod[[j]] <- ypred_mod[[j]] %*% diag(trials)
            }
            e <- -1 * sweep(ypred_mod[[j]], 2, y_mod)
            var_ypred <- apply(ypred_mod[[j]], 1, var)
            var_e <- apply(e, 1, var)
            r2[[j]] <- var_ypred / (var_ypred + var_e)
        }
        r2[[3]] <- r2[[1]] - r2[[2]]
        names(r2) <- c("both", "state", "det")
        return(r2)
    }

    # Data simulation

    set.seed(123)
    dat_occ <- data.frame(x1=rnorm(500))
    dat_p <- data.frame(x2=rnorm(500*5))
    y <- matrix(NA, 500, 5)
    z <- rep(NA, 500)
    b <- c(0.4, -0.5, 0.3, 0.5)
    re_fac <- factor(sample(letters[1:26], 500, replace=T))
    dat_occ$group <- re_fac
    re <- rnorm(26, 0, 1.2)
    re_idx <- as.numeric(re_fac)
    idx <- 1
    for (i in 1:500){
        z[i] <- rbinom(1,1, plogis(b[1] + b[2]*dat_occ$x1[i] + re[re_idx[i]]))
        for (j in 1:5){
            y[i,j] <- z[i]*rbinom(1,1,
                                  plogis(b[3] + b[4]*dat_p$x2[idx]))
            idx <- idx + 1
        }
    }

    # unmarked frame
    umf <- unmarkedFrameOccu(y=y, siteCovs=dat_occ, obsCovs=dat_p)

    # model
    options(mc.cores=3) #number of parallel cores to use
    fm <- stan_occu(~x2 ~x1 + (1|group), umf, chains=3)

    bayes_R2_res_ubms(fm, draws = 100)
#> $both
#>   [1] 0.2436708 0.1813600 0.1861829 0.1968885 0.2093023 0.2065249 0.2262449
#>   [8] 0.1605566 0.2356042 0.2089506 0.2313770 0.2569806 0.2476316 0.2080628
#>  [15] 0.1884270 0.2458283 0.2353730 0.2410739 0.2087004 0.2352926 0.2182499
#>  [22] 0.2195304 0.2197247 0.2299345 0.1642092 0.2512100 0.2537433 0.1866575
#>  [29] 0.2420921 0.1616609 0.1850477 0.2040931 0.2660747 0.1736265 0.2113426
#>  [36] 0.2345318 0.2077254 0.1940378 0.2166815 0.2828791 0.2590453 0.1380645
#>  [43] 0.2494774 0.2094812 0.1596296 0.2125061 0.2406926 0.2306142 0.2238410
#>  [50] 0.1424783 0.2422054 0.2399446 0.2433104 0.2083924 0.2163885 0.2143759
#>  [57] 0.2868863 0.2545129 0.2662583 0.2061908 0.2272771 0.1880253 0.2036795
#>  [64] 0.2259326 0.2396712 0.1560981 0.2029403 0.2575136 0.2447248 0.2600542
#>  [71] 0.1503401 0.1612022 0.2184497 0.2333725 0.2235703 0.2745752 0.1902804
#>  [78] 0.2021373 0.1704586 0.2658361 0.2470988 0.2298406 0.2281244 0.2116103
#>  [85] 0.2407505 0.2564699 0.2642025 0.2530184 0.2964650 0.1884810 0.1804507
#>  [92] 0.2371928 0.2304418 0.2121007 0.2515888 0.2200955 0.1956262 0.2275631
#>  [99] 0.2210517 0.2139585
#> 
#> $state
#>   [1] 0.17605019 0.10263418 0.10580724 0.13545364 0.14482887 0.13800919
#>   [7] 0.17355460 0.08628623 0.17066872 0.14583832 0.16249959 0.20375155
#>  [13] 0.18365678 0.13735974 0.11535767 0.17160635 0.16365889 0.18999839
#>  [19] 0.16342425 0.17257228 0.16156030 0.15115874 0.15180402 0.17367579
#>  [25] 0.09177282 0.18924085 0.20799994 0.13745090 0.18211136 0.09588723
#>  [31] 0.11718993 0.14892238 0.22422619 0.13182738 0.15381415 0.17530606
#>  [37] 0.14920495 0.12388629 0.15416240 0.24336614 0.20041741 0.08053604
#>  [43] 0.20827089 0.14264055 0.09731327 0.14357953 0.18287715 0.16933439
#>  [49] 0.15418602 0.07828137 0.17981013 0.17819720 0.19682582 0.15552595
#>  [55] 0.15226781 0.15547727 0.24004102 0.20505126 0.22142930 0.14618668
#>  [61] 0.15626259 0.12546194 0.14349652 0.16704033 0.18849389 0.08944449
#>  [67] 0.12046083 0.22172548 0.19268544 0.20941069 0.07355600 0.11754425
#>  [73] 0.15150645 0.17893636 0.16290501 0.22485436 0.12711429 0.15046677
#>  [79] 0.11693493 0.22099586 0.18770485 0.17481347 0.16442692 0.14398650
#>  [85] 0.17886823 0.20548550 0.20694618 0.20829883 0.24947301 0.11531789
#>  [91] 0.10101353 0.18807278 0.18736563 0.14776186 0.19111258 0.16002263
#>  [97] 0.11816607 0.17083967 0.16663188 0.15840369
#> 
#> $det
#>   [1] 0.06762062 0.07872586 0.08037562 0.06143486 0.06447340 0.06851572
#>   [7] 0.05269027 0.07427036 0.06493549 0.06311229 0.06887738 0.05322905
#>  [13] 0.06397480 0.07070302 0.07306933 0.07422198 0.07171414 0.05107554
#>  [19] 0.04527614 0.06272035 0.05668959 0.06837161 0.06792069 0.05625875
#>  [25] 0.07243638 0.06196911 0.04574332 0.04920663 0.05998069 0.06577362
#>  [31] 0.06785781 0.05517070 0.04184849 0.04179915 0.05752846 0.05922575
#>  [37] 0.05852047 0.07015146 0.06251909 0.03951300 0.05862791 0.05752848
#>  [43] 0.04120654 0.06684066 0.06231634 0.06892656 0.05781542 0.06127982
#>  [49] 0.06965496 0.06419698 0.06239527 0.06174742 0.04648460 0.05286642
#>  [55] 0.06412072 0.05889858 0.04684531 0.04946166 0.04482896 0.06000413
#>  [61] 0.07101450 0.06256336 0.06018296 0.05889223 0.05117732 0.06665366
#>  [67] 0.08247942 0.03578808 0.05203937 0.05064353 0.07678410 0.04365799
#>  [73] 0.06694323 0.05443615 0.06066532 0.04972085 0.06316610 0.05167058
#>  [79] 0.05352365 0.04484027 0.05939398 0.05502711 0.06369745 0.06762376
#>  [85] 0.06188222 0.05098439 0.05725632 0.04471958 0.04699198 0.07316309
#>  [91] 0.07943715 0.04912006 0.04307613 0.06433879 0.06047627 0.06007289
#>  [97] 0.07746012 0.05672347 0.05441978 0.05555479

Created on 2021-11-11 by the reprex package (v2.0.1)

kenkellner commented 2 years ago

Hi Justin,

Thanks, this is really cool! I have to admit I don't know that much about calculating a Bayes R2 for this type of model so I will have to look into it (thanks for including the link). It does seem like something that would be useful to add. Are you aware of any papers that calculate a Bayes r2 for occupancy models?

Ken

JustinCally commented 2 years ago

Hi Ken,

I'm unaware of any paper's calculating Bayes R2 for occupancy models. Hence I'm a bit unsure whether the function written above is the preferred approach. In comparison, the method used in unmarked::modSel() that calculates R2 based on the likelihoods of the null model and fitted model (Nagelkerke 1991). I wonder whether it is possible to adapt this method in a Bayes context...

I suppose one good thing about the function above is that it provides the bayes R2 for both detection and state submodels, which may be beneficial in understanding the value of using an occupancy model for a given situation.