stan-dev / posterior

The posterior R package
https://mc-stan.org/posterior/
Other
167 stars 24 forks source link

Don't drop weights when subsetting variables? #298

Open fweber144 opened 1 year ago

fweber144 commented 1 year ago

I noticed that subsetting the variables of a draws object will drop weights:

# Based on the "Examples" section in `?posterior::weight_draws`:
suppressPackageStartupMessages(library(posterior))
x <- as_draws_matrix(example_draws())
wts <- rexp(ndraws(x))
( x <- weight_draws(x, weights = wts) )
#> Loading required namespace: testthat
#> # A draws_matrix: 100 iterations, 4 chains, and 10 variables
#>     variable
#> draw   mu tau theta[1] theta[2] theta[3] theta[4] theta[5] theta[6]
#>   1  2.01 2.8     3.96    0.271    -0.74      2.1    0.923      1.7
#>   2  1.46 7.0     0.12   -0.069     0.95      7.3   -0.062     11.3
#>   3  5.81 9.7    21.25   14.931     1.83      1.4    0.531      7.2
#>   4  6.85 4.8    14.70    8.586     2.67      4.4    4.758      8.1
#>   5  1.81 2.8     5.96    1.156     3.11      2.0    0.769      4.7
#>   6  3.84 4.1     5.76    9.909    -1.00      5.3    5.889     -1.7
#>   7  5.47 4.0     4.03    4.151    10.15      6.6    3.741     -2.2
#>   8  1.20 1.5    -0.28    1.846     0.47      4.3    1.467      3.3
#>   9  0.15 3.9     1.81    0.661     0.86      4.5   -1.025      1.1
#>   10 7.17 1.8     6.08    8.102     7.68      5.6    7.106      8.5
#> # ... with 390 more draws, and 2 more variables
#> # ... hidden reserved variables {'.log_weight'}

( x_sub <- x[, 1:2] )
#> # A draws_matrix: 100 iterations, 4 chains, and 2 variables
#>     variable
#> draw   mu tau
#>   1  2.01 2.8
#>   2  1.46 7.0
#>   3  5.81 9.7
#>   4  6.85 4.8
#>   5  1.81 2.8
#>   6  3.84 4.1
#>   7  5.47 4.0
#>   8  1.20 1.5
#>   9  0.15 3.9
#>   10 7.17 1.8
#> # ... with 390 more draws

Created on 2023-08-15 with reprex v2.0.2

As you can see, x_sub lacks the hidden reserved variable .log_weight.

I'm not sure if this is intended behavior or not, but I think subsetting variables (variables only, not the draws or chains) should keep weights.

fweber144 commented 1 year ago

I forgot to mention that other hidden reserved variables (other than .log_weight) might be affected as well, so perhaps this needs to be resolved by a more general approach which ensures that subsetting columns keeps any hidden reserved variables.

paul-buerkner commented 1 year ago

I think, we need to make an explicit design decision on whether we want to retain hidden variables or not when subsetting via []. I can imagine it can become very annoying for users if they try to get rid of all variables except for a view via [] but the hidden variables refuse to leave. @mjskay and @jgabry what do you think?

That said, you can always use subset_draws(., variable = ...) to subset while retaining hidden variables.

mjskay commented 1 year ago

Hmmm, that's a tough one. It does seem like the default behavior of [ should be to keep reserved variables (or at least .log_weight --- which is currently equivalent since it's the only reserved variable for that format?) to avoid silent errors caused by unexpected dropping of weights. Maybe [ should gain an argument indicating whether reserved variables are subject to the subset (defaulting to not dropping them)? (Perhaps subset() should too?)

paul-buerkner commented 1 year ago

I agree. So both [] and subset(_draws) should keep reserved variables by default but both gain an argument to turn this off? Sounds like a good approach to me.

jgabry commented 1 year ago

I agree. So both [] and subset(_draws) should keep reserved variables by default but both gain an argument to turn this off? Sounds like a good approach to me.

Sounds good to me too.

paul-buerkner commented 1 year ago

One challenge towards this will be that we cannot just use NextMethod for [] in some cases. For example, in the corresponding .draws_matrix method I see:

`[.draws_matrix` <- function(x, i, j, ..., drop = FALSE) {
  # TODO: allow for argument 'reserved' as in '[.draws_df'
  #   right now this fails because NextMethod() cannot ignore arguments
  out <- NextMethod("[", drop = drop)
  if (length(dim(out)) == length(dim(x))) {
    class(out) <- class(x)
    .nchains <- nchains(x)
    if (missing(i)) {
      attr(out, "nchains") <- .nchains
    } else if (.nchains > 1L) {
      warn_merge_chains("index")
    }
  }
  out
}

Just leaving this here for furture reference because I will surely forget otherwise.