Closed jfsalzmann closed 2 months ago
Reserved variables are not really used by {tidybayes}: {posterior} came later, and its notion of reserved variables is a bit different from what {tidybayes} is doing with .chain / .iteration / .draw. {tidybayes} is just using those variables for the purposes of indexing.
If the draws are already uniquely identified by the combination of .chain, .iteration, and .draw, you don't need .warmup to be treated as an index variable in {tidybayes}; you can just treat it as any other variable by including it in the spread_draws / gather_draws spec. For example:
library(posterior)
library(tidybayes)
example_draws() |>
as_draws_df() |>
dplyr::mutate(.warmup = rep(c(TRUE, FALSE), each = 200)) |>
spread_draws(theta[i], .warmup)
#> # A tibble: 3,200 × 6
#> # Groups: i [8]
#> i theta .chain .iteration .draw .warmup
#> <int> <dbl> <int> <int> <int> <lgl>
#> 1 1 3.96 1 1 1 TRUE
#> 2 1 0.124 1 2 2 TRUE
#> 3 1 21.3 1 3 3 TRUE
#> 4 1 14.7 1 4 4 TRUE
#> 5 1 5.96 1 5 5 TRUE
#> 6 1 5.76 1 6 6 TRUE
#> 7 1 4.03 1 7 7 TRUE
#> 8 1 -0.278 1 8 8 TRUE
#> 9 1 1.81 1 9 9 TRUE
#> 10 1 6.08 1 10 10 TRUE
#> # ℹ 3,190 more rows
Created on 2024-04-22 with reprex v2.1.0
Now, if you do need .warmup to uniquely identify draws, that's another story. Probably these functions should have a draw_indices argument like unspread_draws and ungather_draws do.
Thanks @mjskay. Indeed, in my case .warmup is not required to uniquely identify draws. However, I mostly use gather_draws, and there, when .warmup is included just like any other variable, it ends up in the .variable column, which I believe is exactly the problem.
posterior %>% gather_draws(mu_y,.warmup) %>% arrange(.variable)
# A tibble: 12,000 × 5
# Groups: .variable [2]
.chain .iteration .draw .variable .value
<int> <int> <int> <chr> <dbl>
1 1 1 1 .warmup 1
2 1 2 2 .warmup 1
3 1 3 3 .warmup 1
4 1 4 4 .warmup 1
5 1 5 5 .warmup 1
6 1 6 6 .warmup 1
7 1 7 7 .warmup 1
8 1 8 8 .warmup 1
9 1 9 9 .warmup 1
10 1 10 10 .warmup 1
# ℹ 11,990 more rows
# ℹ Use `print(n = ...)` to see more rows
I think a draw_indices argument for gather_draws and spread_draws, possibly, would help a lot.
The GitHub version now has a draw_indices parameter for gather_draws and spread_draws; let me know if it doesn't do what you need.
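For anyone landing here later, a minimal sketch of how the new argument might be used. It reuses the toy data from the example above and assumes that draw_indices takes a character vector of column names, as it does in unspread_draws / ungather_draws; the .warmup split is arbitrary and only for illustration.

```r
library(posterior)
library(tidybayes)

# Toy draws with a hypothetical .warmup flag: example_draws() has 400 draws
# (4 chains x 100 iterations); the first 200 rows are marked as warmup
# purely for illustration.
draws <- example_draws() |>
  as_draws_df() |>
  dplyr::mutate(.warmup = rep(c(TRUE, FALSE), each = 200))

# Assumption: draw_indices is a character vector naming the index columns,
# so .warmup is carried along as an index instead of being gathered.
long <- draws |>
  gather_draws(mu, tau,
               draw_indices = c(".chain", ".iteration", ".draw", ".warmup"))

# Inspect the result: .warmup should be its own column,
# and .variable should only contain the requested variables.
names(long)
unique(long$.variable)
```

This needs a {tidybayes} version that includes the draw_indices argument (the GitHub version at the time of this comment).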
Amazing, thank you very much! I tested it and it seems to work fine. Initially I had a feeling that gather_draws and spread_draws were a bit slower now, but I also changed my code along the way, so I assume that's on my side.
Yeah, I wouldn't expect this to affect the speed of those functions in any appreciable way. Glad it helps!
I noticed that I cannot modify reserved variables.
I'm looking for a way to pass information on whether a draw is from the warmup or sampling phase on to the gather_draws / spread_draws output. I failed. I tried to base this on @mjskay's suggestion in #236 to use posterior's draws formats.
Here is how far I have come (experimental approach, just so I can understand what limits this on the way):
and in my custom draws extraction pipe, in the end I do
so that I get a draws_df object that correctly "understands" (or just prints?) .warmup to be the respective indicator:
Both gather_draws and spread_draws remove the column, and having looked at the source code, I see no straightforward workaround, as the reserved names seem to be hardcoded in many different places. This is even inconsistent: gather_variables, for instance, also reserves .rows by default, which initially made me hope I could just use that name, but apparently .rows is not reserved in other places. Interestingly, though, when using .rows, gather_variables returns the expected result.
Expected behavior: .warmup as another column.
Can somebody think of a workaround that would still allow me to use the tidy select interface of gather_draws / spread_draws?
As a side note, I also noticed I get an error when coding .warmup as boolean and passing this to gather_draws/spread_draws.