Unnecessary "$" in function "spread_draws_long_"?

hcp4715 commented 3 years ago

Hi, there,

Thanks for this great tool!

I recently tried to use gather_draws to pull out all traces whose variable names start with b_, so I tried the regex = TRUE option and found that it doesn't work. I looked at the source here: https://github.com/mjskay/tidybayes/blob/master/R/spread_draws.R#L324 and found that the regex might be wrong in "spread_draws_long_": the $, which marks the end of a string, seems unnecessary, at least in my case. I tested the two lines of code (lines 324 and 325) and found that without the $ I can select the right variable names, but with the "$" grepl returns all FALSE, which causes an error when using gather_draws.

> all_variables_names <- c("b_intercept", "b_A1B1", "b_A2B1", "b_A2B2")
> variable_names <- 'b_*'
> 
> # as in current tidybayes:
> variable_regex = paste0("^(", paste(variable_names, collapse = "|"), ")$")
> variable_names_index = grepl(variable_regex, all_variables_names) # nothing is selected
> print(variable_names_index)
[1] FALSE FALSE FALSE FALSE
> 
> # update for the regex
> variable_regex2 = paste0("^(", paste(variable_names, collapse = "|"), ")")
> variable_names_index2 = grepl(variable_regex2, all_variables_names)
> print(variable_names_index2)
[1] TRUE TRUE TRUE TRUE

mjskay commented 3 years ago

Hmmm... the $ is there to prevent partial matches, which would be kind of weird to allow in the interface (I don't think people would want spread_draws(b) to match all parameters containing b, for example).
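To make the partial-match concern concrete, here is a minimal base R sketch. The parameter names are made up for illustration, and the patterns only imitate the `^(...)$` wrapping from the two lines quoted above; this is not the actual tidybayes code path.

```r
# Hypothetical parameter names, for illustration only
all_names <- c("b", "b_Intercept", "sigma_b", "lp__")

# Anchored at both ends (as in the quoted lines): only the exact name "b" matches
anchored <- grepl("^(b)$", all_names)
print(anchored)

# Anchored at the start only (the proposed change): every name starting with "b" matches
prefix_only <- grepl("^(b)", all_names)
print(prefix_only)

# No anchors at all: every name containing "b" matches
unanchored <- grepl("b", all_names)
print(unanchored)
```

With both anchors in place, a pattern has to describe the whole variable name, so prefix matching must be requested explicitly in the pattern itself.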

However, I think you might have an error in the regular expression you are passing, as on a similar example I am able to match all parameters starting with b_ (which is the regular expression "b_.*", not "b_*"):

library(tidybayes)
library(brms)

m = brm(mpg ~ hp * cyl, data = mtcars)
m %>% spread_draws(`b_.*`, regex = TRUE)

# A tibble: 4,000 x 7
   .chain .iteration .draw b_Intercept    b_hp b_cyl `b_hp:cyl`
    <int>      <int> <int>       <dbl>   <dbl> <dbl>      <dbl>
 1      1          1     1        63.1 -0.295  -5.59   0.0354  
 2      1          2     2        61.4 -0.262  -6.02   0.0339  
 3      1          3     3        65.3 -0.250  -7.00   0.0340  
 4      1          4     4        61.8 -0.238  -6.29   0.0321  
 5      1          5     5        67.4 -0.270  -6.76   0.0337  
 6      1          6     6        63.9 -0.259  -6.72   0.0352  
 7      1          7     7        52.3 -0.237  -3.90   0.0261  
 8      1          8     8        51.6 -0.245  -3.46   0.0254  
 9      1          9     9        38.4 -0.0112 -2.51  -0.000680
10      1         10    10        54.4 -0.227  -4.58   0.0271  
# ... with 3,990 more rows
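The difference between "b_*" and "b_.*" can also be checked in base R without fitting a model. This is only a sketch: `anchor()` is a hypothetical helper that mimics the `paste0("^(", ..., ")$")` wrapping from the lines quoted in the first comment, and the variable names are the ones from that comment.

```r
all_variables_names <- c("b_intercept", "b_A1B1", "b_A2B1", "b_A2B2")

# Hypothetical helper: wraps pattern(s) the same way the quoted lines do
anchor <- function(pattern) {
  paste0("^(", paste(pattern, collapse = "|"), ")$")
}

# "b_*" means "b" followed by zero or more underscores,
# so when anchored it matches only strings like "b", "b_", "b__"
star_on_toy_names <- grepl(anchor("b_*"), c("b", "b_", "b__"))
star_on_real_names <- grepl(anchor("b_*"), all_variables_names)

# "b_.*" means "b_" followed by any characters, so it matches whole names
dot_star_on_real_names <- grepl(anchor("b_.*"), all_variables_names)

# Without the trailing "$", "b_*" appears to work only because every
# name happens to start with "b"
star_without_dollar <- grepl("^(b_*)", all_variables_names)

print(star_on_toy_names)
print(star_on_real_names)
print(dot_star_on_real_names)
print(star_without_dollar)
```

This is why dropping the `$` seemed to fix the original example: the unanchored pattern matched the leading "b" of each name, not the full name.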

The |-based syntax also works:

m %>% spread_draws(`(b_hp|b_cyl|b_hp:cyl)`, regex = TRUE)

# A tibble: 4,000 x 7
   .chain .iteration .draw b_Intercept    b_hp b_cyl `b_hp:cyl`
    <int>      <int> <int>       <dbl>   <dbl> <dbl>      <dbl>
 1      1          1     1        63.1 -0.295  -5.59   0.0354  
 2      1          2     2        61.4 -0.262  -6.02   0.0339  
 3      1          3     3        65.3 -0.250  -7.00   0.0340  
 4      1          4     4        61.8 -0.238  -6.29   0.0321  
 5      1          5     5        67.4 -0.270  -6.76   0.0337  
 6      1          6     6        63.9 -0.259  -6.72   0.0352  
 7      1          7     7        52.3 -0.237  -3.90   0.0261  
 8      1          8     8        51.6 -0.245  -3.46   0.0254  
 9      1          9     9        38.4 -0.0112 -2.51  -0.000680
10      1         10    10        54.4 -0.227  -4.58   0.0271  
# ... with 3,990 more rows

You could also use c() for this without the regex:

m %>% spread_draws(c(b_hp, b_cyl, `b_hp:cyl`))

# A tibble: 4,000 x 7
   .chain .iteration .draw b_Intercept    b_hp b_cyl `b_hp:cyl`
    <int>      <int> <int>       <dbl>   <dbl> <dbl>      <dbl>
 1      1          1     1        63.1 -0.295  -5.59   0.0354  
 2      1          2     2        61.4 -0.262  -6.02   0.0339  
 3      1          3     3        65.3 -0.250  -7.00   0.0340  
 4      1          4     4        61.8 -0.238  -6.29   0.0321  
 5      1          5     5        67.4 -0.270  -6.76   0.0337  
 6      1          6     6        63.9 -0.259  -6.72   0.0352  
 7      1          7     7        52.3 -0.237  -3.90   0.0261  
 8      1          8     8        51.6 -0.245  -3.46   0.0254  
 9      1          9     9        38.4 -0.0112 -2.51  -0.000680
10      1         10    10        54.4 -0.227  -4.58   0.0271  
# ... with 3,990 more rows

Let me know if that helps!

hcp4715 commented 3 years ago

Hi, @mjskay

It helps!!

However, I think you might have an error in the regular expression you are passing, as on a similar example I am able to match all parameters starting with b_ (which is the regular expression "b_.*", not "b_*")

Thanks a lot for pointing this out! It works now!

You could also use c() for this without the regex

Yes, I did try that, but my experimental design has multiple independent variables, which makes this approach a bit slow.

mjskay commented 3 years ago

great, glad to help! :)

mjskay / tidybayes

Unnecessary "$" in function "spread_draws_long_"? #282