compose_data throws an error for nested data

mjskay / tidybayes

Bayesian analysis + tidy data + geoms (R package)

GNU General Public License v3.0


library(tibble)

# create fake data
n_trials <- 100
n_dots <- 50
trial_mean <- c()
dots <- list()
resp <- c()
for (i in 1:n_trials) {
  # the ground truth mean position for the dots on each trial
  trial_mean[i] <- runif(1, -5, 5)
  # simulate a list of n_dots dot positions on each trial
  dots[[i]] <- trial_mean[i] + rnorm(n_dots, 0, 2)
  # simulate noisy mean judgment on each trial
  resp[i] <- mean(dots[[i]]) + rnorm(1, 0, 0.1)
}

# turn into tibble
df <- tibble(trial_mean, dots, resp)

This should be fixed on the dev branch now (devtools::install_github("mjskay/tidybayes", ref = "dev")). Here are some minimal examples that should work with x as defined in the model you suggested:

As a list column

This example uses the format you suggested:

df = tibble(
  x = list(1:5, 2:6)
)

df

# A tibble: 2 x 1
  x        
  <list>   
1 <int [5]>
2 <int [5]>

Which now does what is expected with compose_data:

df %>%
  compose_data()

$x
$x[[1]]
[1] 1 2 3 4 5

$x[[2]]
[1] 2 3 4 5 6

$n
[1] 2

Note that the number of columns is not provided automatically, because there is no sensible rule for automatically choosing a name for that variable. You can provide it easily, though: additional arguments passed to compose_data can refer to previously defined variables, including those generated automatically by compose_data itself. Thus you can define m as the number of elements in the first row of x (this works here because every row of x has the same number of elements):

df %>%
  compose_data(m = length(x[[1]]))

$x
$x[[1]]
[1] 1 2 3 4 5

$x[[2]]
[1] 2 3 4 5 6

$n
[1] 2

$m
[1] 5
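Using length(x[[1]]) only gives the right m when every element of the list column has the same length. A minimal base R sketch of a check you could run first, using a plain list as a stand-in for df$x (the names x, row_lengths and m here are just for illustration):

```r
# Stand-in for the list column df$x from the example above
x <- list(1:5, 2:6)

# Length of each list element, i.e. the number of entries per row
row_lengths <- lengths(x)

# Stop early if the rows are ragged, since a single m would then be misleading
stopifnot(length(unique(row_lengths)) == 1)

# Safe to use as the column count
m <- row_lengths[[1]]
m
```

If the check fails, the rows have different lengths and a single m will not describe the data.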

As a matrix column

The other option would be to define x as a matrix column. Matrix columns have the restriction that they must have the same number of rows as the data frame, and could be thought of as defining "sub-columns" within the larger data frame. The analog to the above example would be this:

df = tibble(
  x = t(matrix(c(1:5, 2:6), ncol = 2))
)

df

# A tibble: 2 x 1
  x[,1]  [,2]  [,3]  [,4]  [,5]
  <int> <int> <int> <int> <int>
1     1     2     3     4     5
2     2     3     4     5     6

Again, compose_data does what is expected but does not auto-generate the column count m:

df %>%
  compose_data()

$x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    2    3    4    5    6

$n
[1] 2

But we can define it directly in terms of x:

df %>%
  compose_data(m = ncol(x))

$x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    2    3    4    5    6

$n
[1] 2

$m
[1] 5
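If your data already sits in a list column, as in the first example, one way to get to the matrix form is to bind the list elements together row-wise. This is a base R sketch assuming all elements have the same length; x_list and x_mat are illustrative names:

```r
# Stand-in for a list column with equal-length elements
x_list <- list(1:5, 2:6)

# Stack the elements as rows: one row per trial, one column per element
x_mat <- do.call(rbind, x_list)

dim(x_mat)

# x_mat could then be stored as a matrix column, e.g. tibble(x = x_mat)
```

This should produce the same 2 x 5 matrix as the t(matrix(...)) construction above.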

Let me know if that does what you need or if there's anything else that might help with this use case.

compose_data throws an error for nested data #159