kalealex closed 5 years ago
This should be fixed on the dev branch now (devtools::install_github("mjskay/tidybayes", ref = "dev")). Here are some minimal examples that should work with x as defined in the model you suggested:
As a list column
This example uses the format you suggested:
df = tibble(
  x = list(1:5, 2:6)
)
df
# A tibble: 2 x 1
  x
  <list>
1 <int [5]>
2 <int [5]>
compose_data now does what is expected with this input:
df %>%
  compose_data()
$x
$x[[1]]
[1] 1 2 3 4 5
$x[[2]]
[1] 2 3 4 5 6
$n
[1] 2
Note that the number of columns is not provided automatically, because there's no sensible rule for automatically choosing a name for it. But you can provide it easily by relying on the fact that additional variables passed to compose_data can refer to previously defined variables, including those automatically generated by compose_data itself. Thus you can simply define m in terms of the number of elements in the first row of x (since the array in this case has the same number of elements in each row):
df %>%
  compose_data(m = length(x[[1]]))
$x
$x[[1]]
[1] 1 2 3 4 5
$x[[2]]
[1] 2 3 4 5 6
$n
[1] 2
$m
[1] 5
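Defining m from the first row only makes sense when every row of x has the same length. As a minimal sketch (base R plus tibble, not something compose_data does for you), you could check that assumption up front before relying on length(x[[1]]). The names row_lengths and m here are just for illustration:

```r
library(tibble)

df <- tibble(
  x = list(1:5, 2:6)
)

# lengths() returns the length of each element of the list column
row_lengths <- lengths(df$x)

# Fail early if the rows are ragged, since a single m would be meaningless then
stopifnot(length(unique(row_lengths)) == 1)

# Safe to take the length of the first row as the column count
m <- row_lengths[[1]]
m
```

If the check fails, the data is ragged and would need padding or a different Stan data structure before a single m could describe it.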
As a matrix column
The other option would be to define x as a matrix column. Matrix columns have the restriction that they must have the same number of rows as the data frame, and could be thought of as defining "sub-columns" within the larger data frame. The analog to the above example would be this:
df = tibble(
  x = t(matrix(c(1:5, 2:6), ncol = 2))
)
df
# A tibble: 2 x 1
  x[,1]  [,2]  [,3]  [,4]  [,5]
  <int> <int> <int> <int> <int>
1     1     2     3     4     5
2     2     3     4     5     6
Again, compose_data does what is expected but cannot auto-generate the column count m:
df %>%
  compose_data()
$x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    2    3    4    5    6
$n
[1] 2
But we can define it directly in terms of x:
df %>%
  compose_data(m = ncol(x))
$x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    2    3    4    5    6
$n
[1] 2
$m
[1] 5
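If your data starts out as a list column and you would rather use the matrix-column form, one possible conversion is sketched below. It assumes every list element has the same length, and the names df_list, x_mat, and df_matrix are just placeholders for illustration:

```r
library(tibble)

df_list <- tibble(
  x = list(1:5, 2:6)
)

# Stack the list elements as rows of a matrix.
# This assumes all elements have the same length.
x_mat <- do.call(rbind, df_list$x)

# Store the matrix as a single matrix column named x
df_matrix <- tibble(x = x_mat)

dim(df_matrix$x)
```

The resulting df_matrix has the same shape as the matrix-column example above, so compose_data(m = ncol(x)) should apply to it in the same way.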
Let me know if that does what you need or if there's anything else that might help with this use case.
When trying to prepare a dataframe with a nested column, the tidybayes function compose_data throws an error. I'm guessing this means that compose_data is not set up to handle nested data which would be read into multidimensional data types in Stan. For example, lists of length m nested inside each of n rows of a dataframe might feed into any of the following data structures.
Here's an illustrative example with some fake data.
Imagine an experiment where an observer is shown a set of dots arranged on a number line and asked to eyeball their mean. In this simple experiment, our independent variable is the position of each dot on each trial (i.e., two dimensions: trials × dots), and our response variable is the participant's response on each trial (i.e., one dimension: trials). We can simulate responses as the true means with random noise added.
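A minimal sketch of how such fake data could be simulated is below. The variable names, the number of trials and dots, the position range, and the noise level are all arbitrary choices for illustration, not values from a real experiment:

```r
library(tibble)

set.seed(1)  # arbitrary seed, for reproducibility only

n_trials <- 4  # hypothetical number of trials
n_dots <- 5    # hypothetical number of dots per trial

# One vector of dot positions per trial, kept as a list so it can be a list column
dots <- replicate(n_trials, runif(n_dots, min = 0, max = 100), simplify = FALSE)

df <- tibble(
  trial = seq_len(n_trials),
  # Nested column: the dot positions shown on each trial
  x = dots,
  # Simulated response: true mean of the dots plus Gaussian noise
  response = vapply(dots, mean, numeric(1)) + rnorm(n_trials, sd = 2)
)

df
```

Here x is a list column with one numeric vector of length n_dots per row, which is the kind of nested structure described above.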
Now we call the tidybayes function compose_data to prepare the data for modeling in Stan.
However, this throws the following error.
Error in is.list(val) : argument is of length zero
Should compose_data be able to handle nested data like this?
I think this might be related to issue 157, but I think my example and description of the problem are a little clearer. Please let me know if you have any questions about this issue or the example provided.