This issue could perhaps be split into two related parts, both about nested tables:
1. The output of unnesting differs depending on the dimensions of the data frames being unnested.
2. Spreading a data frame with nested columns returns an error.
In the following reproducible example I tried to give some context with a situation where one might encounter this. Specifically, in the example I try to summarise the output from different linear models that were based on different formulas. Perhaps there is a better way to do the same thing?
library("dplyr")
library("tidyr")
library("purrr")
library("broom")
data <- data_frame(
  name = c("Alex", "Alex", "Alex", "Tim", "Tim", "Tim"),
  year = c(1990, 1991, 1992, 1990, 1991, 1992),
  height = c(160, 165, 170, 120, 134, 150),
  weight = c(50, 52, 53, 48, 48, 52)
)
data
##    name  year height weight
##   (chr) (dbl)  (dbl)  (dbl)
## 1  Alex  1990    160     50
## 2  Alex  1991    165     52
## 3  Alex  1992    170     53
## 4   Tim  1990    120     48
## 5   Tim  1991    134     48
## 6   Tim  1992    150     52
# nest
data <- nest(data, year, height, weight)
data
##    name            data
##   (chr)           (chr)
## 1  Alex <tbl_df [3,3]>
## 2   Tim <tbl_df [3,3]>
Here we build two linear models, model_A and model_B: the former estimates both the intercept and the slope, the latter only the slope.
# build two different linear models
data <- data %>%
  mutate(
    model_A = map(.$data, ~lm(year ~ height, data = .)),
    model_B = map(.$data, ~lm(year ~ height + 0, data = .))
  )
data
##    name            data model_A model_B
##   (chr)           (chr)   (chr)   (chr)
## 1  Alex <tbl_df [3,3]> <S3:lm> <S3:lm>
## 2   Tim <tbl_df [3,3]> <S3:lm> <S3:lm>
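To make the difference between the two formulas concrete, here is a minimal base-R sketch fitted on Alex's three rows only. It is not part of the original example, and the names alex, fit_A and fit_B are purely illustrative.

```r
# Alex's rows from the example data (values copied from the table above).
alex <- data.frame(
  year   = c(1990, 1991, 1992),
  height = c(160, 165, 170)
)

fit_A <- lm(year ~ height, data = alex)      # intercept and slope
fit_B <- lm(year ~ height + 0, data = alex)  # slope only, no intercept

names(coef(fit_A))  # "(Intercept)" "height"
names(coef(fit_B))  # "height"
```

So a tidied model_A has two rows per name and a tidied model_B has only one, which is the difference in dimensions that matters for the unnesting below.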
When a data frame is unnested on one column but contains several other nested columns, the behaviour seems to depend on the number of rows of the nested data frames in the column being unnested.
If those data frames have more than one row (tidy_model_A, where we have a slope and an intercept), the other nested columns are dropped. If they have only one row (tidy_model_B), the other nested columns are kept.
Is this behaviour by design? Sometimes it is useful to compare different models (e.g. with a fixed intercept), and this difference in behaviour makes it difficult to do so programmatically.
# model_A: two rows per model, the other nested columns are dropped
tidy_model_A <- data %>%
  mutate(tidy = map(model_A, tidy)) %>%
  unnest(tidy)
tidy_model_A
##    name        term     estimate    std.error    statistic      p.value
##   (chr)       (chr)        (dbl)        (dbl)        (dbl)        (dbl)
## 1  Alex (Intercept) 1.958000e+03 9.557251e-12 2.048706e+14 3.107423e-15
## 2  Alex      height 2.000000e-01 5.790501e-14 3.453933e+12 1.843174e-13
## 3   Tim (Intercept) 1.982036e+03 3.464698e-01 5.720659e+03 1.112843e-04
## 4   Tim      height 6.656805e-02 2.562205e-03 2.598076e+01 2.449142e-02
tidy_model_B <- data %>%
  mutate(tidy = map(model_B, tidy)) %>%
  unnest(tidy)
tidy_model_B
##    name            data model_A model_B   term estimate std.error statistic      p.value
##   (chr)           (chr)   (chr)   (chr)  (chr)    (dbl)     (dbl)     (dbl)        (dbl)
## 1  Alex <tbl_df [3,3]> <S3:lm> <S3:lm> height 12.05941 0.2074858  58.12160 0.0002958912
## 2   Tim <tbl_df [3,3]> <S3:lm> <S3:lm> height 14.66374 0.9394218  15.60932 0.0040791367
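One way to get the same output shape for both models, regardless of how unnest() treats the remaining list columns, might be to drop them explicitly before unnesting. This is only a sketch and not a statement about how unnest() is meant to behave: tidy_model() is a hypothetical helper, heights and models are made-up names, and it assumes a recent tidyr (1.0 or later, for the nest(data = c(...)) syntax) rather than the version used above.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Same toy data as in the example above.
heights <- tibble(
  name   = rep(c("Alex", "Tim"), each = 3),
  year   = rep(c(1990, 1991, 1992), times = 2),
  height = c(160, 165, 170, 120, 134, 150),
  weight = c(50, 52, 53, 48, 48, 52)
)

models <- heights %>%
  nest(data = c(year, height, weight)) %>%  # assumes tidyr >= 1.0
  mutate(
    model_A = map(data, ~ lm(year ~ height, data = .x)),
    model_B = map(data, ~ lm(year ~ height + 0, data = .x))
  )

# Hypothetical helper: tidy one model column, keep only the key and the
# tidied results, then unnest. No other list columns remain at that point.
tidy_model <- function(df, model_col) {
  df %>%
    mutate(tidied = map(df[[model_col]], broom::tidy)) %>%
    select(name, tidied) %>%
    unnest(tidied)
}

tidy_A <- tidy_model(models, "model_A")  # two rows per name
tidy_B <- tidy_model(models, "model_B")  # one row per name
```

With the list columns selected away first, both results should have the same columns as tidy_model_A above and differ only in the number of rows.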
For the second part of this issue: when I gather the unnested data frame in which the other nested columns were dropped (tidy_model_A), I can spread it again. When they were not dropped (tidy_model_B), gather works but spread breaks.
Is it expected that spreading does not work for data frames with nested columns?
Should this not return something like this:
My sessionInfo() is below: