proback / BeyondMLR

Repo for January 2021 version of Beyond Multiple Linear Regression: Applied Generalized Linear Models and Multilevel Models in R. The rendered version can be found at: https://bookdown.org/roback/bookdown-BeyondMLR/

Chapters 9 & 11 Fail to Knit #10

Open alex-gable opened 3 years ago

alex-gable commented 3 years ago

Problem

Chapters 9 and 11 fail to knit because of changes in dplyr (1.0) or broom (0.7).

I consulted this StackOverflow post for guidance (and provided my own solution) to resolve the error, which occurs at the following locations:

- https://github.com/proback/BeyondMLR/blob/e6ebd565a46dcffd1c23c21c408939966f7e3ebe/09-Two-Level-Longitudinal-Data.Rmd#L400
- https://github.com/proback/BeyondMLR/blob/e6ebd565a46dcffd1c23c21c408939966f7e3ebe/09-Two-Level-Longitudinal-Data.Rmd#L407
- https://github.com/proback/BeyondMLR/blob/e6ebd565a46dcffd1c23c21c408939966f7e3ebe/09-Two-Level-Longitudinal-Data.Rmd#L414
- https://github.com/proback/BeyondMLR/blob/e6ebd565a46dcffd1c23c21c408939966f7e3ebe/09-Two-Level-Longitudinal-Data.Rmd#L461
- https://github.com/proback/BeyondMLR/blob/e6ebd565a46dcffd1c23c21c408939966f7e3ebe/09-Two-Level-Longitudinal-Data.Rmd#L468
- https://github.com/proback/BeyondMLR/blob/e6ebd565a46dcffd1c23c21c408939966f7e3ebe/09-Two-Level-Longitudinal-Data.Rmd#L475
- https://github.com/proback/BeyondMLR/blob/e6ebd565a46dcffd1c23c21c408939966f7e3ebe/11-Generalized-Linear-Multilevel-Models.Rmd#L523
- https://github.com/proback/BeyondMLR/blob/e6ebd565a46dcffd1c23c21c408939966f7e3ebe/11-Generalized-Linear-Multilevel-Models.Rmd#L628

Example Solution

Looking at the documentation for do(), it appears to have been superseded with a recommendation to use nest_by(). Conveniently, the documentation examples cover almost this exact use case (see details)

```r
# do() with named arguments becomes nest_by() + mutate() & list()
models <- by_cyl %>% do(mod = lm(mpg ~ disp, data = .))
# ->
models <- mtcars %>%
  nest_by(cyl) %>%
  mutate(mod = list(lm(mpg ~ disp, data = data)))
models %>% summarise(rsq = summary(mod)$r.squared)

# use broom to turn models into data
models %>% do(data.frame(
  var = names(coef(.$mod)),
  coef(summary(.$mod)))
)
# ->
if (requireNamespace("broom")) {
  models %>% summarise(broom::tidy(mod))
}
```
```r
... %>%
  nest_by(schoolid) %>%
  mutate(fit = list(lm(MathAvgScore ~ year08, data=data)))

sd_filter <- smallchart.long %>%
  group_by(schoolid) %>%
  summarise(sds = sd(MathAvgScore))

regressions <- regressions %>%
  right_join(sd_filter, by="schoolid") %>%
  filter(!is.na(sds))

lm_info1 <- regressions %>%
  summarise(tidy(fit)) %>%
  ungroup() %>%
  select(schoolid, term, estimate) %>%
  spread(key = term, value = estimate) %>%
  rename(rate = year08, int = `(Intercept)`)

lm_info2 <- regressions %>%
  summarise(tidy(fit)) %>%
  ungroup() %>%
  select(schoolid, term, std.error) %>%
  spread(key = term, value = std.error) %>%
  rename(se_rate = year08, se_int = `(Intercept)`)

lm_info <- regressions %>%
  summarise(glance(fit)) %>%
  ungroup() %>%
  select(schoolid, r.squared, df.residual) %>%
  inner_join(lm_info1, by = "schoolid") %>%
  inner_join(lm_info2, by = "schoolid") %>%
  mutate(tstar = qt(.975, df.residual),
         intlb = int - tstar * se_int,
         intub = int + tstar * se_int,
         ratelb = rate - tstar * se_rate,
         rateub = rate + tstar * se_rate)
```

This solution can be copied nearly line-for-line to fix the errors on lines 461-475.

Chapter 9 also fails to knit [here](https://github.com/proback/BeyondMLR/blob/e6ebd565a46dcffd1c23c21c408939966f7e3ebe/09-Two-Level-Longitudinal-Data.Rmd#L1264) because the model fails to converge.
Using 500 iterations seemed to do the trick:

```r
hcs.lme=lme(MathAvgScore ~ year08 * charter, chart.long,
  random = ~ 1 | schoolid, na.action=na.exclude,
  correlation=corCompSymm(form = ~ 1 |schoolid),
  weights=varIdent(form = ~1|year08),
  control = lmeControl(msMaxIter=500))
summary(hcs.lme)
# Linear mixed-effects model fit by REML
#  Data: chart.long
#       AIC     BIC  logLik
#   10299.2 10348.3 -5140.6
#
# Random effects:
#  Formula: ~1 | schoolid
#         (Intercept) Residual
# StdDev: 0.002264717 6.534915
#
# Correlation Structure: Compound symmetry
#  Formula: ~1 | schoolid
#  Parameter estimate(s):
#       Rho
# 0.8209145
# Variance function:
#  Structure: Different standard deviations per stratum
#  Formula: ~1 | year08
#  Parameter estimates:
#        0        1        2
# 1.000000 1.127902 1.079423
# Fixed effects: MathAvgScore ~ year08 * charter
#                   Value Std.Error   DF   t-value p-value
# (Intercept)    652.3347 0.2828597 1113 2306.2126  0.0000
# year08           1.1831 0.0907869 1113   13.0320  0.0000
# charter         -5.9106 0.8611940  616   -6.8633  0.0000
# year08:charter   0.8316 0.3032040 1113    2.7426  0.0062
#  Correlation:
#                (Intr) year08 chartr
# year08         -0.208
# charter        -0.328  0.068
# year08:charter  0.062 -0.299 -0.308
#
# Standardized Within-Group Residuals:
#        Min         Q1        Med         Q3        Max
# -4.9760770 -0.4490767  0.0865079  0.5669240  3.0970658
#
# Number of Observations: 1733
# Number of Groups: 618
hcs.lme$modelStruct
# reStruct  parameters:
#  schoolid
# -7.967465
# corStruct  parameters:
# [1] 1.998216
# varStruct  parameters:
# [1] 0.1203593 0.0764270
anova(hcs.lme,cs.lme)    # hcs not converging here
#         Model df      AIC      BIC    logLik   Test  L.Ratio p-value
# hcs.lme     1  9 10299.20 10348.30 -5140.600
# cs.lme      2  7 10315.94 10354.13 -5150.973 1 vs 2 20.74528  <.0001
```

Finally, Chapter 11 is missing a `library(broom)` call, and a handful of unscoped `select()` calls need a `dplyr::` prefix:
- https://github.com/proback/BeyondMLR/blob/e6ebd565a46dcffd1c23c21c408939966f7e3ebe/11-Generalized-Linear-Multilevel-Models.Rmd#L327
- https://github.com/proback/BeyondMLR/blob/e6ebd565a46dcffd1c23c21c408939966f7e3ebe/11-Generalized-Linear-Multilevel-Models.Rmd#L478

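To illustrate why an unscoped `select()` can break, here is a minimal sketch. It assumes the conflicting package is MASS, which also exports a `select()`; the thread does not say which package actually masks `dplyr::select()` in Chapter 11, and `mtcars` is only stand-in data.

```r
library(dplyr)
library(MASS)  # assumption: attached after dplyr, so MASS::select() masks dplyr::select()

# An unscoped select(mtcars, mpg, cyl) would now call MASS::select() and fail.
# Scoping the call makes it independent of the order in which packages were attached:
cols <- dplyr::select(mtcars, mpg, cyl)

# broom has to be attached in the chapter itself before tidy() or glance() is used
library(broom)
coefs <- tidy(lm(mpg ~ disp, data = mtcars))
```

Prefixing only the ambiguous calls keeps the rest of the chapter code unchanged.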
Hope this unsolicited help is, well, helpful!
proback commented 3 years ago

Thanks much - this is very helpful! Because we had to freeze R package versions many months ago when the production process started, there will be inevitable issues with package updates. For now, I added a section to the Preface indicating which versions of which packages we used for this edition of the textbook, but I will definitely make your suggested changes in the next edition (or in periodic code updates).

raffaem commented 3 years ago

@alex-gable Can you make a PR with this?

@proback Can we merge this? I am not able to compile the book in PDF

proback commented 3 years ago

@raffaem If you use the package versions listed in the preface are you able to compile the book?

alex-gable commented 3 years ago

@raffaem Trying to be mindful and respectful of the fact that this is not my work, I've put the changes I've made in alex-gable/BeyondMLR@b96ab33. The content blocks you're looking for are in chapters 6, 9, and 11 in that repo. What's relatively opaque among the changes, and only alluded to above, is the addition of new_session = TRUE to render_book in knit.R. That change is what made some of the changes I recommended above, and made in my branch, necessary.
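A sketch of what such a call might look like, for readers unfamiliar with the option. The contents of knit.R are not shown in this thread, so the `"index.Rmd"` input is an assumption; `new_session` itself is a documented argument of `bookdown::render_book()`.

```r
# knit.R (sketch; "index.Rmd" is an assumed entry file, not taken from the repo)
# new_session = TRUE renders each chapter's Rmd in its own R session, so a
# package attached in one chapter is no longer attached in the next.
bookdown::render_book("index.Rmd", new_session = TRUE)
```

With separate sessions, a chapter that relied on an earlier chapter's `library()` calls has to load those packages itself, which is consistent with the missing `library(broom)` in Chapter 11.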

@proback want to double emphasize that I want to make sure I'm not breaking any rules in that branch (I've .gitignore'd any bookdown outputs) and would love to contribute back anything I can. let me know if there's anything you'd like me to change

for comparison's sake, I've added my current packages as used in the project. renv might be a super easy way to track these. I used its dependencies method to do project introspection and compile this list.
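For anyone who wants to try this, a rough sketch of an renv workflow follows. It assumes renv is installed and that the calls are run from the project root; it is not necessarily the exact sequence used to compile the list above.

```r
# Sketch of an renv workflow, run from the project root.
# install.packages("renv")   # once, if renv is not installed

renv::init()        # set up a project-local library and write renv.lock
renv::snapshot()    # record the package versions currently in use
renv::restore()     # on another machine: reinstall the recorded versions

# List the packages that the project's files actually reference
deps <- renv::dependencies()
sort(unique(deps$Package))
```

The resulting renv.lock file could serve the same purpose as the version list in the Preface, in a form that can be restored automatically.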

napaxton commented 3 years ago

Once you make the changes above (and in Issue #12), almost everything works. I'm having a problem with the following code at lines 518-528 in Chapter 11:

```r
regressions <- refdata %>%
  group_by(game) %>%
  mutate(fit = list(glm(foul.home ~ foul.diff, family = binomial,
               data = .)))

glm_info <- regressions %>%
  summarise(tidy(fit)) %>%
  ungroup() %>%
  dplyr::select(game, term, estimate) %>%
  spread(key = term, value = estimate) %>%
  rename(rate = foul.diff, int = `(Intercept)`)
```

This causes the following error:

```
Error: Problem with `summarise()` input `..1`.
x No tidy method recognized for this list.
ℹ Input `..1` is `tidy(fit)`.
ℹ The error occurred in group 1: game = 1.
```

The major difference from the Chapter 9 code would seem to be lm() versus glm()? Any other ideas? And how can we solve it?

napaxton commented 3 years ago

OK, seemed to have solved it, in that the R code will compile. Just needed to follow the rewrite in gable's Chap 11 revs more precisely.
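For reference, a minimal sketch of how the `nest_by()` pattern from the top of this issue carries over to the `glm()` case. The revised Chapter 11 code itself is not reproduced in this thread, so this is an assumption about its general shape. The `toy` data frame is simulated and only borrows the column names `game`, `foul.home`, and `foul.diff`. It also uses `mutate()` plus `unnest()` instead of `summarise(tidy(fit))`, because dplyr 1.1 and later deprecate `summarise()` calls that return several rows per group, and `pivot_wider()` instead of `spread()`.

```r
library(dplyr)
library(tidyr)
library(broom)

# Simulated stand-in for refdata (hypothetical values; only the column names
# come from the thread)
set.seed(1)
toy <- data.frame(
  game = rep(1:3, each = 40),
  foul.diff = sample(-3:3, 120, replace = TRUE)
)
toy$foul.home <- rbinom(120, 1, plogis(-0.2 * toy$foul.diff))

# nest_by() gives one row per game, with that game's rows in a list-column
# named `data`, so each glm() is fit to a single game only
regressions <- toy %>%
  nest_by(game) %>%
  mutate(fit = list(glm(foul.home ~ foul.diff, family = binomial, data = data)))

# The result is rowwise, so tidy() receives one glm object per row
glm_info <- regressions %>%
  mutate(coefs = list(tidy(fit))) %>%
  ungroup() %>%
  dplyr::select(game, coefs) %>%
  unnest(coefs) %>%
  dplyr::select(game, term, estimate) %>%
  pivot_wider(names_from = term, values_from = estimate) %>%
  rename(rate = foul.diff, int = `(Intercept)`)
```

In the failing version, `group_by()` followed by `mutate(fit = list(...))` leaves `fit` as an ordinary list-column that is not unwrapped row by row, so within each group `tidy()` is handed a plain list rather than a single model. That would explain the "No tidy method recognized for this list" error.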

Still having problems with compile from Rmd to HTML/PDF. Throwing up hands for now and returning to this later.

proback commented 2 years ago

I'm sorry for my silence - I've gotten pulled in several other directions over the past year. My new goal is to make a series of corrections and additions that I've been accumulating by the end of January 2023 (when I might actually have a small break to focus on this), possibly using quarto. Feel free to share any other suggestions you'd have before then.