Hey @DesmondChoy, these are all great. I made updates to the online version so you will see updates in each chapter relating to your points.
Just an FYI - the reproducibility issue was due to the change in sampling procedures for R 3.6.0 (http://bit.ly/35D1SW7). Due to the time it takes to produce the book we have to cache a lot of the code output. Consequently, that chapter was caching the ames train/test split prior to the change in sampling procedures.
Reference date of book: 2019-12-06
Chapter 4: Linear Regression
4.2.2 Inference Notes
(Ctrl-f) "Regresion" & "Remdial"
[4.7 Partial least squares]
I'm not able to replicate m=3 with cv_model_pls$bestTune. I've tried it on two different computers, and I'm getting closer to m=19 or 20. I experimented with tuneLength = 40 and cv_model_pls$bestTune was between 19 and 21. Given the big discrepancy between m=3 and m=19, I thought I'd flag it.
After reading the line "Using PLS with m=3 principal components corresponded with the lowest cross-validated RMSE of $29,970", I was wondering how I would go about verifying the RMSE other than by looking at the ggplot graph itself.
Suggestion: Consider including the following code to aid the reader in extracting the lowest RMSE for themselves:
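A minimal sketch of what that might look like, assuming cv_model_pls is the chapter's caret::train() PLS fit (ncomp is caret's tuning parameter for method = "pls"):

```r
library(dplyr)

# Resampled RMSE for the optimal number of components
# (cv_model_pls is assumed to be the chapter's caret::train() PLS fit)
cv_model_pls$results %>%
  filter(ncomp == cv_model_pls$bestTune$ncomp) %>%
  select(ncomp, RMSE)
```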
Fig 4.10
There's a typo in the caption: The 10-fold cross "valdation" RMSE
Online supplementary material (https://koalaverse.github.io/homlr/notebooks/04-linear-regression.nb.html): there's a section with repeated text: (Ctrl-f) “Prediction from a rank-deficient fit…”
Chapter 5: Logistic Regression
5.5 Assessing model accuracy
"There are 16 numeric features in our data set so the following code performs a 10-fold cross-validated PLS model while tuning the number of principal components to use from 1–16. "
Suggestion - Consider including the following code to allow reader to extract number of numeric features for themselves:
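For example, a minimal sketch, assuming churn_train is the training set used in that section:

```r
library(dplyr)

# Count the numeric features in the training data
# (churn_train is assumed to be this section's training set)
churn_train %>%
  select_if(is.numeric) %>%
  ncol()
```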
Suggestion - Consider including the following code to allow reader to extract lowest RMSE for themselves:
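Something along these lines might work, again assuming cv_model_pls is this section's caret::train() object (note that for a classification fit the performance column in $results would be Accuracy rather than RMSE):

```r
library(dplyr)

# Resampled performance for the optimal number of components
# (cv_model_pls is assumed to be this section's caret::train() fit)
cv_model_pls$results %>%
  filter(ncomp == cv_model_pls$bestTune$ncomp)
```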
Question - Could you elaborate on the intuition behind limiting tuneLength to the number of numeric features? Why can't we set tuneLength to the total number of features?
Chapter 6: Regularized Regression
6.2 Why regularize?
(Ctrl-f) "classicial" (Ctrl-f) bet on sparsity principal - should be "principle"
6.3 Implementation
(Ctrl-f) "Here we just peak" - should be "peek"
6.4 Tuning
Suggestion - Consider including the following code to allow reader to extract Lasso coefficient for the lowest MSE:
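A minimal sketch, assuming cv_lasso is the cv.glmnet() lasso fit from this section:

```r
library(glmnet)

# Coefficients at the lambda that minimizes cross-validated MSE
# (cv_lasso is assumed to be this section's cv.glmnet() lasso fit)
coef(cv_lasso, s = "lambda.min")

# The minimum cross-validated MSE itself
min(cv_lasso$cvm)
```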
Chapter 7: Multivariate Adaptive Regression Splines
7.5 Feature Interpretation
With the latest version of vip (0.2.1), the code below gives a warning/error.
Suggestion: Code tweaked below.
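Since the original snippet isn't shown here, this is only a guess at the tweak: vip 0.2.0 deprecated the bar argument in favor of geom, so the section 7.5 call could be updated along these lines (assuming cv_mars is that section's caret::train() MARS model, with value = "gcv" passed through to caret::varImp):

```r
library(vip)
library(ggplot2)

# geom = "point" replaces the deprecated bar = FALSE
# (cv_mars is assumed to be section 7.5's caret::train() MARS fit)
p1 <- vip(cv_mars, num_features = 40, geom = "point", value = "gcv") +
  ggtitle("GCV")
```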
Thank you!