shokru / mlfactor.github.io

Website dedicated to a book on machine learning for factor investing
198 stars, 95 forks

Possible typo in notdata.html #59

Closed by fayolle 3 years ago

fayolle commented 3 years ago

Toward the end of chapter 1 (notations and data), the code that prepares the categorical data reads:

data_ml <- data_ml %>% 
    group_by(date) %>%                                   # Group by date
    mutate(R1M_Usd_C = R1M_Usd > median(R1M_Usd),        # Create the categorical labels
           R12M_Usd_C = R1M_Usd > median(R12M_Usd)) %>%
    ungroup() %>%
    mutate_if(is.logical, as.factor)

Shouldn't it be this instead:

data_ml <- data_ml %>% 
    group_by(date) %>%                                   # Group by date
    mutate(R1M_Usd_C = R1M_Usd > median(R1M_Usd),        # Create the categorical labels
           R12M_Usd_C = R12M_Usd > median(R12M_Usd)) %>%
    ungroup() %>%
    mutate_if(is.logical, as.factor)

(i.e. R12M_Usd_C = R12M_Usd instead of R12M_Usd_C = R1M_Usd)
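
To illustrate why the difference matters, here is a minimal sketch on toy data. The data frame `toy`, its values, and the column names `R12M_typo` and `R12M_fixed` are made up for illustration and are not part of the book's dataset. It assumes only that `dplyr` is installed.

```r
library(dplyr)

# Hypothetical toy data: one date, four assets.
# 1-month returns are small, 12-month returns are larger.
toy <- data.frame(
  date     = rep(as.Date("2020-01-31"), 4),
  R1M_Usd  = c(0.01, 0.02, 0.03, 0.04),
  R12M_Usd = c(0.40, 0.30, 0.20, 0.10)
)

toy <- toy %>%
  group_by(date) %>%
  mutate(R12M_typo  = R1M_Usd  > median(R12M_Usd),     # 1M return vs. 12M median
         R12M_fixed = R12M_Usd > median(R12M_Usd)) %>% # 12M return vs. 12M median
  ungroup()

print(toy)
```

With these values the 12-month median is 0.25, so the first version labels every asset `FALSE` (no 1-month return exceeds 0.25), while the second splits the cross-section in half as a median-based label is presumably meant to.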

shokru commented 3 years ago

Yes! You are completely right. Luckily, I don't use this variable later on in the book. I think I only use the 1M categorical version (for trees & neural networks). I will fix this typo in the online version shortly. Thank you for pointing this out.

fayolle commented 3 years ago

Thank you!