nateschor closed this issue 1 year ago
Summary: In this issue, I pulled the Lahman dataset and augmented it with lags 1-5 of each Lahman variable. I made 1- to 5-year lag plots and found that while current OBP is correlated with its lags, the lags do not seem to account for all of the variation in current OBP (see report/figures). I also found that while the distribution of OBP in 2020 looks different from other seasons, it is still correlated with 2019. Adjusting for the shortened 2020 season by calculating OBP / PA in this plot does not help either, and it leads to small values that may not be numerically stable (potential issues with matrix inversion).
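
For reference, a minimal sketch of the lag augmentation for OBP only. This is plain Python, not the code in the repository. It assumes one row per player-season (multiple stints already aggregated) and the Lahman `Batting` column names `playerID`, `yearID`, `H`, `BB`, `HBP`, `AB`, `SF`. The helper names `obp` and `add_obp_lags` and the player in `rows` are made up for illustration.

```python
from collections import defaultdict


def obp(row):
    """On-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF).

    Returns None when the denominator is zero.
    """
    denom = row["AB"] + row["BB"] + row["HBP"] + row["SF"]
    if denom == 0:
        return None
    return (row["H"] + row["BB"] + row["HBP"]) / denom


def add_obp_lags(rows, max_lag=5):
    """Return copies of rows with OBP and OBP_lag1..OBP_lag{max_lag} added.

    A lag is None when the player has no row for that earlier season.
    """
    # Index each player's OBP by season so a lag is a dictionary lookup.
    by_player = defaultdict(dict)
    for row in rows:
        by_player[row["playerID"]][row["yearID"]] = obp(row)

    out = []
    for row in rows:
        history = by_player[row["playerID"]]
        new = dict(row, OBP=history[row["yearID"]])
        for k in range(1, max_lag + 1):
            new[f"OBP_lag{k}"] = history.get(row["yearID"] - k)
        out.append(new)
    return out


# Made-up player and counting stats, for illustration only.
rows = [
    {"playerID": "p1", "yearID": 2018, "H": 150, "BB": 50, "HBP": 5, "AB": 500, "SF": 5},
    {"playerID": "p1", "yearID": 2019, "H": 140, "BB": 60, "HBP": 4, "AB": 480, "SF": 6},
    {"playerID": "p1", "yearID": 2020, "H": 50, "BB": 20, "HBP": 1, "AB": 180, "SF": 2},
]
lagged = add_obp_lags(rows)
```

Looking lags up by season rather than by row position means a missed season shows up as a missing lag instead of silently pulling in a value from two years earlier.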
In #4, I will likely handle NAs by filtering, and also determine values of $x$ and $y$ for filtering on $x > \text{OBP} > y$.
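
A rough sketch of what that filtering step could look like, again in plain Python rather than the repository's code. The function name `filter_obp`, the sample rows, and the bounds `x=0.6` and `y=0.1` are placeholders; the actual values of $x$ and $y$ are still to be determined in #4.

```python
def filter_obp(rows, x, y, required=("OBP",)):
    """Keep rows where every required field is present and y < OBP < x."""
    kept = []
    for row in rows:
        # Drop rows with a missing (None) value in any required field.
        if any(row.get(name) is None for name in required):
            continue
        if y < row["OBP"] < x:
            kept.append(row)
    return kept


# Made-up rows, for illustration only.
sample = [
    {"playerID": "a", "OBP": 0.350, "OBP_lag1": 0.340},
    {"playerID": "b", "OBP": 1.000, "OBP_lag1": 0.300},  # above the upper bound x
    {"playerID": "c", "OBP": 0.320, "OBP_lag1": None},   # missing lag
    {"playerID": "d", "OBP": 0.000, "OBP_lag1": 0.310},  # below the lower bound y
]

# Placeholder bounds; not the values that will be chosen in #4.
kept = filter_obp(sample, x=0.6, y=0.1, required=("OBP", "OBP_lag1"))
```

Both bounds are strict, matching $x > \text{OBP} > y$, so rows with OBP exactly equal to either bound are dropped.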
Stable link here
Merged into main in https://github.com/nateschor/OBP/commit/f6bb1f9dd4cb262d21530a5029f1a65c10d47b64
I want to include more data from the Lahman package, both to have more observations for OBP and for more potential predictors. Afterwards, it is time for EDA: