Hmmm, so I'm comfortable with the code as it is, but you may have found a separate issue with getting an accurate regression (which I'll raise as a small issue as well).
I am taking two deliberate steps: splitting my input feature (x, i.e. `epa`) into training and test sets, and splitting my target variable (y) into training and test sets. Training includes all but the last 100 entries; test includes the last 400 entries. So there is overlap: for any dataset of at least 400 rows, the 300 entries between the 400th-from-last and the 100th-from-last row end up in both sets. (`.reshape(-1,1)` simply turns the feature into a 2D array with one column, which I need for the regression analysis.)
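A minimal sketch of the split described above, assuming the feature and target are NumPy arrays of the same length. The 1,000-row synthetic data and the names `epa`, `y`, `train_idx` and `test_idx` are hypothetical stand-ins, not the repo's actual code; the point is only to count how many row positions land in both slices.

```python
import numpy as np

# Hypothetical stand-in data: 1,000 rows of an EPA-like feature and a target.
n_rows = 1000
epa = np.arange(n_rows, dtype=float)
y = 2.0 * epa + 1.0

# Constant slices as described: training = all but the last 100 rows,
# test = the last 400 rows. reshape(-1, 1) makes a 2D, single-column array.
X_train = epa[:-100].reshape(-1, 1)
X_test = epa[-400:].reshape(-1, 1)
y_train = y[:-100]
y_test = y[-400:]

# Row positions covered by each slice, and the positions they share.
train_idx = np.arange(n_rows)[:-100]
test_idx = np.arange(n_rows)[-400:]
overlap = np.intersect1d(train_idx, test_idx)

print(X_train.shape, X_test.shape)  # (900, 1) (400, 1)
print(len(overlap))  # 300 (rows 600 through 899)
```

Every one of those 300 shared rows is scored by a model that has already seen it during training.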
The issue is that overlap biases the numbers: the model is scored on rows it was trained on, so the test metrics look better than they should. So I need to revisit why I made the split this way. I'd also recommend against hard-coded constants, because the right slice depends on the dataset size (the number of rows in the Excel file).
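One way to avoid both the overlap and the constants is to cut at a fraction of the row count. This is a sketch, assuming NumPy-compatible inputs; `split_by_fraction` and the 0.2 default are hypothetical choices, not something the repo already has.

```python
import numpy as np


def split_by_fraction(x, y, test_fraction=0.2):
    """Split x and y into non-overlapping, ordered train/test sets.

    The last `test_fraction` of the rows become the test set, so the cut
    point scales with the number of rows instead of being a constant.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of rows")
    x = np.asarray(x)
    y = np.asarray(y)
    # Single cut point: rows before it train, rows from it onward test.
    split_at = int(round(len(x) * (1.0 - test_fraction)))
    X_train = x[:split_at].reshape(-1, 1)
    X_test = x[split_at:].reshape(-1, 1)
    return X_train, X_test, y[:split_at], y[split_at:]


# Example: 1,000 synthetic rows -> 800 train, 200 test, no shared rows.
epa = np.arange(1000, dtype=float)
y = 2.0 * epa + 1.0
X_train, X_test, y_train, y_test = split_by_fraction(epa, y)
print(X_train.shape, X_test.shape)  # (800, 1) (200, 1)
```

Because `split_at` is the only cut point, every row lands in exactly one set, and the test size grows or shrinks with the spreadsheet. If the project already depends on scikit-learn, `train_test_split(..., test_size=0.2, shuffle=False)` gives a similar ordered split (its rounding of the test count may differ by a row).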
_Originally posted by @vinostroud in https://github.com/vinostroud/nfl_analytics/pull/18#discussion_r1609024159_