Closed hkenawi closed 2 months ago
If you take this on, create a new branch in the repo and, once development is done, open a PR into main for review.
Regarding the train/test split, I believe we can train on all the data we currently have and test only on the 2024 results. That leaves just under 20% of the data for testing, which is perfectly fine and also simplifies feature engineering.
The following have had their base infrastructure built and can be considered complete or near completion:
Consolidated FBRef Player Offensive Historical Data
Consolidated FBRef Player Match Log Data
Player-level data processing complete. Team-level data processing complete.
Closing issue.
Clean each stacked dataset:
Player offensive historical data
Player match-log data
Team defensive data
Team standard and advanced goalkeeper data
Merge all cleaned data into the player match-log data using the necessary keys:
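As a rough illustration of the merge step, the sketch below left-joins a team-level table onto match-log rows in plain Python. The key columns (`team`, `season`) and all field names are hypothetical placeholders; the real join keys still need to be settled.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def left_join(base_rows: List[Row], lookup_rows: List[Row], keys: Sequence[str]) -> List[Row]:
    """Attach lookup columns to every base row, keeping unmatched base rows."""
    index: Dict[tuple, Row] = {}
    for row in lookup_rows:
        key = tuple(row[name] for name in keys)
        if key in index:
            # Duplicate keys would silently multiply or overwrite rows.
            raise ValueError(f"duplicate key in lookup table: {key}")
        index[key] = row

    # Columns contributed by the lookup table (everything except the keys).
    extra_cols = sorted({c for r in lookup_rows for c in r} - set(keys))

    merged = []
    for row in base_rows:
        match = index.get(tuple(row[name] for name in keys), {})
        # Unmatched rows get None so the gap is visible later.
        merged.append({**row, **{c: match.get(c) for c in extra_cols}})
    return merged


# Hypothetical example tables.
match_log = [
    {"player": "A", "team": "X", "season": 2024, "shots": 3},
    {"player": "B", "team": "Y", "season": 2024, "shots": 1},
]
team_defense = [{"team": "X", "season": 2024, "tackles": 18}]

merged = left_join(match_log, team_defense, keys=("team", "season"))
print(merged)
```

A left join keeps the match log as the spine of the final table. Rejecting duplicate keys in the lookup table guards against accidentally duplicating match rows.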
Reminder: split the data into train and test sets before interpolating missing values or scaling features, so that no statistics from the test set leak into training.
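To make the ordering concrete, here is a small sketch in which the fill value and scaling statistics are learned from the training values only and then reused unchanged on the test values. Mean imputation stands in for whatever interpolation method we end up using, and the function names are illustrative assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional


def fit_imputer_scaler(train_values: List[Optional[float]]) -> Dict[str, float]:
    """Learn the fill value and standardisation stats from training data only."""
    observed = [v for v in train_values if v is not None]
    fill = mean(observed)
    filled = [fill if v is None else v for v in train_values]
    # Fall back to 1.0 to avoid dividing by zero on a constant column.
    return {"fill": fill, "mean": mean(filled), "std": pstdev(filled) or 1.0}


def transform(values: List[Optional[float]], params: Dict[str, float]) -> List[float]:
    """Impute and standardise using parameters fitted elsewhere."""
    return [
        ((params["fill"] if v is None else v) - params["mean"]) / params["std"]
        for v in values
    ]


train = [1.0, None, 3.0]
test = [None, 5.0]

params = fit_imputer_scaler(train)      # fitted on train only
train_scaled = transform(train, params)
test_scaled = transform(test, params)   # test reuses the train parameters
print(params, train_scaled, test_scaled)
```

Had the parameters been fitted on the combined data, the 2024 values would have shifted the fill value and the mean, which is exactly the leakage the reminder warns about.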