markjacksonfishing / sf_giants_stats

Join me in celebrating the San Francisco Giants! I wrote this Go program to combine my interest in data science with my love for the team. Enter any Major League Baseball team's abbreviation to get that team's 2022 season batting stats from Baseball Reference.
MIT License

Include additional factors to improve the prediction accuracy #6

Closed markjacksonfishing closed 1 year ago

markjacksonfishing commented 1 year ago

Description

This pull request aims to improve the accuracy of the current program, which uses linear regression to predict a baseball team's number of wins from its performance statistics in the previous season. It does so by adding pitching statistics and defensive metrics as additional factors in the regression analysis.

Changes Made

Added a section to the README.md file explaining the importance of including additional factors and using more sophisticated regression techniques to improve the accuracy of the program.
Added a new function to the program that scrapes pitching statistics and defensive metrics from the team's page on Baseball Reference.
Modified the existing regression analysis function to include the new variables in the analysis.
Updated the README.md file with instructions on how to use the updated program.

Expected Outcome

With additional factors included and more sophisticated regression techniques applied, the program should predict a team's number of wins more accurately. More accurate predictions can help analysts and inform decision-making in sports analytics and related fields.

Additional Information

It is important to consider the trade-off between model complexity and prediction accuracy. Adding too many variables can lead to overfitting, which reduces predictive power on new data, while too few variables can produce an oversimplified model that misses relevant factors. Therefore, we selected the variables to include in the model based on their statistical significance and practical relevance.

Related Issue

This pull request fixes #5.