msoczi / football_predictions

Predicting the results of matches in European leagues
MIT License
football football-data football-prediction footballpredictor machine-learning multiclass-classification xgboost

Football matches result predictions


For the page with results, refer to: https://msoczi.github.io/football_predictions/web/index.html


The aim of the project was to create a tool for predicting the results of league matches from the leading European leagues based on data prepared by myself.

The project was implemented from scratch, i.e. it included:

Raw data with match results are downloaded from https://www.football-data.co.uk.
The advantage of this approach is the ability to predict results for any league. So far, however, predictions are available only for the top division of the following countries:

Based on the raw data, I engineered the features myself. The full list of variables is available in the file: variables
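
As an illustration of this kind of feature engineering, here is a minimal sketch that derives a "recent form" feature from raw match results. It is not taken from the repository: the feature names `home_form` and `away_form`, the window size, and the sample rows are hypothetical, and the column names `HomeTeam`, `AwayTeam` and `FTR` are assumed to follow the football-data.co.uk CSV layout.

```python
import csv
import io
from collections import defaultdict, deque

# Tiny inline sample in the assumed football-data.co.uk column layout
# (FTR = full-time result: H, D or A). The rows are made up.
SAMPLE_CSV = """Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
01/08/2020,Alpha,Beta,2,0,H
08/08/2020,Beta,Gamma,1,1,D
15/08/2020,Gamma,Alpha,0,3,A
22/08/2020,Alpha,Beta,1,2,A
"""

# Points awarded as (home points, away points) for each result.
POINTS = {"H": (3, 0), "D": (1, 1), "A": (0, 3)}


def add_form_features(rows, window=3):
    """Attach each team's points from its previous `window` matches.

    Only matches played before the current row are used, so the
    feature never leaks the result it is meant to help predict.
    """
    history = defaultdict(lambda: deque(maxlen=window))
    out = []
    for row in rows:
        home, away = row["HomeTeam"], row["AwayTeam"]
        features = dict(row)
        features["home_form"] = sum(history[home])
        features["away_form"] = sum(history[away])
        out.append(features)
        # Update the history only after the features were computed.
        home_pts, away_pts = POINTS[row["FTR"]]
        history[home].append(home_pts)
        history[away].append(away_pts)
    return out


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for r in add_form_features(rows):
    print(r["HomeTeam"], r["AwayTeam"], r["home_form"], r["away_form"])
```

The rows must be in chronological order for the form values to be meaningful.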

The XGBoost model was built on a hand-prepared historical sample containing 7210 rows and 354 columns. As the objective function, multi:softprob was used so that the model's output was the probability of assigning observations to each of the 3 classes of match result - H (Home), A (Away), D (Draw).
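
To make the model output concrete: with `multi:softprob` (which XGBoost uses together with `num_class`, here 3), the raw per-class scores are passed through a softmax, giving one probability per class for every match. The sketch below reproduces only that last step in plain Python; the scores and the class order are made up for illustration and are not taken from the trained model.

```python
import math

CLASSES = ["H", "A", "D"]  # assumed label order, for illustration only


def softmax(scores):
    """Turn raw per-class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


raw_scores = [0.9, 0.1, 0.3]  # made-up scores for a single match
probs = softmax(raw_scores)
for label, p in zip(CLASSES, probs):
    print(f"P({label}) = {p:.3f}")
```
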
These probabilities were then used to build a simple decision tree (max_depth = 3) that categorizes individual observations in a rule-based manner, i.e. predicts the final result with simple rules. This step adjusted the predictions so that draws were not predicted too rarely. Below is the scheme of the decision tree.
[Figure: decision tree]
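
To show why a rule layer on top of the probabilities can help, the sketch below compares picking the most probable class with a small set of threshold rules of depth 3. The thresholds and the sample probabilities are hypothetical and chosen only for illustration; they are not the rules of the project's tree.

```python
def predict_argmax(p_home, p_away, p_draw):
    """Pick the class with the highest probability."""
    probs = {"H": p_home, "A": p_away, "D": p_draw}
    return max(probs, key=probs.get)


def predict_rules(p_home, p_away, p_draw):
    """Three nested threshold checks; all cut-offs are hypothetical."""
    if p_home >= 0.45:
        return "H"
    if p_away >= 0.40:
        return "A"
    return "D" if p_draw >= 0.28 else "H"


# Made-up (P(H), P(A), P(D)) triples for three matches.
matches = [
    (0.55, 0.20, 0.25),
    (0.36, 0.34, 0.30),
    (0.25, 0.50, 0.25),
]
for probs in matches:
    print(probs, predict_argmax(*probs), predict_rules(*probs))
```

In the second match no outcome clearly dominates: the most-probable-class rule still answers H, while the threshold rules answer D. A draw is seldom the single most probable outcome, so picking the maximum tends to under-predict it.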

Forecasts do not use bookmaker odds.

You can view the results on the site:

https://msoczi.github.io/football_predictions/web/index.html

You can also clone the repository and use it with Python.
How to use?

  1. Clone repository.
    git clone https://github.com/msoczi/football_predictions
  2. Create and activate virtual environment for python.
    
    # LINUX:
    python3 -m venv football_preds
    source football_preds/bin/activate

    # WINDOWS:
    python -m venv football_preds
    football_preds\Scripts\activate

  3. Install required packages (in virtual environment!).
    pip install -r requirements.txt
  4. Run the main_script.py from console.
    python scripts/main_script.py <LEAGUE_NAME>

    The results for the league passed as the argument will then be saved to \output_tables.