andony-arrieula opened 8 months ago
It seems the problem is not that the data is in the wrong order, but that the preprocessing done in preprocess_dataset is not applied to the prediction data.
This should happen only during the cross-validation process, and it's perfectly normal. The idea is that we randomly select a train set and an evaluation set to measure the performance of the models.
However, during the training phase (ONLY), it should use the first 100 matches as the test set, and the rest of the dataset should be processed in the correct order! Otherwise, could you print the output of preprocess_dataset for the evaluation data to verify this?
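For reference, here is a minimal sketch of such a chronological split, assuming the dataset is a pandas DataFrame with a date column; `matches_df`, the `date` column, and the 100-match window are illustrative names, not the project's actual code:

```python
import pandas as pd

def chronological_split(matches_df: pd.DataFrame, n_test: int = 100):
    """Split a match dataset chronologically instead of randomly.

    Assumes matches_df has a "date" column; the first n_test matches
    (in date order) become the test set, and the remaining matches,
    still in order, become the training set.
    """
    matches_df = matches_df.sort_values("date").reset_index(drop=True)
    test_df = matches_df.iloc[:n_test]
    train_df = matches_df.iloc[n_test:]
    return train_df, test_df
```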
The problem is not with the evaluation data, but with the prediction data passed to the match prediction dialog.
For example, I ran my trained model on the French Ligue 1: I selected Paris SG as the home team (the best team in the league) and Clermont as the away team (the worst team in the league), and I entered the same odds (3.00) for all possible results. The algorithm gave me probabilities of 0.32 for Paris, 0.23 for a draw, and 0.44 for Clermont, which is totally incoherent. This is because the data is not preprocessed before being passed to the model.
So the program grabs the features from the history tables, but does not preprocess them before passing them to the model for prediction?
Exactly!
But the program also does not update the statistics with the data from the previous match!
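To make that second point concrete, here is a rough sketch of what folding a finished match back into the running team statistics could look like; the `stats` and `match` field names are assumptions, not the project's actual schema:

```python
def update_team_stats(stats: dict, match: dict) -> None:
    """Fold one finished match into the running team statistics.

    `stats` maps team name -> {"played", "goals_for", "goals_against"};
    `match` has "home", "away", "home_goals", "away_goals".
    All names here are illustrative, not the project's actual schema.
    """
    for team, scored, conceded in (
        (match["home"], match["home_goals"], match["away_goals"]),
        (match["away"], match["away_goals"], match["home_goals"]),
    ):
        entry = stats.setdefault(
            team, {"played": 0, "goals_for": 0, "goals_against": 0}
        )
        entry["played"] += 1
        entry["goals_for"] += scored
        entry["goals_against"] += conceded
```

Called once per finished match, in chronological order, this would keep the history tables in sync before the next prediction is made.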
Thanks. I will look into it.
I am also looking into this issue on my side. I think the best way to proceed is to modify the construct_input() method so that it processes the data before giving it to the model for prediction.
Yeah, I think that would be the best way. The rows should be processed before being returned, using the scaler from the model's config (if any).
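As a rough sketch of that fix, assuming the fitted scaler is stored on the model's config: construct_input is named in this thread, but `team_features`, `config.scaler`, and the feature layout below are hypothetical stand-ins for the real history-table lookup, not the actual code:

```python
import numpy as np

def construct_input(self, home_team: str, away_team: str, odds: list[float]) -> np.ndarray:
    """Build the feature row for a single match and apply the same
    preprocessing used at training time before handing it to the model."""
    # Gather the raw features from the history tables, as before.
    # team_features() is a hypothetical helper for the history lookup.
    row = np.concatenate([
        self.team_features(home_team),
        self.team_features(away_team),
        np.asarray(odds, dtype=float),
    ]).reshape(1, -1)

    # Apply the scaler fitted during training, if the config has one;
    # without this step the model sees values on a different scale
    # than it was trained on.
    scaler = getattr(self.config, "scaler", None)
    if scaler is not None:
        row = scaler.transform(row)
    return row
```

The key point is that `transform` reuses the statistics fitted at training time, so the prediction row lands on the same scale the model saw during training.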
Any update on this issue - how was it solved in the end? @andony-arrieula did you do a PR or fork with the changes?
There are 2 errors in the match prediction section: the features are not preprocessed before being passed to the model, and the statistics are not updated with the data from previous matches. The match prediction is therefore not usable.