Open monokizsolt opened 1 year ago
Thanks Zsolt. It looks like the optimiser is settling on a value for rho that breaks the Dixon and Coles adjustment factor. I suspect this is because you're using a fairly small amount of data, so the model does not converge well and the optimiser's output is volatile.
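To illustrate how an out-of-range rho can produce a negative probability, here is a minimal standalone sketch of the low-score adjustment as published by Dixon and Coles. It is not penaltyblog's internal code, and the rates (`lam`, `mu`) and rho values are made-up illustrative numbers, not the values fitted in this issue.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def tau(x, y, lam, mu, rho):
    """Dixon-Coles adjustment for the four low-scoring results.

    x, y are home and away goals; lam, mu are home and away expected goals.
    """
    if x == 0 and y == 0:
        return 1 - lam * mu * rho
    if x == 0 and y == 1:
        return 1 + lam * rho
    if x == 1 and y == 0:
        return 1 + mu * rho
    if x == 1 and y == 1:
        return 1 - rho
    return 1.0


def outcome_probs(lam, mu, rho, max_goals=15):
    """Sum the adjusted scoreline grid into home win, draw and away win."""
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    return home, draw, away


# Illustrative rates only: a strong home side against a weak away side.
print(outcome_probs(2.0, 0.3, rho=-0.1))  # modest rho, all outcomes positive
print(outcome_probs(2.0, 0.3, rho=-1.5))  # rho below -1/lam, away win goes negative
```

With a strongly negative rho the factor `1 + lam * rho` for a 0-1 scoreline drops below zero, while the 0-0 and 1-1 factors grow, so the draw is inflated. The adjustments cancel in the total, which is why the three outcomes can still sum to about 1 even when one of them is negative.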
Adding in the previous season's data as well helps the model converge better.
import pandas as pd
import penaltyblog as pb

# Combine two seasons of fixtures, dropping the last two rows
df = pd.concat(
    [
        pb.scrapers.FootballData("GRC Super League", "2021-2022").get_fixtures(),
        pb.scrapers.FootballData("GRC Super League", "2022-2023").get_fixtures(),
    ]
)[:-2]

# Time-decay weights based on the fixture dates
weight = pb.models.dixon_coles_weights(df["date"], 0.001)
clf = pb.models.DixonColesGoalModel(
    df["goals_home"], df["goals_away"], df["team_home"], df["team_away"], weight
)
clf.fit()

print(clf)
print(clf.predict("Olympiakos", "Asteras Tripolis"))
# Same two seasons of fixtures, dropping only the last row
df = pd.concat(
    [
        pb.scrapers.FootballData("GRC Super League", "2021-2022").get_fixtures(),
        pb.scrapers.FootballData("GRC Super League", "2022-2023").get_fixtures(),
    ]
)[:-1]

weight = pb.models.dixon_coles_weights(df["date"], 0.001)
clf = pb.models.DixonColesGoalModel(
    df["goals_home"], df["goals_away"], df["team_home"], df["team_away"], weight
)
clf.fit()

print(clf)
print(clf.predict("Olympiakos", "Asteras Tripolis"))
I'll look into adding constraints on the values that rho is allowed to take, to help minimise this in the future.
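As a rough sketch of what such a constraint could look like, the range below is the one given in the Dixon and Coles paper for a single fixture: max(-1/λ, -1/μ) ≤ ρ ≤ min(1/(λμ), 1), where λ and μ are the home and away expected goals. The helper names `rho_bounds` and `clip_rho` are hypothetical and are not part of penaltyblog; how the library ends up constraining rho may well differ.

```python
def rho_bounds(lam, mu):
    """Admissible range of rho for one fixture with expected goals lam, mu.

    Hypothetical helper: keeps all four adjusted low-score factors non-negative.
    """
    lower = max(-1.0 / lam, -1.0 / mu)
    upper = min(1.0 / (lam * mu), 1.0)
    return lower, upper


def clip_rho(rho, lam, mu):
    """Force rho back into the admissible range for this fixture."""
    lower, upper = rho_bounds(lam, mu)
    return min(max(rho, lower), upper)


# Illustrative rates only, not fitted values from this issue.
print(rho_bounds(2.0, 0.3))
print(clip_rho(-1.5, 2.0, 0.3))
```

Because the range depends on each fixture's rates, a fitted model would need rho to satisfy the tightest range across every fixture it predicts, or else pass fixed conservative bounds to the optimiser.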
Hi,
I have noticed a dramatic difference in prediction results when training the Dixon-Coles model on almost the same amount of data. Training with the first 99 rows outputs this: Home Win: 0.4901944888036056, Draw: 0.4236429709276788, Away Win: 0.08616254025982717
But training with the first 100 rows outputs this (it even includes a negative probability): Home Win: 0.37407906289002624, Draw: 0.6979058936975158, Away Win: -0.07198495669064632
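Both sets of numbers above sum to approximately 1, so the negative value is easy to miss without an explicit range check. A minimal sketch of such a check follows; `check_probabilities` is a hypothetical helper, not a penaltyblog function, and the dictionaries simply restate the two outputs quoted above.

```python
def check_probabilities(probs, tol=1e-6):
    """Return a list of problems found in a dict of outcome probabilities."""
    problems = []
    for name, p in probs.items():
        if p < 0 or p > 1:
            problems.append(f"{name} is outside [0, 1]: {p}")
    total = sum(probs.values())
    if abs(total - 1.0) > tol:
        problems.append(f"probabilities sum to {total}, not 1")
    return problems


rows_99 = {
    "Home Win": 0.4901944888036056,
    "Draw": 0.4236429709276788,
    "Away Win": 0.08616254025982717,
}
rows_100 = {
    "Home Win": 0.37407906289002624,
    "Draw": 0.6979058936975158,
    "Away Win": -0.07198495669064632,
}

print(check_probabilities(rows_99))
print(check_probabilities(rows_100))
```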
I have prepared a small script to demonstrate this:

```python
import penaltyblog as pb

fb = pb.scrapers.FootballData("GRC Super League", "2022-2023")

# Train with 99
df = fb.get_fixtures().iloc[:99]
print(df)
weight = pb.models.dixon_coles_weights(df["date"], 0.001)
clf = pb.models.DixonColesGoalModel(
    df["goals_home"], df["goals_away"], df["team_home"], df["team_away"], weight
)
clf.fit()

print(clf)
print(clf.predict("Olympiakos", "Asteras Tripolis"))

# Train with 100
df = fb.get_fixtures().iloc[:100]
print(df)
weight = pb.models.dixon_coles_weights(df["date"], 0.001)
clf = pb.models.DixonColesGoalModel(
    df["goals_home"], df["goals_away"], df["team_home"], df["team_away"], weight
)
clf.fit()

print(clf)
print(clf.predict("Olympiakos", "Asteras Tripolis"))
```
I could not work out why this happens. Could you take a look? Thanks, Zsolt