lucasmaystre / kickscore

Pairwise comparisons with flexible time-dynamics.
MIT License

parameter selection #6

Closed ioannis12 closed 3 years ago

ioannis12 commented 3 years ago

hello,

first of all, congrats on your work! I was wondering if there is a guide or code for parameter selection? I have a dataset of online games, with 2000 players and 5000 games. I fit the model as in the NBA example, but the results are much worse than Elo or TrueSkill.

greetings

victorkristof commented 3 years ago

Hi @ioannis12,

Thanks for reaching out! It is true that the hyperparameters have a big influence on the model performance in general.

In our case, we usually run a grid search (or randomized search) over some range of values for each hyperparameter. We select the combination of hyperparameters that gives the highest log-likelihood on the training data only. You can use the model.log_likelihood() function for that.
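As a rough sketch, something like the following (this assumes kernels compose with +, an Exponential kernel with var/lscale parameters, and a model.fit() method; the data and the value grids are placeholders, not recommendations):

import itertools
import kickscore as ks

# Placeholder data: (winner, loser, time) triples.
games = [("A", "B", 0.0), ("B", "C", 1.0), ("A", "C", 2.0)]

best_ll, best_params = float("-inf"), None
for var, lscale in itertools.product([0.1, 0.5, 1.0], [1.0, 10.0, 100.0]):
    model = ks.BinaryModel()
    # Kernel choice is illustrative.
    kernel = ks.kernel.Constant(var=var) + ks.kernel.Exponential(var=var, lscale=lscale)
    added = set()
    for winner, loser, t in games:
        for player in (winner, loser):
            if player not in added:
                model.add_item(player, kernel=kernel)
                added.add(player)
        model.observe(winners=[winner], losers=[loser], t=t)
    model.fit()
    ll = model.log_likelihood()  # training log-likelihood, as mentioned above
    if ll > best_ll:
        best_ll, best_params = ll, (var, lscale)

print("best (var, lscale):", best_params)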

Does that help?

Victor.

ioannis12 commented 3 years ago

thanks for the prompt reply. Yes, I was planning to do that: create a loop and test all the kernels. I'm not sure, though, what to include. I see three variations in your paper (affine + Wiener, constant + Matérn, constant + Wiener), but there are hundreds of combinations, right?

victorkristof commented 3 years ago

Hehe, indeed, there is theoretically an infinite number of combinations, especially when you consider that each kernel comes with its own hyperparameters (see Tables 6 and 7 in the paper)...

I would suggest selecting a small number of kernels that "make sense to you", i.e., that let you capture some of your hypotheses and intuition in the model. I would also suggest focusing on "simple" models first.

For example, if you include a "home advantage" parameter, you could make the hypothesis that there is no clear reason for it to vary over time, and therefore use a constant kernel instead of a fancy combination of kernels.

I hope this helps!

ioannis12 commented 3 years ago

I see. Actually, the difficulty with my data is that most players have just 3-4 games. I guess some kernels converge faster than others; any ideas which ones to prefer?

amirbachar commented 3 years ago

If I may jump into the discussion, I suggest using a library such as scikit-optimize (https://scikit-optimize.github.io/stable/) or hyperopt (https://github.com/hyperopt/hyperopt) to find good hyperparameters more efficiently. Grid search is really expensive and should most likely be used only in extreme cases.
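For instance, a minimal sketch with scikit-optimize's gp_minimize (the kickscore calls follow the same assumptions as the grid-search sketch above; the data and search ranges are illustrative):

import kickscore as ks
from skopt import gp_minimize
from skopt.space import Real

# Placeholder data: (winner, loser, time) triples.
games = [("A", "B", 0.0), ("B", "C", 1.0)]

def neg_log_likelihood(params):
    var, lscale = params
    model = ks.BinaryModel()
    kernel = ks.kernel.Constant(var=var) + ks.kernel.Exponential(var=var, lscale=lscale)
    added = set()
    for winner, loser, t in games:
        for player in (winner, loser):
            if player not in added:
                model.add_item(player, kernel=kernel)
                added.add(player)
        model.observe(winners=[winner], losers=[loser], t=t)
    model.fit()
    return -model.log_likelihood()  # gp_minimize minimizes, so negate

space = [
    Real(1e-2, 1e1, prior="log-uniform"),  # kernel variance
    Real(1e-1, 1e3, prior="log-uniform"),  # length-scale
]
result = gp_minimize(neg_log_likelihood, space, n_calls=30, random_state=0)
print("best (var, lscale):", result.x)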

victorkristof commented 3 years ago

Thanks Amir for your suggestion!

Ioannis:

I guess some kernels converge faster than others, any ideas which ones to prefer?

I don't recall observing different convergence rates for different kernels, so I guess it's really up to your modeling assumptions. But note that if computational efficiency is important to you, you could use our implementation of Kickscore in Go!

And maybe @lucasmaystre would like to comment on this discussion? :)

lucasmaystre commented 3 years ago

@ioannis12 great question overall. Automatic model selection is not yet possible with kickscore, but it's something I'd like to add in the future.

Agreed with @victorkristof: for now, the best you can do is to try different configurations and use model.log_likelihood() to select the best-performing model (at least this avoids having to do cross-validation).

For the selection of kernels: I think Constant + Exponential is a good starting point and usually gets you 99% of the performance of more complex combinations. And I would always have a Constant-only baseline to see whether time matters at all.

These choices "converge" fast in the sense that they have few hyperparameters: fewer knobs to turn, and they're usually more robust over a wide range of hyperparameter values.
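To illustrate, here is a minimal sketch of the two configurations (the variance and length-scale values are placeholders, not recommendations):

import kickscore as ks

# Constant-only baseline: an item's skill does not vary over time.
baseline = ks.kernel.Constant(var=1.0)

# Constant + Exponential: a static skill level plus fluctuations
# that decay over a characteristic time given by the length-scale.
dynamic = ks.kernel.Constant(var=0.5) + ks.kernel.Exponential(var=0.5, lscale=100.0)

model = ks.BinaryModel()
model.add_item("A", kernel=dynamic)
model.add_item("B", kernel=dynamic)

Fitting the same data under both kernels and comparing model.log_likelihood() tells you whether the time dynamics buy you anything.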

@amirbachar using a hyperparameter-optimization library is indeed a principled way to explore the space of hyperparameter values.

ioannis12 commented 3 years ago

For the selection of kernels: I think Constant + Exponential is a good starting point and usually gets you 99% of the performance of more complex combinations. And I would always have a Constant-only baseline to see whether time matters at all.

that was a good tip! I tried "constant" + "exponential" and did a basic grid search, and the results improved a lot. They outperform TrueSkill by a small margin!

tha23rd commented 3 years ago

For example, if you include a "home advantage" parameter

Hi - I don't want to derail the discussion too much, but how would I do the above? I tried to find some simple examples of modeling a phenomenon like this, but failed to find anything. Even just a nudge in the right direction would be much appreciated :)

lucasmaystre commented 3 years ago

Hi @tha23rd sorry for the delay. Here's a snippet.

import kickscore as ks
model = ks.BinaryModel()
k = ks.kernel.Constant(var=1.0)

# Add items.
model.add_item("A", kernel=k)
model.add_item("B", kernel=k)
model.add_item("home-adv", kernel=k)

# A wins against B in a "home" game.
model.observe(winners=["A", "home-adv"], losers=["B"], t=0.0)

# A wins against B in an "away" game.
model.observe(winners=["A"], losers=["B", "home-adv"], t=0.0)

Hope this helps.

ioannis12 commented 2 years ago

hello,

coming back to the last question, how do you include the home advantage? For example, if you have a team that wins 55% of its home games, what do you put in the model parameter? 0.55, or some other value?

lucasmaystre commented 2 years ago

Hi @ioannis12 apologies for the belated reply.

The value of the home-advantage parameter is learned as part of the inference process; you simply need to provide a kernel (e.g., a constant kernel with a given variance; a variance roughly on the same scale as the one used for the teams usually works well).

The fitted value of the home-advantage parameter is usually not very interpretable. It is optimized in such a way that, once all items are combined together, the predicted probabilities match the outcomes observed in the data as well as possible.
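To make this concrete, here is a sketch building on my snippet above (the variances are illustrative assumptions; note that the 55% statistic is never passed to the model):

import kickscore as ks

model = ks.BinaryModel()
team_kernel = ks.kernel.Constant(var=1.0)
model.add_item("A", kernel=team_kernel)
model.add_item("B", kernel=team_kernel)
# Prior variance on roughly the same scale as the teams' kernel.
model.add_item("home-adv", kernel=ks.kernel.Constant(var=1.0))

# A wins at home: the home-advantage item sides with the winner.
model.observe(winners=["A", "home-adv"], losers=["B"], t=0.0)
model.fit()  # the home-advantage score is inferred here, not supplied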