mvparrot / fatigue

Materials for the research project to develop a fatigue index for Grand Slam tennis matches

models #8

Open dicook opened 5 years ago

dicook commented 5 years ago

Looking at the possible models, nothing quite works for what you need. I'll try to make a sketch, but will first list the pros and cons:

  1. cobs fits the kind of model we want: splines with knots. A separate curve is fitted between each pair of consecutive "doses", with a knot at each dose. The problem is that it doesn't allow covariates. For example, you would want to allow a different set of coefficients when the point differential is close and the score is close to game point. It should be possible to make the cobs model split on the sets automatically, rather than you forcing it to, by setting the knots at the end of each set.

  2. drc appears to allow you to fit a nice nonlinear function with a dose function as the explanatory variable. This should work similarly to cobs, and may be more flexible. However, it still has the problem of not being able to have covariates.

  3. lme4 can fit nonlinear models, but doesn't easily handle knots (breaks in the curves). If you can specify a functional form you might be able to use this. Depending on the function you provide, the fit could run into problems and might not converge. It can handle covariates. Mixed effects will not help you here, and will just distract you. It's not mixed effects that are needed.

An alternative is a bit of a hack model, but it would be flexible. Here's a suggestion:

  1. Split the data into subsets at each changeover of games. You should have at least 4 data points in each subset, depending on how many points are played in the game that the server serves. (That's not really enough! So subsetting to sets, as you are doing in cobs.Rmd, might be the only thing possible.)
  2. Do a linear fit on each subset, with serve speed as the response and point as the explanatory variable. (If you have more data you could use an exponential fit, which is what cobs does, and lme4 should be able to as well.)
  3. Examine the coefficients of each fit against the game number in the match.
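
The three steps above can be sketched roughly as follows. This is only an illustration in plain Python, not the project's code: the `(game_number, point_in_game, speed)` layout, the function names `linear_fit` and `fit_per_game`, and the serve speeds are all made up for the example.

```python
from collections import defaultdict


def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_per_game(serves):
    """Fit one line per service game.

    serves: iterable of (game_number, point_in_game, speed) tuples
    (a hypothetical layout, chosen for this sketch).
    Returns {game_number: (intercept, slope)}.
    """
    # Step 1: split the data into one subset per service game.
    by_game = defaultdict(list)
    for game, point, speed in serves:
        by_game[game].append((point, speed))

    # Step 2: linear fit of serve speed on point within each subset.
    coefs = {}
    for game in sorted(by_game):
        xs = [point for point, _ in by_game[game]]
        ys = [speed for _, speed in by_game[game]]
        if len(set(xs)) < 2:
            continue  # a line needs at least two distinct x values
        coefs[game] = linear_fit(xs, ys)
    return coefs


# Made-up serve speeds for two service games by the same server.
serves = [
    (1, 1, 200.0), (1, 2, 198.0), (1, 3, 196.0), (1, 4, 194.0),
    (3, 1, 195.0), (3, 2, 192.0), (3, 3, 189.0), (3, 4, 186.0),
]

# Step 3: look at the coefficients against the game number.
coefs = fit_per_game(serves)
for game, (intercept, slope) in coefs.items():
    print(game, round(intercept, 2), round(slope, 2))
```

With only four or so points per game the individual fits will be noisy, so the interest is in how the intercepts and slopes trend across games rather than in any single fit.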

(Steph, this is a bit like the many models approach we taught in 1010, on the gapminder data.)

You would expect the intercept to decrease over the match, and maybe the slope to decrease faster. Other variables, like the point/game differential, whether it is deuce, first/second serve, or the set score, could all have an effect on the pattern. You could then add these variables as covariates in the model fit.

I would like to compute a new variable that indicates the importance of the point - maybe Steph's function. You would expect a server who is up 40-30 to put a chunk of effort into the serve at that point, especially if it is set point, but not if they were up 40-0. That's the kind of importance that we'd like to have as a variable.
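
One possible way to make this concrete (not necessarily what Steph's function does) is the usual game-level definition of point importance: the probability that the server wins the game if they win this point, minus the probability if they lose it. The sketch below assumes every point is won by the server independently with the same probability `p`; the function names and the value `p = 0.6` are assumptions for illustration only.

```python
def game_win_prob(server_pts, returner_pts, p):
    """Probability that the server wins the game from a given point score.

    Points are counted 0, 1, 2, 3 for 0, 15, 30, 40. Assumes each point is
    won by the server independently with probability p (an assumption of
    this sketch).
    """
    q = 1.0 - p
    a, b = server_pts, returner_pts
    if a >= 4 and a - b >= 2:
        return 1.0  # server has won the game
    if b >= 4 and b - a >= 2:
        return 0.0  # returner has won the game
    if a >= 3 and b >= 3:
        deuce = p * p / (p * p + q * q)  # closed form at deuce
        if a == b:
            return deuce
        # Advantage server, or advantage returner.
        return p + q * deuce if a > b else p * deuce
    return p * game_win_prob(a + 1, b, p) + q * game_win_prob(a, b + 1, p)


def point_importance(server_pts, returner_pts, p):
    """P(win game | win this point) - P(win game | lose this point)."""
    return (game_win_prob(server_pts + 1, returner_pts, p)
            - game_win_prob(server_pts, returner_pts + 1, p))


p = 0.6  # hypothetical probability that the server wins a point
importance_40_30 = point_importance(3, 2, p)
importance_40_0 = point_importance(3, 0, p)
print(round(importance_40_30, 3), round(importance_40_0, 3))
```

Under these assumptions the point at 40-30 comes out as much more important than the point at 40-0, which matches the intuition above. This version only looks within a game, so capturing set point would need the same idea extended to the set score.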

The cobs approach is probably the best option at this point. It is also possible to force covariate fitting with a hack: you simply split the data into subsets based on the covariates, and fit the model separately to each.
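
A rough sketch of that hack is below. To stay self-contained it is in plain Python and uses a straight-line fit as a stand-in for the cobs spline; the column names (`point`, `speed`, `serve`), the helper names, and the data are all hypothetical.

```python
from collections import defaultdict


def ols_line(xs, ys):
    """Least-squares line; stands in for the real model fit in this sketch."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def fit_by_covariate(rows, covariates, x="point", y="speed"):
    """Split rows on the covariate values and fit each subset separately.

    rows: list of dicts; covariates: list of keys to split on.
    Returns {covariate_values: (intercept, slope)}.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in covariates)].append(row)

    fits = {}
    for key, members in groups.items():
        xs = [r[x] for r in members]
        ys = [r[y] for r in members]
        if len(set(xs)) >= 2:  # skip subsets too small to fit a line
            fits[key] = ols_line(xs, ys)
    return fits


# Made-up data: first serves are fast, second serves slower.
rows = [
    {"point": 1, "speed": 200.0, "serve": 1},
    {"point": 2, "speed": 160.0, "serve": 2},
    {"point": 3, "speed": 198.0, "serve": 1},
    {"point": 4, "speed": 158.0, "serve": 2},
    {"point": 5, "speed": 196.0, "serve": 1},
    {"point": 6, "speed": 156.0, "serve": 2},
]
fits = fit_by_covariate(rows, ["serve"])
for key, (intercept, slope) in sorted(fits.items()):
    print(key, round(intercept, 2), round(slope, 2))
```

The cost of this approach is that every extra covariate level divides the data further, so the subsets can quickly become too small to fit.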

huizezhang-sherry commented 5 years ago

I found that problem with the covariates on Friday with Steph, and I will follow your suggestion when I proceed on Wednesday.

For the importance of the point, I noticed that when Federer is up a double break on the other player, he usually slows down his serve speed strategically. I think I will create the new variable based on that.