Open kylebennison opened 3 years ago
Yes, this is a good point. We might need some logic such as: if Gm1 == 6 and Gm2 == 6, then the next row is a tiebreak, with a dummy (1 or 0) tiebreak column to keep track of it. Not sure if this is the best approach, though.
There's already a field "TB?" which == 1 if it's a tiebreak game. All the points within that game are then scored 0-0, 1-0, 1-1, etc. in the Pts column.
There's also TbSet (not sure exactly what this tracks; it has values of 0, 1, and T) and TBpt, which tracks the point number within a tiebreak game.
Brainstorming here:
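Option 1: Create three new fields: P1_Pt_Lead_Behind, P2_Pt_Lead_Behind, and can_win_game_this_point. So if the score is 40-30, P1_Pt_Lead_Behind is 1. If it's 30-40, it's -1. If it's 6-4 in the tiebreak, the lead is 2 and can_win_game_this_point == 1. In a regular game, can_win_game_this_point is 1 if it's 40-X (X less than 40) or if it's AD, and otherwise it's 0.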
This way scoring is kind of uniform across the board, with the exception that in tiebreakers you could have a 6 point lead and not have won the game yet, while in the regular games you can only have a 3 point lead before winning.
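A rough sketch of how the point-lead and can_win_game_this_point idea could be computed. The function name `point_state`, the string score inputs, and the standard first-to-7, win-by-2 tiebreak are all assumptions for illustration, not something taken from the dataset. It also treats can_win_game_this_point as 1 when either player is one point away from the game, which is one reading of the description above.

```python
# Hypothetical helper: names and input format are illustrative only.
# Regular game scores are mapped to "points won so far" in the game.
GAME_STEPS = {"0": 0, "15": 1, "30": 2, "40": 3, "AD": 4}


def point_state(p1_score, p2_score, is_tiebreak):
    """Return (p1_pt_lead_behind, p2_pt_lead_behind, can_win_game_this_point)
    for the score as it stands before the point is played."""
    if is_tiebreak:
        p1, p2 = int(p1_score), int(p2_score)
        target = 7  # assumption: standard tiebreak, first to 7, win by 2
    else:
        p1, p2 = GAME_STEPS[p1_score], GAME_STEPS[p2_score]
        target = 4  # four points won (with a 2-point margin) takes a game
    lead = p1 - p2
    # A player can win on this point if the next point reaches the target
    # and they are already ahead, so winning it gives a 2-point margin.
    p1_can_win = p1 + 1 >= target and lead >= 1
    p2_can_win = p2 + 1 >= target and lead <= -1
    return lead, -lead, int(p1_can_win or p2_can_win)


print(point_state("40", "30", False))  # (1, -1, 1)
print(point_state("6", "4", True))     # (2, -2, 1)
print(point_state("40", "40", False))  # (0, 0, 0)
```

Splitting the flag into one column per player would be an easy variation if the model needs to know who holds the game point.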
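Option 2: If the game is a tiebreaker, "scale" the tiebreaker values to be in line with what the score essentially would be in a regular game, meaning 6-6 is the equivalent of 40-40, 6-4 is like 40-30, 6-2 is like 40-15, etc. This could result in some weird values that aren't real tennis scores, like maybe 40-7.5 if it's 6-1, but at least everything would be on the same scale and weighted the same.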
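A rough sketch of the scaling idea, assuming linear interpolation between the anchor points mentioned in this thread (0 -> 0, 2 -> 15, 4 -> 30, 6 -> 40). The function name `scale_tiebreak_points` is made up, and clamping anything at 6 or more points to 40 is a simplification I'm assuming, since the thread doesn't say how scores like 7-6 should map.

```python
# Anchors taken from the examples in the thread: 6 ~ 40, 4 ~ 30, 2 ~ 15.
# The 0 -> 0 anchor and the linear interpolation in between are assumptions.
ANCHORS = [(0, 0.0), (2, 15.0), (4, 30.0), (6, 40.0)]


def scale_tiebreak_points(tb_points):
    """Map a tiebreak point count onto the regular-game 0/15/30/40 scale."""
    if tb_points < 0:
        raise ValueError("tiebreak points cannot be negative")
    if tb_points >= 6:
        return 40.0  # assumption: clamp long tiebreaks at the top of the scale
    # Find the pair of anchors that brackets tb_points and interpolate.
    for (x0, y0), (x1, y1) in zip(ANCHORS, ANCHORS[1:]):
        if x0 <= tb_points <= x1:
            return y0 + (y1 - y0) * (tb_points - x0) / (x1 - x0)
    raise ValueError("unreachable for non-negative input")


# 6-1 in the tiebreak becomes 40-7.5 on the regular-game scale.
print(scale_tiebreak_points(6), scale_tiebreak_points(1))  # 40.0 7.5
```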
Let me know what you think about either of these @drewbennison
I think Option 1 makes more sense to me in this case, because I think you said there is also an “IsTiebreak” column in the data already? So maybe adding these additional columns will provide enough data to help the model understand what’s going on. The can_win_game_this_point marker should provide a fair bit of useful data for the model, and perhaps it’ll make things less jumpy too when someone goes down 0-15 or something like that.
For Option 2, I’m not sure about the repercussions of creating those “fake” tennis scores, or whether that would lead to weird performance in regular, non-tiebreak games.
On Jul 16, 2021, at 8:05 AM, Kyle Bennison @.***> wrote:
Brainstorming here:
Create a concept of three new fields: P1_Pt_Lead_Behind, P2_Pt_Lead_Behind, and can_win_game_this_point, essentially. So if the score is 40-30, P1_Pt_Lead is 1. If it's 30-40, it's -1. If it's 6-4 in the tiebreak, the lead is 2 and the can_win_game_this_point == 1. Else 0. And it's 1 if it's 40-X (less than 40) or if it's AD, and otherwise it's 0.
This way scoring is kind of uniform across the board, with the exception that in tiebreakers you could have a 6 point lead and not have won the game yet, while in the regular games you can only have a 3 point lead before winning.
If the game is a tiebreaker, "scale" the tiebreaker values to be in line with what the score essentially would be in a regular game. Meaning 6-6 is the equivalent of 40-40, 6-4 is like 40-30, 6-2 is like 40-15, etc. This could result in some weird values that aren't real scores you can get in Tennis, like maybe a 40-7.5 if it's 6-1, but at least everything would be on the same scale and weighted the same then.
Let me know what you think about either of these @drewbennison
p1_game_points_pre_serve and p2_game_points_pre_serve are probably throwing off the model.
Neither takes into account whether the game is a tiebreaker, so values range from 1-45 and not just 0, 15, 30, 40, etc.
Maybe simply add is_tiebreaker to the model, or find another way to handle it.
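A minimal sketch of the "just add is_tiebreaker" route, assuming each point is a dict-like row and that the existing "TB?" field (mentioned in the comments) holds 1 or "1" for tiebreak games. The function name `add_is_tiebreaker` and the row layout are hypothetical.

```python
def add_is_tiebreaker(rows):
    """Return copies of the rows with a 0/1 is_tiebreaker field added.

    Assumes the "TB?" field is 1 (or the string "1") for tiebreak games;
    anything else, including a missing field, is treated as not a tiebreak.
    """
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row["is_tiebreaker"] = int(str(row.get("TB?", 0)).strip() == "1")
        out.append(new_row)
    return out


# Made-up example rows, not real data.
rows = [
    {"TB?": 0, "p1_game_points_pre_serve": 30},
    {"TB?": 1, "p1_game_points_pre_serve": 5},
]
print([r["is_tiebreaker"] for r in add_is_tiebreaker(rows)])  # [0, 1]
```

With the flag in place, the model can at least tell a tiebreak 5 apart from a regular-game 15 or 30, even if the raw point columns stay on different scales.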