stephtselios / nhl_roster_design


Questions for roster_design_part_3_player_allocation #1

Open stephtselios opened 7 years ago

stephtselios commented 7 years ago

1) Players who play multiple positions: I have created a variable that checks whether a specific player has played more than one position during a game or a season, using np.where. The code begins at In[207].

Players are identified by jersey number. So when I type:

```python
dm['VPlayer1'] == dm['VPlayer2'], dm['zvp1'] + dm['zvp2']
```

it might add together players from different teams who share the same number.

How can I add a condition so that zone starts are summed per player within each team only?
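Because a jersey number is only unique within one team and season, any aggregation across rows should key on the team as well as the number. Within a single row, VPlayer1 to VPlayer6 all describe the visiting team, so the row-wise comparison itself cannot mix teams; mixing only happens when rows are summed by number alone. A minimal sketch with made-up values, using the column names from this issue:

```python
import pandas as pd

# Made-up events: jersey number 17 appears for two different visiting teams
events = pd.DataFrame({
    "Season":    [2012, 2012, 2012],
    "VTeamCode": ["BOS", "BOS", "MTL"],
    "VPlayer1":  [17, 17, 17],
    "zvp1":      [1, 1, 1],
})

# Keying on (Season, team, number) keeps BOS #17 and MTL #17 apart
per_team = events.groupby(["Season", "VTeamCode", "VPlayer1"], as_index=False)["zvp1"].sum()
print(per_team)
```

Grouping by VPlayer1 alone would have merged the two players into a single row with a total of 3.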

2) Player allocation: I used conditions and choices to allocate players to their roster positions (In[225]). For each position there are 4 players per team. Can I compare the values of each team's players directly instead of writing hypothetical conditions?

For example: the player with the maximum overall zone start on each team is assigned to the 1st line, and the player with the minimum to the 4th line. Of the two remaining skaters, the one with the higher value is assigned to the 2nd line. Something like that.
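Ranking within each team does this comparison without hand-written conditions. A sketch on made-up totals, assuming the frame has already been reduced to one row per player (not one row per event): `groupby(...).rank(ascending=False)` gives 1 to the highest total and 4 to the lowest, and the two middle players are ordered by the same comparison.

```python
import pandas as pd

# Made-up season totals: 4 centres per team, one row per player
totals = pd.DataFrame({
    "Season":    [2012] * 8,
    "VTeamCode": ["BOS"] * 4 + ["MTL"] * 4,
    "VPlayer1":  [37, 46, 11, 51, 14, 51, 21, 20],
    "tzvp1":     [12.0, 3.0, 7.0, 5.0, 9.0, 1.0, 6.0, 4.0],
})

# Highest total -> line 1, lowest -> line 4, within each season and team.
# method="first" breaks ties by row order so every player gets a distinct line.
totals["line"] = (
    totals.groupby(["Season", "VTeamCode"])["tzvp1"]
    .rank(method="first", ascending=False)
    .astype(int)
)
print(totals.sort_values(["VTeamCode", "line"]))
```

With ties, `method="first"` is a deliberate choice; `method="min"` would instead give tied players the same line.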

stephtselios commented 7 years ago

Answers to the questions I posted (roster_design_part_3_player_allocation):

1) For players who play multiple positions:

- tzvp1 = total zone starts for visitor player 1
- zvp1 = zone start of player 1
- zvp2 = zone start of player 2

If visitor player 1 also played in position 2, the total zone starts for that player (tzvp1) is the sum of his zone starts in positions 1 and 2. I apply this to all 6 positions for both the home and away teams.

Code:

```python
import numpy as np

# tzvp1: zone start of visitor player 1, plus the zone start of the first other
# visitor slot (2..6) in the same row that lists the same player number.
# np.select picks the first matching condition, like nested np.where.
# All V-columns describe the visiting team, and VTeamCode == VTeamCode is
# always True, so no team condition is needed.
dm['tzvp1'] = np.select(
    [dm['VPlayer1'] == dm['VPlayer2'],
     dm['VPlayer1'] == dm['VPlayer3'],
     dm['VPlayer1'] == dm['VPlayer4'],
     dm['VPlayer1'] == dm['VPlayer5'],
     dm['VPlayer1'] == dm['VPlayer6']],
    [dm['zvp1'] + dm['zvp2'],
     dm['zvp1'] + dm['zvp3'],
     dm['zvp1'] + dm['zvp4'],
     dm['zvp1'] + dm['zvp5'],
     dm['zvp1'] + dm['zvp6']],
    default=dm['zvp1'],
)
```
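Since the same rule is applied to all 6 positions for both teams, a loop can build all 12 columns. This is a sketch; the helper name `add_total_zone_starts` is hypothetical, and it assumes the home columns mirror the visitor ones (HPlayer1..6, zhp1..6, producing tzhp1..6).

```python
import numpy as np
import pandas as pd


def add_total_zone_starts(dm: pd.DataFrame, side: str, n_positions: int = 6) -> pd.DataFrame:
    """Add tz{side}p1..tz{side}p{n} columns; side is 'v' (visitor) or 'h' (home)."""
    player = f"{side.upper()}Player"   # VPlayer1.. or HPlayer1..
    zone = f"z{side}p"                 # zvp1.. or zhp1..
    slots = range(1, n_positions + 1)
    for i in slots:
        others = [j for j in slots if j != i]
        # Same player number listed in another slot of the same row
        conditions = [dm[f"{player}{i}"] == dm[f"{player}{j}"] for j in others]
        choices = [dm[f"{zone}{i}"] + dm[f"{zone}{j}"] for j in others]
        # First matching slot wins, as in the nested np.where
        dm[f"tz{side}p{i}"] = np.select(conditions, choices, default=dm[f"{zone}{i}"])
    return dm


# Made-up two-slot example
toy = pd.DataFrame({"VPlayer1": [17, 17], "VPlayer2": [17, 22],
                    "zvp1": [1, 1], "zvp2": [1, -1]})
add_total_zone_starts(toy, "v", n_positions=2)
print(toy[["tzvp1", "tzvp2"]])
```

On the full frame this would be called once per side, e.g. `for side in ("v", "h"): add_total_zone_starts(dm, side)`.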

Overall zone starts:

The zone starts of each player have been calculated separately for his team's home games and away games in the season, since home and visitor zone start values were used. A player's total zone starts is the total over every game he played in the season, i.e. the sum of his home and away zone starts.

```python
import numpy as np

# Same team and same player number on both sides: combined total over all games (gp1).
same = (dm['HTeamCode'] == dm['VTeamCode']) & (dm['HPlayer1'] == dm['VPlayer1'])
# Different team and different player: home total over home games.
different = (dm['HTeamCode'] != dm['VTeamCode']) & (dm['HPlayer1'] != dm['VPlayer1'])
dm['zplyr1'] = np.select(
    [same, different],
    [(dm['tzhp1'] + dm['tzvp1']) / dm['gp1'], dm['tzhp1'] / dm['thgp3']],
    default=dm['tzvp1'] / dm['tvgp1'],  # otherwise: visitor total over visitor games
)
```

I apply this code for all 6 roster positions.
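Within a single event row the home and visiting teams are always different clubs, so a row-wise test such as `HTeamCode == VTeamCode` cannot pair a player's home and away starts; the pairing has to happen across rows. One way, sketched below with a hypothetical helper `season_zone_starts` and assuming the home columns mirror the visitor ones (HPlayer1..6, zhp1..6), is to stack every slot into one long table keyed by season, team and player number, then sum:

```python
import pandas as pd


def season_zone_starts(dm: pd.DataFrame, n_positions: int = 6) -> pd.DataFrame:
    """Sum each player's home and away zone starts per season and team."""
    frames = []
    for side in ("h", "v"):
        prefix = side.upper()
        for i in range(1, n_positions + 1):
            part = dm[["Season", f"{prefix}TeamCode", f"{prefix}Player{i}", f"z{side}p{i}"]].copy()
            # Same column names for every slot so the pieces stack
            part.columns = ["Season", "TeamCode", "Player", "ZoneStart"]
            frames.append(part)
    long = pd.concat(frames, ignore_index=True)
    # One row per (Season, TeamCode, Player): home and away rows are added together
    return long.groupby(["Season", "TeamCode", "Player"], as_index=False)["ZoneStart"].sum()


# Made-up data: BOS and MTL meet twice, once at each arena
toy = pd.DataFrame({
    "Season": [2012, 2012],
    "HTeamCode": ["BOS", "MTL"], "VTeamCode": ["MTL", "BOS"],
    "HPlayer1": [37, 14], "VPlayer1": [14, 37],
    "zhp1": [1, 1], "zvp1": [-1, 1],
})
out = season_zone_starts(toy, n_positions=1)
print(out)
```

Dividing the summed column by games played would then give a per-game figure like zplyr1.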

2) Allocating players per position to forward lines and defensive pairings:

First, I generate a column that contains the maximum tzvp1 within each season and visiting team:

```python
dm['vmax1'] = dm.groupby(['Season', 'VTeamCode'])['tzvp1'].transform('max')
```

Second, I generate a column that contains the minimum tzvp1 within each season and visiting team:

```python
dm['vmin1'] = dm.groupby(['Season', 'VTeamCode'])['tzvp1'].transform('min')
```

Third, if a player's total zone starts is equal to neither the max nor the min for position 1, compare it with the next value that is also neither the max nor the min.

```python
import numpy as np

# Line for visitor player 1: 1 = team maximum, 4 = team minimum; otherwise 2 if his
# total beats the previous row's total (itself neither max nor min), else 3.
middle_higher = (
    # assumed intent: previous row is another player
    # (dm['VPlayer1'] != dm['VPlayer1'] would always be False)
    (dm['VPlayer1'] != dm['VPlayer1'].shift())
    & (dm['tzvp1'] != dm['vmax1'])
    & (dm['tzvp1'] != dm['vmin1'])
    & (dm['tzvp1'].shift() != dm['vmax1'])
    & (dm['tzvp1'].shift() != dm['vmin1'])
    & (dm['tzvp1'] > dm['tzvp1'].shift())
)
dm['vc'] = np.select(
    [dm['tzvp1'] == dm['vmax1'], dm['tzvp1'] == dm['vmin1'], middle_higher],
    [1, 4, 2],
    default=3,
)
```

I repeat this code for all 6 positions for both the home and away teams (12 columns).
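`Series.shift()` looks at the previous row of the whole frame, so the first player of one team is compared with the last player of the team listed before it. Shifting inside a groupby keeps the comparison within one season and team. A toy example with made-up values:

```python
import pandas as pd

toy = pd.DataFrame({
    "Season": [2012] * 4,
    "VTeamCode": ["BOS", "BOS", "MTL", "MTL"],
    "tzvp1": [3, 7, 2, 5],
})

# Whole-frame shift: MTL's first row sees BOS's last value (7)
toy["prev_any"] = toy["tzvp1"].shift()
# Group-aware shift: each team's first row gets NaN instead
toy["prev_team"] = toy.groupby(["Season", "VTeamCode"])["tzvp1"].shift()
print(toy)
```

Either way the rows must be sorted so that "previous row" is meaningful within a team.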

Games played:

```python
dm['vgp1'] = dm.groupby(['Season', 'VTeamCode', 'EventNumber', 'VPlayer1'])['GameNumber'].transform('count')
dm['hgp1'] = dm.groupby(['Season', 'HTeamCode', 'EventNumber', 'HPlayer1'])['GameNumber'].transform('count')
```
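`transform('count')` counts the non-missing rows in each group, so it counts events rather than distinct games. If the goal is games played, counting distinct GameNumber values per season, team and player is one option. The sketch below assumes GameNumber identifies a game within a season and leaves EventNumber out of the key; both are assumptions about the intent.

```python
import pandas as pd

# Made-up events: player 37 appears in 3 events across 2 games, player 46 in 2 events of 1 game
toy = pd.DataFrame({
    "Season": [2012] * 5,
    "VTeamCode": ["BOS"] * 5,
    "VPlayer1": [37, 37, 37, 46, 46],
    "GameNumber": [1, 1, 2, 1, 1],
})

# Distinct games per (Season, team, player), broadcast back to every row
toy["vgp1"] = toy.groupby(["Season", "VTeamCode", "VPlayer1"])["GameNumber"].transform("nunique")
print(toy)
```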

Overall player allocation:

Each player has been assigned to his respective roster position separately for his team's home and away games in the season. The overall roster position of each player is the mean of his home and away positions.

- c = centre position
- vc = visitor centre
- hc = home centre
- gp1 = games played for visitor player 1
- thgp1 = total home games played
- tvgp1 = total visitor games played

```python
import numpy as np

# Overall centre position for player 1 (same structure as zplyr1).
same = (dm['HTeamCode'] == dm['VTeamCode']) & (dm['HPlayer1'] == dm['VPlayer1'])
different = (dm['HTeamCode'] != dm['VTeamCode']) & (dm['HPlayer1'] != dm['VPlayer1'])
dm['c'] = np.select(
    [same, different],
    [(dm['hc'] + dm['vc']) / dm['gp1'], dm['hc'] / dm['thgp1']],
    default=dm['vc'] / dm['tvgp1'],  # otherwise: visitor centre over visitor games
)
```

I apply this to all 6 positions: c, rw, lw, dr, dl, g.
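If hc and vc hold a player's line numbers summed over his home and away games, dividing their sum by total games played gives a pooled average in which each venue is weighted by the games played there; averaging the home mean and the away mean instead weights the two venues equally. A sketch on made-up per-game line numbers for one hypothetical player shows that the two can differ:

```python
import pandas as pd

# Made-up line numbers (1-4) for one player, one row per game
games = pd.DataFrame({
    "Season": [2012] * 5,
    "TeamCode": ["BOS"] * 5,
    "Player": [37] * 5,
    "Venue": ["home", "home", "away", "away", "away"],
    "Line": [1, 2, 1, 1, 2],
})
keys = ["Season", "TeamCode", "Player"]

# Pooled mean over all games: venues weighted by games played
pooled = games.groupby(keys)["Line"].mean()
# Mean of the home mean and the away mean: venues weighted equally
by_venue = games.groupby(keys + ["Venue"])["Line"].mean()
equal_weight = by_venue.groupby(level=keys).mean()
print(pooled.iloc[0], equal_weight.iloc[0])
```

Here the pooled mean is 7/5 = 1.4, while the equal-weight mean is (1.5 + 4/3) / 2 ≈ 1.417.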

I have saved the dm DataFrame as a CSV file:

```python
dm.to_csv('player_allocation.csv', index=False, sep=',')
```
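The `index` argument of `to_csv` expects a boolean. A non-empty string such as `'False'` is truthy in Python, so passing the boolean `False` is the reliable way to leave the row index out of the file. A small round trip through an in-memory buffer (toy data) shows the result:

```python
import io

import pandas as pd

dm = pd.DataFrame({"Season": [2012], "VTeamCode": ["BOS"], "tzvp1": [3]})

buf = io.StringIO()
dm.to_csv(buf, index=False)   # boolean False: the row index is not written
text = buf.getvalue()
print(text)

# Reading it back gives the same columns, with no extra "Unnamed: 0" column
restored = pd.read_csv(io.StringIO(text))
print(restored)
```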

What is the next step?

a) Should I run the whole-season DataFrame first and then try to apply the roster model?

b) Should I try to run the roster model on the 2 games only? The 2nd game had only one goal, so only 2 players are allocated per position.