Open utterances-bot opened 3 years ago
Did you consider accounting for double headers?
There are so few double headers I don't think it would make a difference, but I'm curious
I've experimented with a few features that target double headers. They are easy to identify in the training data - the last number in the game_id changes on baseball-reference.com.
The problem comes on the prediction side. It's hard to figure out which game is which when looking at today's games, and the pitching lineup is also sometimes wonky. The same is true when placing bets: it's easy to bet on the wrong game. I've done that about 5 times, so I swore them off.
I'm sure it's all solvable, but I haven't dived deep because it looks like a bunch of custom code for not a lot of games.
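For anyone who wants to try flagging double headers in the training data, here is a minimal sketch of the game_id idea described above. It assumes the trailing digit of the game_id is 0 for a single game and 1 or 2 for the games of a double header; the sample IDs and both function names are made up for illustration and are not from the original notebook.

```python
def doubleheader_game_number(game_id: str) -> int:
    """Return the trailing digit of a game_id as an integer.

    Assumption: 0 means a single game, 1 or 2 means the first or
    second game of a double header.
    """
    last = game_id[-1]
    if not last.isdigit():
        raise ValueError(f"unexpected game_id format: {game_id!r}")
    return int(last)


def is_doubleheader(game_id: str) -> bool:
    """True when the game_id looks like part of a double header."""
    return doubleheader_game_number(game_id) > 0


# Hypothetical IDs, for illustration only.
sample_ids = ["BOS202009200", "NYN202008281", "NYN202008282"]
flags = {gid: is_doubleheader(gid) for gid in sample_ids}
print(flags)
```

With a DataFrame, the same function could be applied with df['game_id'].map(is_doubleheader) to create a feature column.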
After the encode_me step, all the string parameters in the df (e.g., home team abbreviation) become NaN. How can I fix this?
Also interested in how I could use the data in the df to predict over-unders
I'm getting a call back error when running this section:
df = game_df
df = pd.merge(left=df, right = get_diff_df(batting_df, 'batting'), on = 'game_id', how='left')
print(df.shape)
df = pd.merge(left=df, right = get_diff_df(pitching_df, 'pitching'), on = 'game_id', how='left')
print(df.shape)
df = pd.merge(left=df, right = get_diff_df(pitcher_df, 'pitcher',is_pitcher=True), on = 'game_id', how='left')
df.shape
any thoughts on what's wrong? I just sent you a twitter message as well.
Did you ever figure out the call back error? Thank you.
Nope. Never got a response. If you figure it out please let me know.
Incredible series. Also learning a lot. @JayDoubleOh7, in the event you catch this: the issue has to do with .astype(np.timedelta64)); you need to declare the unit. Changing it to .astype(np.timedelta64(0,'s'))) should get the code working again. I myself am just learning, but this should allow for continued study. Good luck everyone!
I get the same error as JayDoubleOh7: "ValueError: datetime64/timedelta64 must have a unit specified". Please reply and help with how to fix this. I tried @stannis-Analysis's suggestion but could not get it to work.
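The ValueError quoted in this thread is raised when a timedelta column is cast without a unit. One way to sidestep the cast, sketched below, is to convert the timedelta to a plain number explicitly. The column contents and variable names here are invented for illustration; the exact line in the original notebook that needs changing is not shown in this thread, so treat this as an assumption about what that line computes.

```python
import numpy as np
import pandas as pd

# Hypothetical game dates for one team, for illustration only.
dates = pd.Series(pd.to_datetime(["2020-07-24", "2020-07-25", "2020-07-28"]))

# Difference between consecutive games; the first row has no previous game.
delta = dates - dates.shift()

# Option 1: divide by a one-day timedelta to get float days (NaN for the first row).
days_rest = delta / np.timedelta64(1, "D")

# Option 2: use the .dt accessor to get float seconds.
seconds_rest = delta.dt.total_seconds()

print(days_rest.tolist())
print(seconds_rest.tolist())
```

Both options return ordinary float columns, so the result does not depend on how a given pandas version handles a unit-less timedelta64 cast.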
Having a lot of errors here:
import datetime as dt
game_data = []
for link in game_links:
    url = 'https://www.baseball-reference.com' + link
    game_data.append(process_link(url))
    if len(game_data)%1000==0: print(dt.datetime.now().time(), len(game_data))
anyone?
I have TypeError: incompatible index of inserted column with frame index
Anyone figure out how to get rid of the errors here?
import datetime as dt
game_data = []
for link in game_links:
    url = 'https://www.baseball-reference.com' + link
    game_data.append(process_link(url))
    if len(game_data)%1000==0: print(dt.datetime.now().time(), len(game_data))
Still looking to figure out this issue
import datetime as dt
game_data = []
for link in game_links:
    url = 'https://www.baseball-reference.com' + link
    game_data.append(process_link(url))
    if len(game_data)%1000==0: print(dt.datetime.now().time(), len(game_data))
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[62], line 5
      3 for link in game_links:
      4     url = 'https://www.baseball-reference.com/' + link
----> 5     game_data.append(process_link(url))
      6     if len(game_data)%1000==0: print(dt.datetime.now().time(), len(game_data))

Cell In[60], line 45
     41 uncommented_html += h + '\n'
     43 soup = bs(uncommented_html)
     44 data = {
---> 45     'game': get_game_summary(soup, game_id),
     46     'away_batting': get_table_summary(soup, 1),
     47     'home_batting':get_table_summary(soup, 2),
     48     'away_pitching':get_table_summary(soup, 3),
     49     'home_pitching':get_table_summary(soup, 4),
     50     'away_pitchers': get_pitcher_data(soup, 3),
     51     'home_pitchers': get_pitcher_data(soup, 4)
     52 }
     53 return data

Cell In[60], line 7
      5 scorebox = soup.find('div', {'class':'scorebox'})
      6 teams = scorebox.findAll('a',{'itemprop':'name'})
----> 7 game['away_team_abbr'] = teams[0]['href'].split('/')[2]
      8 game['home_team_abbr'] = teams[1]['href'].split('/')[2]
      9 meta = scorebox.find('div', {'class':'scorebox_meta'}).findAll('div')
IndexError: list index out of range
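The IndexError above means that teams came back with fewer than two entries, i.e. the parsed page had no matching team links inside the scorebox. Why that happens is not established in this thread; the page markup may have changed since the post was written, or the request may have returned something other than a box score. A minimal sketch for finding out which pages fail, rather than stopping at the first one, is to wrap the loop and record failures. The scrape_all name, the fake_process_link stand-in, and the sample links are hypothetical; in the notebook you would pass the real process_link and game_links.

```python
def scrape_all(links, process_link, base_url="https://www.baseball-reference.com"):
    """Run process_link on every link, collecting failures instead of crashing."""
    game_data, failed = [], []
    for link in links:
        url = base_url + link
        try:
            game_data.append(process_link(url))
        except (IndexError, AttributeError) as exc:
            # IndexError: expected elements were missing from the parsed page.
            # AttributeError: a find() call returned None.
            failed.append((url, repr(exc)))
    return game_data, failed


# Stand-in for the notebook's process_link, for demonstration only.
def fake_process_link(url):
    if "bad" in url:
        raise IndexError("list index out of range")
    return {"url": url}


sample_links = ["/boxes/good1.shtml", "/boxes/bad.shtml", "/boxes/good2.shtml"]
game_data, failed = scrape_all(sample_links, fake_process_link)
print(len(game_data), "parsed;", len(failed), "failed")
```

Inspecting the HTML of one URL from failed should show whether the scorebox markup differs from what get_game_summary expects. Note also that the traceback builds the URL with a trailing slash in the base while the snippet above it does not, which may produce a doubled slash if the links already start with one.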
Has anyone figured out this error? I got stuck at a few steps using my local server, so I switched to Colab, and the AI can't even figure this out. Hopefully someone reaches out after seeing this.
import datetime as dt
game_data = []
for link in game_links:
    url = 'https://www.baseball-reference.com' + link
    game_data.append(process_link(url))
    if len(game_data)%1000==0: print(dt.datetime.now().time(), len(game_data))
MLB Training Data | rdpharr’s projects
Part 2 - Downloading the data and training the model with XGBoost
https://rdpharr.github.io/project_notes/baseball/webscraping/xgboost/brier/accuracy/calibration/machine%20learning/2020/09/21/MLB-Part2-First-Model.html