jldbc / pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
MIT License

Many failing tests #301

Closed tjburch closed 1 year ago

tjburch commented 1 year ago

We've got a bunch of tests that we need to address. Running locally I get:

My guess is that it's just broken fixtures or something similar. Better to get this resolved sooner rather than later.

BrayanMnz commented 1 year ago

Hi @tjburch, if you can add more details on how to replicate this - maybe I can take a look into this.

Also, I think we can improve the contributing.md and create an issue template to make things clearer and more concise for those who want to contribute to the project.

tjburch commented 1 year ago

Thanks @BrayanMnz. @TheCleric is also starting to look when he has available time, so keep an eye on this page.

Clone the repo, run pip install -e . from the top level, and then run pytest, and it should light up like a Christmas tree. You should be able to see which tests fail and what the error messages are.

tjburch commented 1 year ago

I figured out at least some of the cases: those that touch bbref. Basically, we're throwing a lot of requests their way and getting rate limited. Looking directly at the get_soup output in standings.py:

<h2 class="text-gray-600 leading-1.3 text-3xl lg:text-2xl font-light">You are being rate limited</h2>
</header>
<section class="w-240 lg:w-full mx-auto mb-8 lg:px-8">
<div class="w-1/2 md:w-full" id="what-happened-section">
<h2 class="text-3xl leading-tight font-normal mb-4 text-black-dark antialiased" data-translate="what_happened">What happened?</h2>
<p>The owner of this website (www.baseball-reference.com) has banned you temporarily from accessing this website.</p>
</div>

Not sure of the best solution here. @TheCleric, any suggestions?
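
One way a test or caller could tell this block page apart from a real standings page is to look for the banner heading shown above. The following is only a minimal sketch using the standard library: looks_rate_limited and _HeadingCollector are hypothetical names, not part of pybaseball, and the check assumes the block page keeps the "rate limited" wording in an h2 element.

```python
from html.parser import HTMLParser


class _HeadingCollector(HTMLParser):
    """Collects the text of every <h2> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        # Only accumulate text that appears inside an <h2> element.
        if self._in_h2:
            self.headings[-1] += data


def looks_rate_limited(html: str) -> bool:
    """Return True if any <h2> heading contains the rate-limit banner text.

    Hypothetical helper: assumes the block page wording seen in this thread.
    """
    parser = _HeadingCollector()
    parser.feed(html)
    parser.close()
    return any("rate limited" in heading.lower() for heading in parser.headings)
```

A caller could run this on the raw response text before parsing tables, and fail with a clear message (or skip the test) instead of raising a confusing parsing error.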

smoot618 commented 1 year ago

Hey!

So, I just ran into this error and you'll need to add a wait condition. Something like the code below should work (it's from a Jupyter notebook) as an example using time.sleep(10):

import numpy as np
import pandas as pd
import time
import seaborn as sns
import pybaseball as pyball
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

from pybaseball import *
from pybaseball import statcast, utils
from pybaseball.plotting import plot_bb_profile

pd.set_option('display.max_columns', None)
%matplotlib inline

# We get 2 teams from the AL and 2 teams from the NL

def get_team_names(year):
    NYY_df = schedule_and_record(year, 'NYY')
    STL_df = schedule_and_record(year, 'STL')
    BOS_df = schedule_and_record(year, 'BOS')
    NYM_df = schedule_and_record(year, 'NYM')

    NYY_df_teams = NYY_df.Opp.unique()
    NYY_df_teams_list = list(NYY_df_teams)

    STL_df_teams = STL_df.Opp.unique()
    STL_df_teams_list = list(STL_df_teams)

    BOS_df_teams = BOS_df.Opp.unique()
    BOS_df_teams_list = list(BOS_df_teams)

    NYM_df_teams = NYM_df.Opp.unique()
    NYM_df_teams_list = list(NYM_df_teams)

    AL_team = NYY_df_teams_list + BOS_df_teams_list
    NL_team = STL_df_teams_list + NYM_df_teams_list

    # Since not every team plays every other team, we get opponents from 2 separate teams
    # per league and weed out duplicates
    all_team = AL_team + NL_team
    mlb_teams = set(all_team)

    return mlb_teams

def get_schedule_record_all_teams(year, team_names):
    empt_team_schedule_list = []
    for team in team_names:
        print(team)
        team_schedule = schedule_and_record(year, team)
        # Pause between requests to avoid being rate limited
        time.sleep(10)
        empt_team_schedule_list.append(team_schedule)

    schedule_df = pd.concat(empt_team_schedule_list)

    return schedule_df

def main(year):
    team_names = get_team_names(year)
    schedule_df = get_schedule_record_all_teams(year, team_names)

    return team_names, schedule_df

team_names, schedule_df = main(2010)

Honestly, I just ran into this so my account's been temp banned as well, but I'm gonna grab my other computer and attempt with the wait condition.
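
The fixed time.sleep(10) wait can also be pulled out into a small reusable throttle, so every request goes through the same pause logic. This is only a sketch: Throttle and fetch_all are hypothetical names, not pybaseball APIs and not necessarily how the library itself handles rate limits. The clock and sleep parameters are there only so the pause logic can be checked without actually waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until at least min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


def fetch_all(keys, fetch, throttle):
    """Call fetch(key) for each key, pausing via the throttle before every call."""
    results = []
    for key in keys:
        throttle.wait()
        results.append(fetch(key))
    return results
```

With the notebook code above, usage might look like fetch_all(team_names, lambda team: schedule_and_record(year, team), Throttle(10)). Unlike an unconditional sleep, the throttle only waits for whatever part of the interval has not already elapsed, and it does not wait before the first request.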

tjburch commented 1 year ago

The good news is #296 took care of the rate limits (thanks @TheCleric). The bad news is that the FG error in #315 is now causing its own failing tests (see: https://github.com/jldbc/pybaseball/actions/runs/4116782091/jobs/7107420647)

tjburch commented 1 year ago

Closing per #318 and Bryan Peabody's great detective work.