jldbc / pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
MIT License

Inconsistent batting_stats_range() response. #271

Open nicholasg97 opened 2 years ago

nicholasg97 commented 2 years ago

Around midday EST I ran batting_stats_range('2022-05-20'), and it returned only 8 rows. Stepping through in the debugger, I grabbed the raw URL that pybaseball was passing to the requests module. It loaded fine multiple times in my browser, where I saw 200+ rows of data.

I waited a few hours and it seems to work fine for me now, scraping all of the rows correctly. Trying other dates, though, I'm getting similar inconsistency.

I'm not an expert on the requests module, but I believe it's returning a response before the page is fully loaded. Has anybody experienced this before?
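One way to test that hypothesis is to count the table rows in the raw HTML that came back, before any parsing into a dataframe. If the raw text already contains 200+ rows, the download was complete and the rows are being lost later, during parsing. The sketch below uses only the standard library; `RowCounter` and `count_table_rows` are hypothetical helper names, not part of pybaseball, and the count includes header rows, so treat it as an approximation.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Counts <tr> start tags seen in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag == "tr":
            self.rows += 1


def count_table_rows(html: str) -> int:
    """Return the number of <tr> tags in the raw HTML text (hypothetical helper)."""
    counter = RowCounter()
    counter.feed(html)
    counter.close()
    return counter.rows


# Assumed usage (needs network access and the third-party requests package):
#   import requests
#   html = requests.get(url).text   # url: the address pybaseball requested
#   print(count_table_rows(html))
```

Comparing this number with the length of the dataframe that batting_stats_range() returns for the same dates should show whether the response or the parsing step is at fault.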

markspotsthex commented 2 years ago

I'm getting this, too. I've been playing with different date ranges; some work but some don't. I ran:

from pybaseball import batting_stats_range

split = "2022-05-25"
data_before = batting_stats_range("2022-03-31", split)
data_after  = batting_stats_range(split, "2022-06-04")

Both "before" and "after" dataframes quit at Jose Altuve. Weird.

bdilday commented 2 years ago

@markspotsthex this sounds similar to the issue mentioned here: https://github.com/jldbc/pybaseball/issues/218 and https://github.com/jldbc/pybaseball/pull/223

do you have that update in your version?
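To answer that, it helps to know which pybaseball version is actually installed in the environment running the code. A minimal sketch using the standard library's importlib.metadata (Python 3.8+) follows; `installed_version` is a hypothetical helper name, and which release contains the change from the linked pull request is not stated in this thread.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed pybaseball version, or None if it is not installed.
print(installed_version("pybaseball"))
```

The printed version can then be compared against the project's release notes to see whether the linked update is included.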

4G4M3MN0N commented 2 years ago

I was having similar issues: I would only get 20 rows from pybaseball.league_batting_stats.batting_stats_range(). I changed the parser type in batting_stats_range.get_soup() to "html.parser", and I now get 544 rows; accents are also presented better.

    from datetime import date

    import requests
    from bs4 import BeautifulSoup


    def get_soup(start_dt: date, end_dt: date) -> BeautifulSoup:
        # get most recent standings if date not specified
        # if((start_dt is None) or (end_dt is None)):
        #    print('Error: a date range needs to be specified')
        #    return None
        url = "http://www.baseball-reference.com/leagues/daily.cgi?user_team=&bust_cache=&type=b&dates=fromandto&fromandto={}.{}&level=mlb&franch=&stat=&stat_value=0".format(start_dt, end_dt)
        s = requests.get(url).content
        # parser switched to "html.parser" (the change described above)
        return BeautifulSoup(s, "html.parser")