Open nicholasg97 opened 2 years ago
I'm getting this, too. I've been playing with different date ranges; some work but some don't. I ran:
from pybaseball import batting_stats_range
split = "2022-05-25"
data_before = batting_stats_range("2022-03-31", split)
data_after = batting_stats_range(split, "2022-06-04")
Both "before" and "after" dataframes quit at Jose Altuve. Weird.
@markspotsthex this sounds similar to the issue mentioned here: https://github.com/jldbc/pybaseball/issues/218 (see also https://github.com/jldbc/pybaseball/pull/223).
Do you have that update in your version?
I was having similar issues: I would only get 20 rows from pybaseball.league_batting_stats.batting_stats_range(). I altered the parser type in batting_stats_range.get_soup() to "html.parser", and now I get 544 rows back; accented characters are also rendered better.
`def get_soup(start_dt: date, end_dt: date) -> BeautifulSoup:
    # if((start_dt is None) or (end_dt is None)):
    #     print('Error: a date range needs to be specified')
    #     return None
    url = "http://www.baseball-reference.com/leagues/daily.cgi?user_team=&bust_cache=&type=b&dates=fromandto&fromandto={}.{}&level=mlb&franch=&stat=&stat_value=0".format(start_dt, end_dt)
    s = requests.get(url).content
    # Parser type changed to "html.parser" here.
    return BeautifulSoup(s, "html.parser")`
Around midday EST I ran batting_stats_range('2022-05-20'), and it returned only 8 rows. In the debugger I was able to grab the raw URL that pybaseball was passing to the requests module; it loaded fine multiple times in my browser, where I saw 200+ rows of data.
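For anyone who wants to check whether rows go missing in the download or in the parsing step, one option is to count the table rows in the raw HTML without going through BeautifulSoup at all. Below is a minimal sketch using only the standard library. `RowCounter` and `count_table_rows` are hypothetical helpers, not part of pybaseball, and the sample HTML is made up for illustration. On the real page the count may also include any header rows repeated inside the table body, so treat it as a rough number.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Counts <tr> start tags that appear inside a <tbody> element."""

    def __init__(self) -> None:
        super().__init__()
        self.in_tbody = False
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.in_tbody = True
        elif tag == "tr" and self.in_tbody:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False


def count_table_rows(html: str) -> int:
    # Hypothetical helper: not part of pybaseball, requests, or bs4.
    counter = RowCounter()
    counter.feed(html)
    counter.close()
    return counter.rows


# Made-up sample: one header row (not counted) and two body rows.
sample = (
    "<table><thead><tr><th>Name</th></tr></thead>"
    "<tbody><tr><td>Jose Altuve</td></tr><tr><td>Aaron Judge</td></tr></tbody></table>"
)
print(count_table_rows(sample))
```

If the count on the downloaded text (for example `requests.get(url).text`) is close to what the browser shows but the dataframe is much shorter, that points at the parser rather than the download.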
I waited a few hours and it now seems to work fine for me, scraping all of the rows correctly. Trying other dates, though, I'm getting similar inconsistency.
I'm not an expert on the requests module, but I believe it's returning a response before the page is fully loaded. Has anybody experienced this before?
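One way to test that hypothesis is to check whether the downloaded body looks complete and retry if it does not. This is only a sketch under stated assumptions: `looks_complete` and `fetch_until_complete` are hypothetical names, and the completeness check (a closing `</html>` tag is present) is a heuristic, not something pybaseball or requests provides. It does not confirm that truncated downloads are the cause. If the body is complete and rows are still missing, the parser is the more likely suspect.

```python
import time
from typing import Callable


def looks_complete(body: bytes) -> bool:
    # Heuristic (an assumption): a full HTML page contains a closing </html> tag.
    return b"</html>" in body.lower()


def fetch_until_complete(
    fetch: Callable[[], bytes], attempts: int = 3, delay: float = 1.0
) -> bytes:
    """Call fetch() until the body looks complete, up to `attempts` times."""
    last = b""
    for attempt in range(attempts):
        last = fetch()
        if looks_complete(last):
            return last
        # Pause before the next try, but not after the final one.
        if attempt < attempts - 1:
            time.sleep(delay)
    raise RuntimeError(
        f"response still looked truncated after {attempts} attempts ({len(last)} bytes)"
    )
```

With requests installed, it could be called as `fetch_until_complete(lambda: requests.get(url, timeout=30).content)`. Passing the fetch step in as a callable keeps the retry logic easy to test without hitting the site.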