Open ChrisSBouchard opened 3 years ago
The issue is with parsing team records; it appears the HTML has changed. A temporary fix (if you don't need team records) is to comment out line 671 of boxscore.py and replace it with a dummy value. I'll leave the parsing issue to the experts since I couldn't figure it out.
#value = self._parse_record(short_field, boxscore, index)
value = '0-0'
Ahh the HTML change was my initial thought. Luckily I don't need team records; thank you for the fix! :)
Hi, I'm currently trying to obtain the dataframe_extended for each team, but I am getting the "list index out of range" error. I tried commenting out the line and adding '0-0', but the problem still persists. Do you have any other suggestions?
When I went to make the fix, I noticed that the self._parse_record call was past line 671. Make sure the line you comment out is the one that says:
value = self._parse_record(short_field, boxscore, index)
then add:
value = '0-0'
Assuming I understand how PyQuery objects work, in my case the lookup fails while parsing the away_record field. The BOXSCORE_SCHEME entry for away_record is: 'div#boxes div[class="section_heading"] h2'. For the boxscores that work, the HTML looks like:
<div class="section_heading" id="box-score-basic-cal-state-northridge_sh">
<span class="section_anchor" id="box-score-basic-cal-state-northridge_link" data-label="Cal State Northridge (1-1)"></span><h2>Cal State Northridge (1-1)</h2>
Which seems to match up to the BOXSCORE_SCHEME for the away_record.
For the boxscores that give me an Index Exception the HTML looks like:
<div class="section_heading assoc_box-score-basic-texas-am-corpus-christi" id="box-score-basic-texas-am-corpus-christi_sh">
<span class="section_anchor" id="box-score-basic-texas-am-corpus-christi_link" data-label="Texas A&M-Corpus Christi (3-15)"></span><h2>Texas A&M-Corpus Christi (3-15)</h2>
It appears that an additional class (assoc_box-score-basic-texas-am-corpus-christi) gets added after "section_heading" in the class attribute, so the exact-match selector no longer matches. I can't find a pattern for why or when it happens, but maybe somebody smarter than I will know how to tweak the PyQuery selector to work in both scenarios.
Update: I changed both the away_record and home_record BOXSCORE_SCHEME entries to use a wildcard in the selector, so it now looks like: `div#boxes div[class*="section_heading"] h2`. I'm running some tests now, but this does appear to have resolved the IndexError and returns the expected data.
Was the fix presented above ever pushed out? I am still getting the error. I could apply the fix mentioned above myself; however, I would rather use the fixed code from a release.
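For anyone who wants to see the difference between the exact-match and substring forms of the attribute selector without installing anything, here is a rough sketch using only the standard library. It does not use PyQuery or sportsipy; `HeadingCollector`, `headings`, and `SAMPLE` are made-up names for illustration, the sample combines the two HTML snippets quoted above into one page, and the `div#boxes` ancestor check is left out for brevity.

```python
from html.parser import HTMLParser

# Illustrative sample: the two heading variants quoted in this thread,
# wrapped in a single div#boxes container.
SAMPLE = """
<div id="boxes">
<div class="section_heading" id="box-score-basic-cal-state-northridge_sh">
<h2>Cal State Northridge (1-1)</h2>
</div>
<div class="section_heading assoc_box-score-basic-texas-am-corpus-christi" id="box-score-basic-texas-am-corpus-christi_sh">
<h2>Texas A&amp;M-Corpus Christi (3-15)</h2>
</div>
</div>
"""


class HeadingCollector(HTMLParser):
    """Collect <h2> text inside any <div> whose class attribute passes a test."""

    def __init__(self, class_matches):
        super().__init__()
        self.class_matches = class_matches  # predicate on the raw class string
        self.div_stack = []                 # one bool per currently open <div>
        self.in_h2 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            cls = dict(attrs).get("class") or ""
            self.div_stack.append(self.class_matches(cls))
        elif tag == "h2" and any(self.div_stack):
            self.in_h2 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()
        elif tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2:
            self.headings[-1] += data


def headings(html, class_matches):
    parser = HeadingCollector(class_matches)
    parser.feed(html)
    parser.close()
    return [h.strip() for h in parser.headings]


# Mimics [class="section_heading"]: the whole attribute value must be equal.
exact = headings(SAMPLE, lambda cls: cls == "section_heading")
# Mimics [class*="section_heading"]: the value only has to contain the text.
substring = headings(SAMPLE, lambda cls: "section_heading" in cls)

print(exact)
print(substring)
```

The exact-match version only finds the first heading, which is why indexing into the result for the second team fails; the substring version finds both.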
I posted this on another issue for the same error. It might be of use to you all.
I just worked my way through this issue -- I believe the format for boxscore pages has changed on sports reference.
@roclark I have it running locally by updating the `away_record` & `home_record` parts of the `BOXSCORE_SCHEME` to `div#boxes div[class*="assoc_box-score-basic-"] h2`. I also updated `_parse_record` in boxscore.py to
records = boxscore(BOXSCORE_SCHEME[field])
records = [x.text for x in records if x.text != '']
if len(records) > index:
    return records[index]
else:
    return ''
Not positive this is the correct way to fix the issue, but it's working for me. Great project by the way! I'd been trying to parse manually prior to finding it.
Edit: apologies for the double @, roclark!
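In case it helps anyone test the idea outside the library, the bounds check in the `_parse_record` change above can be sketched on plain strings. This is only an approximation under my own assumptions: `pick_heading`, `extract_record`, and `RECORD_RE` are hypothetical names, not part of sportsipy, and pulling the "(W-L)" record out of the heading with a regex is just one possible way to do it, not necessarily what the library does.

```python
import re

# Hypothetical pattern: a "wins-losses" pair in parentheses at the end of a heading.
RECORD_RE = re.compile(r"\((\d+-\d+)\)\s*$")


def pick_heading(texts, index):
    """Return the index-th non-empty heading, or '' when there are too few."""
    non_empty = [t for t in texts if t]  # drops both '' and None
    return non_empty[index] if 0 <= index < len(non_empty) else ""


def extract_record(heading):
    """Return the trailing 'W-L' record from a heading, or '' if none is found."""
    match = RECORD_RE.search(heading)
    return match.group(1) if match else ""


texts = ["", "Cal State Northridge (1-1)", None, "Texas A&M-Corpus Christi (3-15)"]
away = pick_heading(texts, 0)
home = pick_heading(texts, 1)
missing = pick_heading([], 0)  # no headings matched: returns '' instead of raising

print(extract_record(away), extract_record(home), repr(missing))
```

Returning an empty string hides the failure rather than fixing the selector, so it is best combined with the selector change so that real records are still found.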
Attempting to fix with #598 based on the suggestion by @cdhayes since it didn't look like anyone had submitted a PR. Happy to change it based on feedback if there's a better option.
Pulling an NCAAB Boxscore gives an IndexError in _parse_record
boxscore = Boxscore('2021-02-17-19-virginia-military-institute')
Should pull the boxscore from the 02/17 VMI game. Throws same exception for any other boxscore index as well.
Traceback (most recent call last):
ncaam_scraper.py", line 24, in <module>
    if __name__ == "__main__": main()
ncaam_scraper.py", line 12, in main
    print(Boxscore('2021-02-17-19-virginia-military-institute'))
sportsipy\ncaab\boxscore.py", line 225, in __init__
    self._parse_game_data(uri)
sportsipy\ncaab\boxscore.py", line 683, in _parse_game_data
    value = self._parse_record(short_field, boxscore, index)
sportsipy\ncaab\boxscore.py", line 390, in _parse_record
    return records[index]
IndexError: list index out of range