roclark / sportsipy

A free sports API written for python
MIT License

NCAAB Boxscore causes IndexError #591

Open ChrisSBouchard opened 3 years ago

ChrisSBouchard commented 3 years ago

Pulling an NCAAB Boxscore gives an IndexError in _parse_record:

boxscore = Boxscore('2021-02-17-19-virginia-military-institute')

This should pull the boxscore from the 02/17 VMI game. It throws the same exception for any other boxscore index as well.

Traceback (most recent call last):
  ncaam_scraper.py", line 24, in <module>
    if __name__ == "__main__": main()
  ncaam_scraper.py", line 12, in main
    print(Boxscore('2021-02-17-19-virginia-military-institute'))
  sportsipy\ncaab\boxscore.py", line 225, in __init__
    self._parse_game_data(uri)
  sportsipy\ncaab\boxscore.py", line 683, in _parse_game_data
    value = self._parse_record(short_field, boxscore, index)
  sportsipy\ncaab\boxscore.py", line 390, in _parse_record
    return records[index]
IndexError: list index out of range

wLfHLGm2UDfNLh commented 3 years ago

The issue is with parsing team records. It appears the HTML has changed. A temporary fix (if you don't need team records) is commenting out line 671 of boxscore.py and replacing it with a dummy value. I'll leave the parsing issue to the experts since I couldn't figure it out.

#value = self._parse_record(short_field, boxscore, index)
value = '0-0'

ChrisSBouchard commented 3 years ago

Ahh the HTML change was my initial thought. Luckily I don't need team records; thank you for the fix! :)

ericmk52 commented 3 years ago

> Issue is with parsing team records. Appears the HTML has changed. A temporary fix (if you don't need team records) is commenting line 671 of boxscore.py and replacing with a dummy value. I'll leave the parsing issue to the experts since I couldn't figure it out.
>
> #value = self._parse_record(short_field, boxscore, index)
> value = '0-0'

Hi, I'm currently trying to obtain the dataframe_extended for each team, but I am getting the "list index out of range" error. I tried commenting out the line and adding '0-0', but the problem still persists. Do you have any other suggestions?

ChrisSBouchard commented 3 years ago

When I went to make the fix, I noticed that the self._parse_record call was past line 671. Make sure the line you comment out is the one that says:

value = self._parse_record(short_field, boxscore, index)

then add:

value = '0-0'
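
For anyone who would rather not edit the installed boxscore.py, the same dummy-value workaround can probably be applied at runtime by wrapping `_parse_record`. The following is only a sketch: `tolerate_missing_records` and `StandInBoxscore` are hypothetical names invented for this example, and the stand-in class only imitates the failing call shown in the traceback. It has not been run against sportsipy itself.

```python
import functools


def tolerate_missing_records(cls, default="0-0"):
    """Wrap cls._parse_record so an IndexError yields a dummy record instead."""
    original = cls._parse_record

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        try:
            return original(self, *args, **kwargs)
        except IndexError:
            # Same effect as hard-coding value = '0-0', but only on failure.
            return default

    cls._parse_record = wrapper
    return cls


class StandInBoxscore:
    """Hypothetical stand-in for the real Boxscore class (demo only)."""

    def _parse_record(self, field, boxscore, index):
        records = []  # simulates a selector that matched nothing
        return records[index]


tolerate_missing_records(StandInBoxscore)
result = StandInBoxscore()._parse_record("away_record", None, 0)
print(result)
```

With the real library the assumption is that you would import `Boxscore` from `sportsipy.ncaab.boxscore` (the module path in the traceback) and call `tolerate_missing_records(Boxscore)` before constructing any boxscores. Team records that fail to parse would then come back as '0-0', so this only helps if you don't need them.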

cdhayes commented 3 years ago

Assuming I understand how PyQuery objects work, in my case the index lookup fails while parsing the away_record field. The BOXSCORE_SCHEME entry for away_record is: 'div#boxes div[class="section_heading"] h2'. For the boxscores that work, the HTML looks like:

<div class="section_heading" id="box-score-basic-cal-state-northridge_sh">
  <span class="section_anchor" id="box-score-basic-cal-state-northridge_link" data-label="Cal State Northridge (1-1)"></span>
  <h2>Cal State Northridge (1-1)</h2>

Which seems to match up to the BOXSCORE_SCHEME for the away_record.

For the boxscores that give me an Index Exception the HTML looks like:

<div class="section_heading assoc_box-score-basic-texas-am-corpus-christi" id="box-score-basic-texas-am-corpus-christi_sh">
  <span class="section_anchor" id="box-score-basic-texas-am-corpus-christi_link" data-label="Texas A&M-Corpus Christi (3-15)"></span>
  <h2>Texas A&M-Corpus Christi (3-15)</h2>

It appears that an additional class name (assoc_box-score-basic-texas-am-corpus-christi) gets added to the "section_heading" class attribute, so the exact-match selector no longer matches. I can't find a pattern for why or when it happens. Maybe somebody smarter than I am will know how to tweak the PyQuery selector to work correctly in both scenarios.

Update: I changed both the away_record and home_record entries in BOXSCORE_SCHEME to use a wildcard as part of the selector, so it now looks like: `div#boxes div[class*="section_heading"] h2`. I'm running some tests now, but this does appear to have resolved the IndexError and returns the expected data.
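
To illustrate why an exact class match misses the second kind of heading while a substring match catches both, here is a small sketch using only the standard library. It does not use PyQuery or sportsipy. `HeadingCollector` and `headings` are hypothetical helpers that merely imitate the semantics of `[class="..."]` versus `[class*="..."]`, and the sample HTML combines the two heading forms from above into one trimmed snippet for demonstration.

```python
from html.parser import HTMLParser

SAMPLE = """
<div id="boxes">
  <div class="section_heading" id="a_sh"><h2>Cal State Northridge (1-1)</h2></div>
  <div class="section_heading assoc_box-score-basic-texas-am-corpus-christi" id="b_sh">
    <h2>Texas A&amp;M-Corpus Christi (3-15)</h2>
  </div>
</div>
"""


class HeadingCollector(HTMLParser):
    """Collect <h2> text found inside <div> elements whose class matches."""

    def __init__(self, needle, substring):
        super().__init__()
        self.needle = needle
        self.substring = substring
        self.div_stack = []  # one bool per open <div>: did its class match?
        self.in_h2 = False
        self.headings = []

    def _matches(self, class_value):
        if self.substring:
            return self.needle in class_value  # imitates [class*="..."]
        return class_value == self.needle      # imitates [class="..."]

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.div_stack.append(self._matches(dict(attrs).get("class") or ""))
        elif tag == "h2" and any(self.div_stack):
            self.in_h2 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()
        elif tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2:
            self.headings[-1] += data


def headings(html, needle, substring):
    parser = HeadingCollector(needle, substring)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.headings]


exact = headings(SAMPLE, "section_heading", substring=False)
loose = headings(SAMPLE, "section_heading", substring=True)
print(len(exact), len(loose))
```

With the exact comparison only the first heading is collected, so asking for a second record by index would run off the end of the list. That is consistent with the IndexError reported in this issue.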

criedel40 commented 3 years ago

Was the fix presented above ever pushed out? I am still getting the error. I could apply the fix myself, but I would rather use an official release with the fixed code.

michigandrew commented 3 years ago

I posted this on another issue for the same error. It might be of use to you all.

I just worked my way through this issue -- I believe the format for boxscore pages has changed on sports reference.

@roclark I have it running locally by updating the away_record and home_record parts of the BOXSCORE_SCHEME to `div#boxes div[class*="assoc_box-score-basic-"] h2`. I also updated _parse_record in boxscore.py to:

        records = boxscore(BOXSCORE_SCHEME[field])
        records = [x.text for x in records if x.text != ''] 

        if len(records) > index:
            return records[index]
        else:
            return ''
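
The guard in that snippet can be tried in isolation. Below is a minimal standalone sketch of the same idea; `pick_record` is a hypothetical helper, not part of sportsipy, and it takes plain strings instead of the PyQuery elements the library works with. Unlike the snippet above, it also drops `None` entries and rejects negative indexes.

```python
def pick_record(texts, index, default=""):
    """Return the index-th non-empty entry, or default when it is missing."""
    records = [text for text in texts if text]  # drops None and empty strings
    if 0 <= index < len(records):
        return records[index]
    return default


found = pick_record(["Cal State Northridge (1-1)", "", "Texas A&M-Corpus Christi (3-15)"], 1)
missing = pick_record([], 0)
print(repr(found), repr(missing))
```

Returning an empty string hides the parsing failure from the caller, so anything downstream that relies on the record should be prepared for a blank value.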

Not positive this is the correct way to fix the issue, but it's working for me. Great project by the way! I'd been trying to parse manually prior to finding it.

Edit: apologies for the double @, roclark!

alexwisswolf commented 3 years ago

Attempting to fix with #598 based on the suggestion by @cdhayes since it didn't look like anyone had submitted a PR. Happy to change it based on feedback if there's a better option.