oseymour / ScraperFC

Python package for scraping soccer data from a variety of sources
GNU General Public License v3.0

FBRef scrape error - AttributeError error: 'NoneType' object has no attribute 'split'. #7

Closed jmarqu18 closed 1 year ago

jmarqu18 commented 1 year ago

Good afternoon!

I am trying to test the library to learn web scraping techniques. I started with the code examples in the "code" folder. Understat works without any problem, but all of the FBRef functions fail.

Screenshot_1

All of them raise the same error: AttributeError: 'NoneType' object has no attribute 'split'.

image

However, with Understat there is no error.

image

hedonistrh commented 1 year ago

I am not a maintainer, but I use this package often. I am able to reproduce the issue with the following code as well:

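import traceback
import ScraperFC as sfc

scraper = sfc.FBRef()
try:
    # Scrape the 2021 EPL league table from FBRef
    out = scraper.scrape_league_table(2021, 'EPL')
except Exception:
    # Print the full traceback instead of letting the exception propagate
    traceback.print_exc()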

It looks like they have not changed the design of the website recently, so I am not sure what is causing this, but I will try to find out as well. 🤔
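
For context, this error message generally means that some lookup returned None and the code then called .split() on it as if it were a string. The sketch below is only an illustration of that failure mode and of a guard that gives a clearer message. It is not ScraperFC's code; the helper name last_path_segment and the example URL are hypothetical.

```python
from typing import Optional


def last_path_segment(href: Optional[str]) -> str:
    """Return the final path segment of a URL, failing loudly if href is missing.

    Hypothetical helper for illustration only; not part of ScraperFC.
    """
    if href is None:
        # Without this guard, href.split("/") would raise
        # AttributeError: 'NoneType' object has no attribute 'split'
        raise ValueError(
            "expected a link but found none; the page layout or URL may have changed"
        )
    return href.rstrip("/").split("/")[-1]


# Example with a made-up URL
print(last_path_segment("https://example.com/comps/some-league/"))
```

A guard like this does not fix the scrape, but it points at the missing element instead of at a None value further down the call chain.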

hedonistrh commented 1 year ago

I feel like this could be related to the "cache-accept" banner, though. I tried running it with headless = False and could see the banner. Banners like that have caused problems for me on other websites before. I am checking whether it is possible to get rid of it in the FBRef module.

hedonistrh commented 1 year ago

I upgraded the package after the latest commit from @oseymour. We still have a problem, but this time the error message is different. I do not have enough time right now to check what is going on, but I will come back to it, as I need to use the package for this purpose within a few days. 😅

Scraping 2021 EPL league table
Traceback (most recent call last):
  File "<ipython-input-1-8443cae1b34f>", line 5, in <module>
    out = scraper.scrape_league_table(2021, 'EPL')
  File "/usr/local/lib/python3.9/site-packages/ScraperFC/FBRef.py", line 192, in scrape_league_table
    df = pd.read_html(url)
  File "/usr/local/lib/python3.9/site-packages/pandas/util/_decorators.py", line 296, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/pandas/io/html.py", line 1086, in read_html
    return _parse(
  File "/usr/local/lib/python3.9/site-packages/pandas/io/html.py", line 898, in _parse
    tables = p.parse_tables()
  File "/usr/local/lib/python3.9/site-packages/pandas/io/html.py", line 217, in parse_tables
    tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
  File "/usr/local/lib/python3.9/site-packages/pandas/io/html.py", line 736, in _build_doc
    raise e
  File "/usr/local/lib/python3.9/site-packages/pandas/io/html.py", line 717, in _build_doc
    with urlopen(self.io) as f:
  File "/usr/local/lib/python3.9/site-packages/pandas/io/common.py", line 137, in urlopen
    return urllib.request.urlopen(*args, **kwargs)
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 561, in error
    return self._call_chain(*args)
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/usr/local/Cellar/python@3.9/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 429: Too Many Requests
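
HTTP 429 means the server is rate limiting the client, so one common workaround is to wait and retry. The following is a minimal sketch of that idea, assuming the failing call raises urllib.error.HTTPError as in the traceback above. The helper call_with_backoff and its default delays are hypothetical, not part of ScraperFC or pandas, and the delays are guesses rather than limits published by FBRef.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying with exponential backoff on HTTP 429 only."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not rate limiting, and give up
            # after the final attempt.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

With pandas installed, the call from the traceback could be wrapped as call_with_backoff(lambda: pd.read_html(url)). The sleep parameter exists only so the waiting can be replaced in tests.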

oseymour commented 1 year ago

Hey @jmarqu18 and @hedonistrh. I've been busy and never got around to checking this. So sorry! The whole FBRef module is having issues right now, they changed some stuff on their end. I'm going to get around to fixes this week and next. Thanks!

oseymour commented 1 year ago

@jmarqu18 and @hedonistrh can you both try running these now? Both function calls worked for me.

FBRef changed the URLs for EPL and Ligue 1 at some point this spring/summer and a fix was included in my last push. So can you update and let me know if it works for you, please?

hedonistrh commented 1 year ago

Hey @oseymour, no problem. I can easily relate to how hard it is to keep everything working while they are constantly changing their website. Yes, they are working right now. This is not my issue, but feel free to close it 🤞

Good luck with your work. 💛

oseymour commented 1 year ago

@jmarqu18 I'll close this once you confirm it's working on your machine or if I don't hear back in a few days.

jmarqu18 commented 1 year ago

Hi @oseymour, thank you very much for the reply and for the attempted fix. Sorry, but I just ran it again and I still get the same error :(

image

oseymour commented 1 year ago

@jmarqu18 you upgraded to the latest version right (I hate to ask but I've definitely forgotten to update myself before haha)?
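
One quick way to confirm which version is actually installed in the environment that runs the script is to ask the standard library. This is a small sketch that assumes the distribution is registered under the name "ScraperFC"; the helper installed_version is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumes the package was installed under the distribution name "ScraperFC"
print("ScraperFC:", installed_version("ScraperFC"))
```

Running this from the same interpreter or notebook kernel as the scraper matters, since an upgrade applied to a different environment will not show up here.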

jmarqu18 commented 1 year ago

hahaha, don't worry, you were right to ask, sorry!

Now I have updated and the error has changed, but it still doesn't work for me.

image

Can I help? Do you need more specific information?

oseymour commented 1 year ago

Oh man, I've never seen that bottom error, which makes me think it's being caused by the top error, which I've also never seen. Can you try updating Google Chrome?

oseymour commented 1 year ago

@jmarqu18 any updates?