Hey, I noticed that scraping MLS matches fails. The match link ends with the league name, and for MLS that is three words ("Major-League-Soccer"), but this code block only handles one- or two-word league names, so the date parsing fails for MLS links. (BTW, thanks a lot for the great comments. 👏)
To fix that, one more try-except could be added. It's probably not the cleanest way, but I'm not sure there is a clearer one. 🤔
To reproduce the issue:
import ScraperFC as sfc
scraper_fbref = sfc.FBRef()
year = 2021
league_name = "MLS"
example = scraper_fbref.scrape_match(link='https://fbref.com/en/matches/920ed404/Los-Angeles-FC-Minnesota-United-July-28-2021-Major-League-Soccer', year=year, league=league_name)
I did the following to fix it:
# Get date of the match
try:
    # Try this first. Assumes league name is one word
    date_elements = link.split("/")[-1].split("-")[-4:-1]
    date = '-'.join(date_elements)
    date = datetime.datetime.strptime(date, '%B-%d-%Y').date()
except ValueError:
    try:
        # Assumes league name is two words
        date_elements = link.split("/")[-1].split("-")[-5:-2]
        date = '-'.join(date_elements)
        date = datetime.datetime.strptime(date, '%B-%d-%Y').date()
    except ValueError:
        # Assumes league name is three words
        date_elements = link.split("/")[-1].split("-")[-6:-3]
        date = '-'.join(date_elements)
        date = datetime.datetime.strptime(date, '%B-%d-%Y').date()
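As a possible alternative to nesting one try-except per league-name length, here is a minimal sketch that searches the last part of the link for a "Month-day-year" pattern, so the number of words in the league name no longer matters. The function name `match_date_from_link` is made up for illustration and is not part of ScraperFC. It assumes the link always contains an English month name, as in the example above.

```python
import datetime
import re

# English month names, matching what '%B' parses in the default (C/English) locale.
_MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# "Month-day-year" somewhere in the slug, e.g. "July-28-2021".
# The lookbehind avoids matching a month name glued to the end of another word.
_DATE_RE = re.compile(r"(?<![A-Za-z])(?:" + _MONTHS + r")-\d{1,2}-\d{4}(?!\d)")


def match_date_from_link(link):
    """Hypothetical helper: extract the match date from an FBRef match link."""
    slug = link.rstrip("/").split("/")[-1]
    found = _DATE_RE.search(slug)
    if found is None:
        raise ValueError(f"No date found in link: {link}")
    return datetime.datetime.strptime(found.group(0), "%B-%d-%Y").date()
```

With the MLS link from the reproduction snippet, this should give `datetime.date(2021, 7, 28)` without caring that "Major-League-Soccer" is three words.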
However, this exposes another issue, this time in the matchweek part. It comes from how FBRef enters matchweek data. For instance, if you compare the following links
one has Matchweek 1 as data, while the other has Regular Season. That comes from how MLS is structured. Since it has playoffs etc., this kind of corner case is not so surprising 😞
To work around that, we can use the following. It is not clean at all, but since FBRef does not provide round information for MLS, we can either skip the matchweek entirely or just record whether the match is regular season or playoff.
if league != "MLS":
    matchweek = int(
        dom.xpath('//*[@id="content"]/div[2]/div[3]/div[2]/text()')[0]
        .split('Matchweek')[-1]
        .replace(')', '')
        .strip()
    )
else:
    matchweek = dom.xpath('//*[@id="content"]/div[2]/div[3]/div[2]/text()')[0].replace(')', '').replace('(', '').strip()
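Another option, instead of branching on the league name, might be to branch on the text itself: return an integer when it contains "Matchweek N", and otherwise keep the cleaned label. This is only a sketch. `parse_round_text` is a hypothetical helper, and the sample strings in the usage note are made up to resemble the two cases described above, not copied from FBRef.

```python
import re


def parse_round_text(text):
    """Hypothetical helper: return the matchweek as an int if present,
    otherwise the cleaned round label (e.g. "Regular Season") as a string."""
    cleaned = text.replace("(", "").replace(")", "").strip()
    found = re.search(r"Matchweek\s+(\d+)", cleaned)
    if found:
        return int(found.group(1))
    return cleaned
```

For example, `parse_round_text("(Matchweek 1)")` would give `1`, while `parse_round_text("(Regular Season)")` would give `"Regular Season"`. The caller then has to accept either an int or a string for `matchweek`.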