serve-and-volley / atp-world-tour-tennis-data

Using Python to scrape ATP World Tour tennis data

Match Stats Winner & Loser Issues #17

Closed bezchristo closed 4 years ago

bezchristo commented 5 years ago

Hey man, great work on the Python scripts!

I have picked up an issue with the match stats, though. The "scrape_match_stats" function in functions.py assumes that the winner is always the player on the left. This is not always the case.

Here is an example: stats

To get around this, you can check which side has the "won-game" class, which produces the checkmark next to the player's name. Here is the XPath for finding the class:

    //table[@class='scores-table']/tbody/tr[1]/td[1]/@class
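A minimal sketch of that check, for illustration only. It uses the standard-library ElementTree on a small, well-formed sample fragment so that it runs without dependencies; real ATP pages would need a proper HTML parser. The sample markup, the `winner_side` helper, and the assumption that row 1 is the left player and row 2 the right player are all hypothetical and not taken from functions.py.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment of a match page (not real ATP markup).
SAMPLE = """
<html><body>
<table class="scores-table"><tbody>
<tr><td class=""></td></tr>
<tr><td class="won-game"></td></tr>
</tbody></table>
</body></html>
"""


def winner_side(tree):
    """Return 'left', 'right', or None depending on which row has 'won-game'.

    Assumption: row 1 of the scores table belongs to the left player and
    row 2 to the right player.
    """
    for side, row in (("left", 1), ("right", 2)):
        # ElementTree's XPath subset cannot select '/@class' directly,
        # so find the cell and read the attribute from it instead.
        cell = tree.find(
            f".//table[@class='scores-table']/tbody/tr[{row}]/td[1]"
        )
        if cell is not None and "won-game" in cell.get("class", "").split():
            return side
    return None


tree = ET.fromstring(SAMPLE.strip())
print(winner_side(tree))  # prints "right" for the sample above
```

Returning None when neither row is marked lets the caller decide how to handle pages where the checkmark is missing, instead of silently treating the left player as the winner.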

projectdownton commented 5 years ago

Hi, and thanks a lot for these awesome scripts!!

@bezchristo I am facing this very same problem, but I'm not familiar with scraping at all, and I'm afraid I could mess everything up if I try to fix it... would you be so kind as to post the corrected code?

Huge thanks in advance!!!

projectdownton commented 5 years ago

Hi, in the end I think I got a solution. In the function scrape_match_stats, get the following two variables:

    won_game_left = xpath_parse(match_tree, "//table[@class='scores-table']/tbody/tr[1]/td[1]/@class")[0]
    won_game_right = xpath_parse(match_tree, "//table[@class='scores-table']/tbody/tr[2]/td[1]/@class")[0]

Then use them to select the right winner and loser according to the position:

    if won_game_left == 'won-game':
        try:
            winner_slug_xpath = "//div[@class='player-left-name']/a/@href"
            winner_slug_parsed = xpath_parse(match_tree, winner_slug_xpath)
            winner_slug = winner_slug_parsed[0].split('/')[4]
        except Exception:
            winner_slug = ''
        try:
            loser_slug_xpath = "//div[@class='player-right-name']/a/@href"
            loser_slug_parsed = xpath_parse(match_tree, loser_slug_xpath)
            loser_slug = loser_slug_parsed[0].split('/')[4]
        except Exception:
            loser_slug = ''
    elif won_game_right == 'won-game':
        try:
            loser_slug_xpath = "//div[@class='player-left-name']/a/@href"
            loser_slug_parsed = xpath_parse(match_tree, loser_slug_xpath)
            loser_slug = loser_slug_parsed[0].split('/')[4]
        except Exception:
            loser_slug = ''
        try:
            winner_slug_xpath = "//div[@class='player-right-name']/a/@href"
            winner_slug_parsed = xpath_parse(match_tree, winner_slug_xpath)
            winner_slug = winner_slug_parsed[0].split('/')[4]
        except Exception:
            winner_slug = ''
    else:
        print('Error 45069')
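The two branches above differ only in which side is treated as the winner, so the assignment could be factored into one small function. This is just a sketch: `assign_winner_loser` is a hypothetical helper, not part of the repository, and it assumes the left and right slugs and the two class values have already been extracted as in the comment above.

```python
def assign_winner_loser(left_slug, right_slug, left_class, right_class):
    """Return (winner_slug, loser_slug) based on which side is marked.

    Hypothetical helper: a side counts as the winner when its class value
    equals 'won-game'. If neither side is marked, both slugs come back as
    empty strings so the caller never sees unset variables.
    """
    if left_class == "won-game":
        return left_slug, right_slug
    if right_class == "won-game":
        return right_slug, left_slug
    return "", ""


# Example with made-up slugs: the right-hand player is marked as the winner.
winner_slug, loser_slug = assign_winner_loser("player-a", "player-b", "", "won-game")
print(winner_slug, loser_slug)  # prints "player-b player-a"
```

Compared with the branching version, the fallback case returns empty strings rather than only printing an error code, which avoids leaving winner_slug and loser_slug undefined later in the function.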

theyaw commented 4 years ago

@bezchristo hey man, I need a favor. I am currently working on an ATP project and I need the 2018 and 2019 data, but I do not have the expertise to scrape it from the ATP Tour website. Is that something you can help me with, if it's not too big a lift?

rcorty commented 4 years ago

I believe all the match data is up for those years. What are you asking for?

serve-and-volley commented 4 years ago

@bezchristo: Hi Christo, I have revised all the Python scripts and rescraped all the CSV files through the 2019 matches. In addition, I updated the scripts for Python 3. I've addressed the "left" and "right" issue with the following lines:

You can close this issue if you have no other questions.