msarnacki / flashscore-scraper

Scrapes football match details for a given season. My first web scraping script, made to learn and for fun.

H2H stats Scheduled Games #1

Status: Open. xChr11s opened this issue 3 years ago

xChr11s commented 3 years ago

Hey 👍 Your scraper looks very interesting so far. I tested it a bit but couldn't achieve the results I wanted. My goal is to get the data for upcoming matches: the last 15 matches each team has played and the last matches between the two teams. This data is found under the H2H tab. The program has to click "Show more matches" four times and scrape the teams and results.

I managed to click on "Scheduled" with the following code:

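```python
import time
from selenium.webdriver.common.by import By  # Selenium 4 locator style

# check upcoming (scheduled) matches; `driver` is an already opened webdriver
button_tomorrow = driver.find_element(
    By.XPATH,
    '/html/body/div[4]/div[1]/div/div[1]/div[2]/div[4]/div[2]/div[1]/div[1]/div[6]/div',
)
button_tomorrow.click()
time.sleep(5)  # give the scheduled matches time to load
```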

Would this be possible to code?

Thanks in advance,

Kind regards, Chris

msarnacki commented 3 years ago

Hello! :+1:

Thank you very much for your interest in my project.

I added a new small script (h2h.py) to the repo that scrapes H2H match info as you wanted. It can scrape as many of the most recent matches as you like.

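```python
def get_matches_info(matches, how_many):
    """Print team names and score for the first `how_many` matches.

    `matches` is a list of BeautifulSoup elements, one per match row.
    """
    for i, match in enumerate(matches):
        # stop once the requested number of matches has been printed
        if i == how_many:
            break
        # get the team names and the score, then print them
        teams = match.find_all(class_='name')
        team1 = teams[0].text
        team2 = teams[1].text
        score = match.find(class_='score').text
        print(f'{i}. {team1} {score} {team2}')
```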

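The function takes two arguments:

- `matches`: the list of match elements (BeautifulSoup tags) from one H2H table
- `how_many`: how many of the most recent matches to print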
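Printing is fine for a quick look, but saving the results (for example to Excel) is easier if the data comes back as values. Below is a minimal sketch of a hypothetical `collect_matches_info` function (not part of h2h.py), assuming `matches` is the same list of match elements that `get_matches_info` receives. It returns `(team1, score, team2)` tuples instead of printing them.

```python
from itertools import islice


def collect_matches_info(matches, how_many):
    """Return (team1, score, team2) tuples for the first `how_many` matches.

    Hypothetical variant of get_matches_info: it returns the data instead of
    printing it, so the rows can be written to a file afterwards.
    """
    rows = []
    # islice stops after `how_many` items, replacing the manual break
    for match in islice(matches, how_many):
        teams = match.find_all(class_='name')
        score = match.find(class_='score').text
        rows.append((teams[0].text, score, teams[1].text))
    return rows
```

For example, `rows = collect_matches_info(matches, 15)` gives up to 15 tuples that can then be saved to a file.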

And here is the part of the code that clicks "Show more matches" four times: twice for the home table and twice for the away table. It is not necessary in the script, because all the match info is in the source code even when it is hidden, but yes, this is certainly possible. :relaxed:

```python
import time
from selenium.webdriver.common.by import By  # Selenium 4 locator style

for _ in range(2):
    # get the list of elements with arrows (an arrow always accompanies "Show more matches")
    show_more = driver.find_elements(By.CLASS_NAME, 'arrow')
    # click the first "more" (home table) and wait a second
    show_more[0].click()
    time.sleep(1)
    # click the second "more" (away table) and wait a second
    show_more[1].click()
    time.sleep(1)
```

I commented every key part of the code. If you have more questions, please feel free to ask. :smiley:

If you have more ideas on how to do this, feel free to fork the repo and open a pull request. :smiley:

Kind regards, Maciej

xChr11s commented 3 years ago

Hey, wow, thanks for your fast answer and the code!

I didn't know that the matches are still in the source code even when "Show more matches" is not clicked. The clicks threw an error because a Bet365 ad was in front of the button, so I just deleted the part that clicks four times and it worked fine :)

```
selenium.common.exceptions.ElementClickInterceptedException: Message: element click intercepted: Element <span class="arrow"></span> is not clickable at point (507, 881). Other element would receive the click: <a class="boxOverContent__bannerLink" href="/promobox/11313/?sport=1" data-mobile-url="/promobox/11313/?sport=1&amp;mobile=1" target="_blank"></a>
  (Session info: chrome=85.0.4183.83)
```

I got the following data from your match:

Home team last matches

  1. West Brom 2 : 2 QPR
  2. Huddersfield 2 : 1 West Brom
  3. West Brom 0 : 0 Fulham
  4. Blackburn 1 : 1 West Brom
  5. West Brom 2 : 0 Derby
  6. West Brom 4 : 2 Hull
  7. Sheffield Wed 0 : 3 West Brom
  8. Brentford 1 : 0 West Brom
  9. West Brom 0 : 0 Birmingham
  10. Swansea 0 : 0 West Brom
  11. West Brom 2 : 3 Newcastle
  12. West Brom 0 : 1 Wigan
  13. West Brom 2 : 0 Preston
  14. Bristol City 0 : 3 West Brom
  15. West Brom 2 : 2 Nottingham

Away team last matches

  1. Leicester 0 : 0 Sheffield Wed
  2. Birmingham 0 : 2 Leicester
  3. Leicester 0 : 2 Manchester Utd
  4. Tottenham 3 : 0 Leicester
  5. Leicester 2 : 0 Sheffield Utd
  6. Bournemouth 4 : 1 Leicester
  7. Arsenal 1 : 1 Leicester
  8. Leicester 3 : 0 Crystal Palace
  9. Everton 2 : 1 Leicester
  10. Leicester 0 : 1 Chelsea
  11. Leicester 0 : 0 Brighton
  12. Watford 1 : 1 Leicester
  13. Leicester 4 : 0 Aston Villa
  14. Leicester 1 : 0 Birmingham
  15. Norwich 1 : 0 Leicester

VS each other last matches

  1. West Brom 1 : 4 Leicester
  2. Leicester 1 : 1 West Brom
  3. West Brom 1 : 2(1 : 1) Leicester
  4. West Brom 0 : 1 Leicester
  5. Leicester 1 : 2 West Brom

I will try to export these into an Excel file. If I find a way, I can make a fork :) I may need some help there, but I will try on my own first :)
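One low-effort way to get scraped matches into Excel is a CSV file, which Excel opens directly. The sketch below uses only Python's standard `csv` module. The function name `save_matches_csv` and the shape of `sections` (a dict mapping a section title such as "Home team last matches" to a list of `(team1, score, team2)` tuples) are assumptions for illustration, not something the scraper already produces.

```python
import csv


def save_matches_csv(sections, path):
    """Write scraped H2H matches to a CSV file that Excel can open.

    `sections` is assumed to map a section title to a list of
    (team1, score, team2) tuples; adapt it to what your scraper returns.
    """
    # newline='' is what the csv docs recommend; utf-8-sig adds a BOM so
    # Excel recognises the file as UTF-8
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        writer.writerow(['section', 'team1', 'score', 'team2'])  # header row
        for section, rows in sections.items():
            for team1, score, team2 in rows:
                writer.writerow([section, team1, score, team2])
```

Usage: `save_matches_csv({'Home team last matches': home_rows}, 'h2h.csv')`. In locales where Excel expects semicolons as the separator, passing `delimiter=';'` to `csv.writer` may be needed.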

Thanks!

msarnacki commented 3 years ago

Hey,

Yeah, ads or cookie notifications sometimes cover buttons. It depends on the window size; maximizing the window or scrolling the page to the element you want to click often helps.
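Scrolling the target into view before clicking can be wrapped in a small helper. This is a sketch, not code from the repo: `scroll_and_click` is a hypothetical name, and it assumes `driver` is a Selenium WebDriver and `element` a WebElement. It uses `execute_script` to call the browser's `scrollIntoView` on the element, so a banner pinned to the bottom of the window is less likely to sit on top of it.

```python
import time


def scroll_and_click(driver, element, pause=0.5):
    """Scroll `element` to the middle of the viewport, then click it.

    Hypothetical helper: centering the element makes it less likely that a
    sticky ad or cookie bar covers it when the click happens.
    """
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", element
    )
    time.sleep(pause)  # let the page settle after scrolling
    element.click()
```

For example, call `driver.maximize_window()` once after starting the browser, then `scroll_and_click(driver, show_more[0])`. If an overlay still intercepts the click, `driver.execute_script('arguments[0].click();', element)` fires the click through JavaScript, which is not blocked by elements drawn on top.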

I don't think matches being in the source code even when hidden is common. In the results for a particular season (for example here), hidden matches are not in the source code, and you first need to click "Show more".
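For pages like that, where hidden matches only load after clicking "Show more", one option is to keep clicking until the button disappears. A minimal sketch under these assumptions: `click_until_gone` is a hypothetical helper, `find_buttons` is any zero-argument callable returning the currently clickable "show more" elements, and `max_clicks` guards against a button that never goes away.

```python
import time


def click_until_gone(find_buttons, pause=1.0, max_clicks=50):
    """Keep clicking a "show more" button until the page stops offering one.

    `find_buttons` is called again before every click so the current page
    state is used. Returns the number of clicks made.
    """
    clicks = 0
    while clicks < max_clicks:
        buttons = find_buttons()
        if not buttons:
            break  # no button left: everything has been loaded
        buttons[0].click()
        clicks += 1
        time.sleep(pause)  # wait for the next batch of matches to load
    return clicks
```

For example: `click_until_gone(lambda: [b for b in driver.find_elements(By.CSS_SELECTOR, 'a.event__more') if b.is_displayed()])`. The `a.event__more` selector is only a placeholder; check the real one in the browser's developer tools. Filtering with `is_displayed()` covers the case where the button stays in the page but becomes hidden. Once the function returns, `driver.page_source` should contain the loaded matches.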

:relaxed:

debugleader commented 3 years ago

Hey, awesome project btw, @msarnacki. Also, @xChr11s, do you want to start a new project with Python? If you need help, I would be glad to contribute and try to improve it :)

xChr11s commented 3 years ago


Hey, yes, on the H2H page it is nice that you don't need to click "Show more matches". I still couldn't figure out how to save the match data in an Excel file; I guess my Python knowledge isn't good enough. I changed the code a bit so that it looks like this: Gyazo

And my goal is to get the data saved like this: Gyazo

But I'm only getting errors x) Do you have an idea how to save it like this? I will keep trying, but I don't think I can get it working.
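The screenshots linked above aren't visible here, so the exact target layout is unknown. Assuming one row per match with the goals in separate numeric columns (easier to sort and sum in Excel than a "2 : 1" string), the score text can be split with a regular expression. `split_score` and `to_sheet_rows` are hypothetical helper names, and the input is assumed to be `(team1, score, team2)` tuples.

```python
import re

# First "X : Y" in the score text; any extra part after it, such as a
# "(1 : 1)" suffix, is ignored.
SCORE_RE = re.compile(r'(\d+)\s*:\s*(\d+)')


def split_score(score):
    """Turn '2 : 1' into (2, 1); return (None, None) if no score is found."""
    m = SCORE_RE.search(score)
    if m is None:
        return None, None
    return int(m.group(1)), int(m.group(2))


def to_sheet_rows(matches):
    """Convert (team1, score, team2) tuples into [team1, goals1, goals2, team2] rows."""
    return [[t1, *split_score(score), t2] for t1, score, t2 in matches]
```

The resulting lists can be passed to `csv.writer(...).writerows(...)`, or to an Excel library, to build the sheet.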


@debugleader I don't think my Python skills are good enough x) But thanks for your help :)

debugleader commented 3 years ago

Hey, it's fine, we all start somewhere :) Tell me if you're down to learn more, @xChr11s

xChr11s commented 3 years ago

I guess I'm not good enough to get this done. I've tried for several hours now and I'm completely done... I just want the output of the scrape in a simple Excel sheet... this can't be that hard...