skobuv1 closed this issue 4 years ago.
Hi, I'm glad you found my solution interesting. Here is some code that retrieves the data you mentioned (tournament names, match times and players).
Please note that I use the Chrome driver rather than the Firefox one, but the solution should work the same way.
In practice, you just have to find the div you are looking for and then traverse the DOM with "find_element"/"find_elements", by tag name or by class name, to reach the information you need. I hope the code comments explain better how it works.
import json
import time

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait


def wait_for_ajax(driver):
    # Wait (up to 15 s) for the page to load and for jQuery AJAX calls to settle
    wait = WebDriverWait(driver, 15)
    try:
        wait.until(lambda d: d.execute_script("return document.readyState") == "complete")
        # Raises a JavascriptException (a WebDriverException) if the page has no jQuery
        wait.until(lambda d: d.execute_script("return jQuery.active") == 0)
    except WebDriverException:
        # Timeout or no jQuery: continue with whatever has loaded
        pass


browser = webdriver.Chrome()
browser.maximize_window()
url = "https://www.sofascore.com/tennis"
start_time = time.time()
browser.get(url)
wait_for_ajax(browser)

# XPath and class names match the page layout at the time of writing
allTournament = browser.find_elements(
    By.XPATH, "//*[@id='pjax-container-main']/div/div[2]/div/div[2]/div[2]/div")
allTournament = allTournament[0].find_elements(By.CLASS_NAME, "js-event-list-tournament")

data = {
    "tournament": []
}

# LOOP THE TOURNAMENTS
for t in allTournament:
    # TOURNAMENT
    tournament_name = t.find_element(By.CLASS_NAME, "tournament__name")
    tournament_category = t.find_element(By.CLASS_NAME, "tournament__category")
    tournament = {
        "match": [],
        "tournament_name": tournament_name.text,
        "tournament_category": tournament_category.text
    }
    match_of_tournament = t.find_element(
        By.CLASS_NAME, "js-event-list-tournament-events").find_elements(By.TAG_NAME, "a")
    # LOOP THE MATCHES IN THE TOURNAMENT
    for m in match_of_tournament:
        # PLAYERS
        player = m.find_elements(By.CLASS_NAME, "event-team")
        # TIME (named match_time so it does not shadow the time module)
        match_time = m.find_element(By.CLASS_NAME, "u-w48")
        match = {
            "time": match_time.text,
            "player": []
        }
        for p in player:
            match["player"].append(p.text)
        tournament["match"].append(match)
    data["tournament"].append(tournament)

browser.quit()

# DUMP JSON INTO A FILE
with open("tennis.json", "w") as f:
    json.dump(data, f)
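The JSON written by the script nests matches inside tournaments, so reading it back takes the same two loops as the scraping itself. A minimal sketch, assuming the file was written with the keys used above ("tournament", "match", "tournament_name", "time", "player"); summarize and summarize_file are hypothetical helper names, not part of the original script:

```python
import json


def summarize(data):
    """Return one 'tournament | time | player vs player' line per match.

    `data` is the dict the scraper dumps to tennis.json (hypothetical helper).
    """
    lines = []
    for tournament in data["tournament"]:      # outer loop: tournaments
        for match in tournament["match"]:      # inner loop: matches in that tournament
            players = " vs ".join(match["player"])
            lines.append(f"{tournament['tournament_name']} | {match['time']} | {players}")
    return lines


def summarize_file(path="tennis.json"):
    """Load the JSON file written by the scraper and summarize it (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return summarize(json.load(f))
```

Calling summarize_file() after the scraper has run should give one line per match, in page order.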
Man, thanks a lot, it works like a charm :) Sorry for the trouble, I only started with Python this week for a side project :) Thanks a million!
Hey Giacomo, I like your script, it works perfectly for football. I was wondering if you could help me adapt it for tennis. I just need to get all of today's matches on https://www.sofascore.com/tennis (player names, match time and maybe also the tournament name) and export them to JSON or Excel.
But I can't figure out how to loop through the matches :/ I would be very thankful for your help.
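Since the request also mentions Excel: one option that needs only the standard library is to flatten the nested data into one CSV row per match, which Excel can open directly. A minimal sketch, assuming data is the dictionary built by the scraper (same keys) and that each match lists two teams; write_matches_csv is a hypothetical helper name, and writing a real .xlsx file would need a third-party package such as openpyxl or pandas instead:

```python
import csv


def write_matches_csv(data, path):
    """Write one CSV row per match (hypothetical helper).

    Uses the utf-8-sig encoding so Excel detects UTF-8 and shows accented
    player names correctly.
    """
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["tournament_name", "tournament_category", "time", "player_1", "player_2"])
        for tournament in data["tournament"]:
            for match in tournament["match"]:
                # Assumes two teams per match: pad missing ones, ignore any extras
                players = (match["player"] + ["", ""])[:2]
                writer.writerow([tournament["tournament_name"],
                                 tournament["tournament_category"],
                                 match["time"], *players])
```

Calling write_matches_csv(data, "tennis.csv") at the end of the scraper, next to the JSON dump, would produce both files from the same run.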