While using the soccerdata library to scrape event data from WhoScored, I've run into an issue where the read_events function seems to ignore the live=False parameter. Even with live=False set explicitly, the function tries to scrape the live URL and fails repeatedly. (Please ignore the "priority game" parts of the script; they were carried over from another project and I did not remove them from this function call.)
Here are some relevant details:
Script Parameters and Logs:
- The script sets live=False for the read_events function.
- However, the logs show that the function tries to access the live URL: https://www.whoscored.com/Matches/1787316/Live.
```
INFO     Setting read_events params: match_id=1729479, output_fmt=spadl, force_cache=False, live=False    scrape_euros.py:133
INFO     Could not find priority game 1729479.                                                              scrape_euros.py:151
INFO     Processing home team: Scotland [424]                                                               scrape_euros.py:160
INFO     Processing away team: Hungary [327]                                                                scrape_euros.py:161
INFO     Processing game 1787316...                                                                         scrape_euros.py:168
ERROR    Error while scraping https://www.whoscored.com/Matches/1787316/Live. Retrying in 0 seconds... (attempt 1 of 5).    _common.py:469
```
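To confirm from a saved log which match pages were actually requested, a small filter like the one below can help. It is a minimal sketch and not part of soccerdata: it assumes the log has been captured as plain text lines in the format shown above, pulls out every WhoScored match URL, and flags the ones on the /Live path.

```python
import re
from typing import Iterable, List, Tuple

# Matches WhoScored match-centre URLs such as
# https://www.whoscored.com/Matches/1787316/Live
MATCH_URL = re.compile(r"https://www\.whoscored\.com/Matches/(\d+)/(\w+)")


def scraped_match_urls(log_lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (game_id, page_suffix, url) for every WhoScored match URL in the log."""
    found = []
    for line in log_lines:
        for match in MATCH_URL.finditer(line):
            found.append((int(match.group(1)), match.group(2), match.group(0)))
    return found


def live_urls(log_lines: Iterable[str]) -> List[str]:
    """Return only the URLs whose path ends in /Live."""
    return [url for _, suffix, url in scraped_match_urls(log_lines) if suffix == "Live"]
```

Run over the excerpt above, live_urls returns the single URL https://www.whoscored.com/Matches/1787316/Live, for game 1787316.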
HTML Structure for Group Stage vs Knockouts:
Another observation that might be relevant: the HTML structure of the match pages differs between group stage and knockout stage games, which could affect the scraping process.
Steps to Reproduce:
1. Set up a script to scrape event data using the soccerdata library.
2. Make sure the read_events call has live=False.
3. Run the script and observe the logs.
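A standalone reproduction, stripped of the "priority game" logic, might look like the sketch below. It is only a sketch under stated assumptions: the league name "INT-European Championship" and season "2024" are guesses at how the Euro 2024 games are configured in soccerdata, and the make_reader hook is a hypothetical addition so the call can be checked without network access. The read_events keyword arguments are the same ones used in the script.

```python
from typing import Any, Callable, Optional


def _default_reader() -> Any:
    """Build a WhoScored reader (assumed league/season identifiers; adjust as needed)."""
    import soccerdata as sd  # imported lazily so the helper can be exercised without it

    return sd.WhoScored(leagues="INT-European Championship", seasons="2024")


def reproduce_issue(
    match_id: int,
    make_reader: Optional[Callable[[], Any]] = None,
) -> Any:
    """Call read_events once with live=False, outside the rest of the script."""
    ws = (make_reader or _default_reader)()
    return ws.read_events(
        match_id=match_id,
        output_fmt="spadl",
        force_cache=False,
        live=False,  # explicitly disable live scraping
    )
```

For example, reproduce_issue(1787316) targets the Scotland vs Hungary game from the log. If the /Live URL still appears when this runs on its own, that would suggest the behaviour does not come from the rest of scrape_euros.py.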
Expected Behavior:
The read_events function should not attempt to access the live URL when live=False is set.
Actual Behavior:
The function tries to scrape the live URL, leading to repeated errors.
```python
# Relevant snippet showing the function call
# (ws is a soccerdata.WhoScored instance)
ws.read_events(
    match_id=game_id,
    output_fmt=output_fmt,
    force_cache=force_cache,
    live=live,
)
```
Additional Context:
The difference in HTML structure between group stage and knockout stage games, described above, might be contributing to the issue.
Environment:
soccerdata version: [please specify]
Python version: 3.10.4
Operating System: macOS
Potential Fix:
Please investigate why the live=False parameter is not being respected by the read_events function. Additionally, consider any differences in HTML structure between group stage and knockout stage games that might affect scraping.
Thank you for your attention to this issue. Let me know if you need any additional information.