Closed shufinskiy closed 8 months ago
The problem only exists for the WhoScored scraper but your solution affects all scrapers. For some scrapers, an empty reply might actually be a valid answer. For example, if a new team is promoted, an empty result is expected in the ClubElo scraper.
Moreover, the bash script checks the file size simply because that was easy to write as a bash command. What it really should check is whether the file contains an empty JSON object, which could easily be done in Python.
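A minimal sketch of what such a content check might look like, for illustration only. The helper name `is_empty_json_file` is hypothetical and not part of the library, and treating a missing, blank, or unparseable file as "empty" is an assumption rather than something decided in this thread.

```python
import json
from pathlib import Path


def is_empty_json_file(path: Path) -> bool:
    """Hypothetical helper: True if the cached file holds no usable JSON."""
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        # Assumption: a missing file counts as empty.
        return True
    if not text.strip():
        return True
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Assumption: a file that cannot be parsed is treated like an empty one.
        return True
    # JSON null, an empty object, or an empty array carry no data.
    return data is None or data == {} or data == []
```

Unlike a size threshold, this inspects the parsed content, so a short but valid response is not mistaken for an empty one.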
Yes, you're right. I'll think about how it can be implemented in a different way.
@probberechts I fixed the verification logic: the Whoscored.read_events method now checks the first 4 bytes of the file. If they are b'null', the get method is called again with no_cache=True.
```python
reader = self.get(
    url,
    filepath,
    var="requirejs.s.contexts._.config.config.params.args.matchCentreData",
    no_cache=live,
)
if reader.read(4) == b'null':
    reader = self.get(
        url,
        filepath,
        var="requirejs.s.contexts._.config.config.params.args.matchCentreData",
        no_cache=True,
    )
reader.seek(0)
json_data = json.load(reader)
```
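The same peek-and-refetch pattern can be tried outside the scraper with an in-memory stream. This is only a sketch: `load_json_with_refetch` and the `fetch` callable are hypothetical stand-ins for the method and `self.get` above, and `fetch` is assumed to return a seekable binary stream.

```python
import io
import json
from typing import IO, Any, Callable


def load_json_with_refetch(fetch: Callable[[bool], IO[bytes]], live: bool = False) -> Any:
    """Hypothetical stand-in: fetch(no_cache) plays the role of self.get."""
    reader = fetch(live)
    # Peek at the first 4 bytes; a cached JSON null means the page had no data.
    if reader.read(4) == b"null":
        # Bypass the cache and download again.
        reader = fetch(True)
    # Rewind so json.load sees the stream from the start.
    reader.seek(0)
    return json.load(reader)


def fake_fetch(no_cache: bool) -> IO[bytes]:
    # The "cache" holds null; a fresh download returns real data.
    return io.BytesIO(b'{"events": [1]}' if no_cache else b"null")


data = load_json_with_refetch(fake_fetch)
```

If the second download also returns null, `json.load` yields `None`, so callers may still need to handle that case.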
Nice solution! Thanks.
Hello, @probberechts.
I propose a solution to the problem of empty files in the cache for Whoscored.
In issue 98 you suggested deleting empty files by size with a bash command.
I added a method, _size_file, which does the same with Path.stat().st_size. If the file is smaller than a threshold, we treat it as not cached.
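A rough sketch of a size-based check along these lines. The function name `is_cached` and the 5-byte default are assumptions made for illustration; the actual `_size_file` signature and threshold are not shown in this thread.

```python
from pathlib import Path


def is_cached(path: Path, min_bytes: int = 5) -> bool:
    """Hypothetical check: a missing or too-small file counts as not cached."""
    # min_bytes=5 is an assumed threshold; a file holding only "null" is 4 bytes.
    return path.is_file() and path.stat().st_size >= min_bytes
```

A size check cannot tell a legitimately small response from an empty one, which is the concern raised in the review comments.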