joenano / rpscrape

Scrape horse racing results data and racecards.

Failure when pulling jump data for last 10 years of Cheltenham #99

Closed: Swebbee closed this issue 2 years ago.

Swebbee commented 2 years ago

Morning,

If I put in:

[rpscrape]> 11 2010-2020 jumps

I get:

Traceback (most recent call last):
  File "C:\Users\Swebby\rpscrape\scripts\rpscrape.py", line 149, in <module>
    main()
  File "C:\Users\Swebby\rpscrape\scripts\rpscrape.py", line 143, in main
    races = get_race_urls(args['tracks'], args['years'], args['type'])
  File "C:\Users\Swebby\rpscrape\scripts\rpscrape.py", line 52, in get_race_urls
    results = loads(race[1])['data']['principleRaceResults']
orjson.JSONDecodeError: EOF while parsing a value at line 1 column 0: line 1 column 1 (char 0)

Any idea why this might be?

Kind regards,
Sean
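The first traceback fails inside `loads(race[1])`. "EOF while parsing a value at ... (char 0)" means the response body was empty, which is what a blocked or failed request would typically return. Below is a minimal sketch of parsing the body defensively, so an empty or non-JSON response is skipped instead of crashing the run. `safe_principle_results` is a hypothetical helper, not part of rpscrape. It uses the stdlib `json` module so that it runs on its own. According to the orjson documentation, `orjson.JSONDecodeError` is a subclass of `json.JSONDecodeError`, so the same `except` clause would also catch orjson's error. The payload shape is assumed from the key path in the traceback.

```python
import json
from typing import Any, Optional


def safe_principle_results(body: Optional[str]) -> Any:
    """Return payload['data']['principleRaceResults'], or None if unusable.

    Hypothetical helper (not part of rpscrape). An empty body, e.g. from a
    blocked request, is what makes loads() fail with 'EOF while parsing a value'.
    """
    if not body or not body.strip():
        return None  # empty response: nothing to parse
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None  # not JSON at all, e.g. an HTML error page
    # Key path copied from the traceback; the surrounding shape is an assumption
    data = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(data, dict):
        return None
    return data.get("principleRaceResults")


print(safe_principle_results(""))  # None
print(safe_principle_results('{"data": {"principleRaceResults": [1, 2]}}'))  # [1, 2]
```

With this approach, the caller has to decide what to do with `None`. Logging the URL and retrying it later is usually more useful than silently dropping it.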

Swebbee commented 2 years ago

Tried it again just now with a slightly different result:

[rpscrape]> 11 2010-2020 jumps

Traceback (most recent call last):
  File "C:\Users\Swebby\rpscrape\scripts\rpscrape.py", line 149, in <module>
    main()
  File "C:\Users\Swebby\rpscrape\scripts\rpscrape.py", line 145, in main
    scrape_races(races, args['folder_name'], args['file_name'], args['type'])
  File "C:\Users\Swebby\rpscrape\scripts\rpscrape.py", line 93, in scrape_races
    race = Race(doc, code, settings.fields)
  File "C:\Users\Swebby\rpscrape\scripts\utils\race.py", line 32, in __init__
    self.race_info['course'] = self.get_course(url_split[5])
  File "C:\Users\Swebby\rpscrape\scripts\utils\race.py", line 297, in get_course
    course = self.doc.xpath("//a[contains(@class, 'rp-raceTimeCourseName__name')]/text()")[0].strip()
AttributeError: 'NoneType' object has no attribute 'xpath'
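The second traceback ends with `'NoneType' object has no attribute 'xpath'`. This means the `doc` passed to `Race` was `None`, presumably because the page request failed. Below is a minimal sketch of skipping such pages before they reach the parser. `build_races` is hypothetical. The `parse` callable stands in for something like `Race(doc, code, settings.fields)`. The `(url, doc)` pairing is an assumption about how fetched pages are collected.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def build_races(
    pages: Iterable[Tuple[str, Optional[Any]]],
    parse: Callable[[Any], T],
) -> Tuple[List[T], List[str]]:
    """Parse each fetched page, skipping pages whose document is None.

    Hypothetical helper: `pages` yields (url, doc) pairs, where doc is None
    when the request failed; `parse` stands in for the Race constructor.
    """
    races: List[T] = []
    skipped: List[str] = []
    for url, doc in pages:
        if doc is None:
            skipped.append(url)  # failed fetch: doc.xpath() would raise AttributeError
            continue
        races.append(parse(doc))
    return races, skipped


races, skipped = build_races([("race/1", "doc-a"), ("race/2", None)], str.upper)
print(races, skipped)  # ['DOC-A'] ['race/2']
```

Returning the skipped URLs, rather than just dropping them, makes it easy to re-run only the failed pages once the block lifts.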

gbettle commented 2 years ago

Lately, RP (Racing Post) have tightened up their servers and will throttle or ban your IP if they think you are having a laugh (e.g. hammering them with requests). Clear your PC, restart it and then your router, use a VPN, etc.

PM me; I might be able to find the 2010-2020 data for you.
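If the server is throttling, one common way to stop making things worse is to back off when it refuses requests, rather than retrying immediately. Below is a minimal sketch of exponential backoff on 403/429 responses. It will not lift an IP ban that is already in place. `fetch_with_backoff`, the injected `fetch(url) -> (status, body)` callable, and the default delays are all hypothetical choices, not anything rpscrape provides. `sleep` is injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical fetch signature: url -> (HTTP status code, response body)
Fetch = Callable[[str], Tuple[int, str]]


def fetch_with_backoff(
    fetch: Fetch,
    url: str,
    retries: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Retry on 403/429 with doubling pauses; return None if it never succeeds."""
    delay = base_delay
    for attempt in range(retries + 1):
        status, body = fetch(url)
        if status == 200:
            return body
        if status not in (403, 429) or attempt == retries:
            return None  # not a throttling response, or out of retries
        sleep(delay)  # back off before trying again
        delay *= 2
    return None


responses = iter([(403, ""), (403, ""), (200, "<html>ok</html>")])
waits = []
body = fetch_with_backoff(lambda url: next(responses), "https://example.invalid/race", sleep=waits.append)
print(body, waits)  # <html>ok</html> [5.0, 10.0]
```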

joenano commented 2 years ago

Pretty much every error right now will be caused by 403 (Forbidden) responses. I am going to try switching back to synchronous requests, as it's possible they are just blocking rapid-fire requests.
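The switch described above, from concurrent to synchronous requests, amounts to fetching one URL at a time with a pause in between. Below is a minimal sketch of that pacing. `scrape_sequentially`, the injected `fetch(url) -> (status, body)` callable, and the 2-second default pause are hypothetical and not taken from rpscrape. Non-200 responses such as 403 are recorded as `None`, so the callers in the earlier tracebacks would have something explicit to check instead of an empty body or a `None` document.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple


def scrape_sequentially(
    urls: Iterable[str],
    fetch: Callable[[str], Tuple[int, str]],
    pause: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[str]]:
    """Fetch URLs one at a time with a fixed pause between requests.

    Non-200 responses (e.g. 403) are stored as None so callers can skip them.
    """
    results: Dict[str, Optional[str]] = {}
    for i, url in enumerate(urls):
        if i:
            sleep(pause)  # space out requests instead of firing them concurrently
        status, body = fetch(url)
        results[url] = body if status == 200 else None
    return results


statuses = {"a": 200, "b": 403, "c": 200}
waits = []
out = scrape_sequentially(["a", "b", "c"], lambda u: (statuses[u], "body-" + u), sleep=waits.append)
print(out, waits)  # {'a': 'body-a', 'b': None, 'c': 'body-c'} [2.0, 2.0]
```

This is slower than concurrent fetching by design. The trade-off is fewer requests per second, which is exactly what a rate-based block would be reacting to.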