Same here:
python racecards.py today
no race_type found for course scoop 6
Traceback (most recent call last):
File "racecards.py", line 422, in
After running the scraper and getting the above error, it looks like the racecard URL returns a blank white page. I think the site just blocks it straight away.
As a temporary fix, I replaced the references to asyncio.run(get_documents(race_urls)) in racecards.py with the following:
# fetch each racecard synchronously instead of via asyncio
race_docs = []
for race_url in race_urls:
    resp = session.get(race_url)
    race_docs.append((race_url, html.fromstring(resp.text)))
I also had to bypass the call that fetches each runner's form (reducing the number of requests made to the server), which leaves out some fields from the racecard that I don't need. It now works, so it must be a throttling issue with Racing Post. Hope this helps someone else.
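If it really is rate limiting, spacing the requests out further may also help. Here is a minimal sketch of a throttled version of the loop above, assuming the same session and lxml html objects from racecards.py; the delay and retry counts are guesses, not known limits:

import time
from lxml import html

REQUEST_DELAY = 2  # seconds between requests; a guess, not a known limit
MAX_RETRIES = 3

def get_documents_throttled(session, race_urls):
    # Fetch racecard pages one at a time, pausing between requests and
    # backing off when the server answers with a 403 or an empty body
    race_docs = []
    for race_url in race_urls:
        for attempt in range(1, MAX_RETRIES + 1):
            resp = session.get(race_url)
            if resp.status_code == 200 and resp.text:
                race_docs.append((race_url, html.fromstring(resp.text)))
                break
            time.sleep(REQUEST_DELAY * attempt)  # progressively longer waits
        time.sleep(REQUEST_DELAY)
    return race_docs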
The requests are now severely limited. I haven't tested to find the exact number of allowed requests, but once you hit the limit it's 403 responses for hours, maybe even up to 12 hours.
Initially the requests were limited per minute, which was fine as you could just wait. The limit was then removed completely, and the project switched to async requests to take advantage of that, giving a massive speed increase.
Against my better judgement I made the racecard scraping script public, and it appears to have been heavily abused for a long time; now the whole project is crippled by the new limits.
Looks like AtTheRaces is now the last bastion of freely obtainable horse racing data. For what it's worth, I doubt the use of racecards.py has really influenced things that much at RP. They display live odds on that URL, and it's natural that many people will try to scrape it. If Cloudflare security is behind this, you might struggle to spoof them for any significant period of time either. You can't scrape Oddschecker anymore for the same reason, and that started about two months ago as well. I have been quoted £12k a year for Timeform's database and racecard API. It's the only addition to Betfair's own API for racecard info, and the only legitimate route I can see available for UK/IRE racing. Seems pretty extortionate as well for what it is.
Yeah, I had a library up for scraping Oddschecker, but as you say they started using Cloudflare recently, so I just took it down. It never gained much traction anyway, but I was planning to use it eventually for my main project. I started working on a new odds-scraping library, but it's private right now; I haven't quite finished it and my focus has been elsewhere recently. That is ridiculous pricing for the data and API, and the main reason why I started this project.
Guys, if you're using this within acceptable limits, e.g. scraping the racecard and the prior day's data once a day (like I do), I've found that just recycling your VPN eliminates this error (if you have one, that is).
I use Norton antivirus, which comes with one bundled in. I previously had Rapid VPN but canned it when I clocked that Norton comes with one anyhow.
betfairlightweight and flumine are two repos worth investigating:
betfairlightweight - Python wrapper for Betfair API-NG (with streaming)
flūmine - Betfair trading framework
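For anyone trying those out, here is a minimal sketch of pulling today's UK/IRE horse racing markets with betfairlightweight. The username, password, app key, and cert path are placeholders you need to supply yourself:

import betfairlightweight
from betfairlightweight import filters

# Placeholders: you need your own Betfair credentials, app key,
# and SSL certs for the non-interactive login
trading = betfairlightweight.APIClient(
    username="your_username",
    password="your_password",
    app_key="your_app_key",
    certs="/path/to/certs",
)
trading.login()

# Event type 7 is horse racing; restrict to GB/IE win markets
market_filter = filters.market_filter(
    event_type_ids=["7"],
    market_countries=["GB", "IE"],
    market_type_codes=["WIN"],
)
catalogue = trading.betting.list_market_catalogue(
    filter=market_filter,
    market_projection=["MARKET_START_TIME", "RUNNER_DESCRIPTION"],
    max_results=25,
)
for market in catalogue:
    print(market.market_id, market.market_name, market.market_start_time)

trading.logout()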
Regarding staying with RP and the VPN issues, I'd consider trying PythonAnywhere, hosted in Frankfurt. They offer a VM for 5 euros per month, which includes scheduled tasks for things like web scraping.
Currently, I'm also trying out ScraperBox.com, which claims to offer an "Undetectable Web Scraping API":
- Rotating proxies that never get blocked
- Render Javascript with real Chrome browsers
- You won't get stopped by robot-check captchas
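I haven't verified it against Racing Post yet, but it's a single GET endpoint. Here is a sketch of fetching a racecard page through it; the endpoint and parameter names are taken from their docs at the time of writing, so treat them as assumptions and check the current docs:

import requests

API_TOKEN = "your_scraperbox_token"  # placeholder

# Endpoint and parameter names are assumptions based on ScraperBox's docs
resp = requests.get(
    "https://api.scraperbox.com/scrape",
    params={
        "token": API_TOKEN,
        "url": "https://www.racingpost.com/racecards/",
    },
)
print(resp.status_code)
print(resp.text[:500])  # first chunk of the returned HTML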
Closing this as any discussion on this topic should be in the Access Denied issue.
Hi,
When I try to run the racecards.py file, it presents the following error:
python racecards.py today
Traceback (most recent call last):
  File "racecards.py", line 430, in <module>
    main()
  File "racecards.py", line 416, in main
    race_urls = get_race_urls(session, racecard_url)
  File "racecards.py", line 93, in get_race_urls
    doc = html.fromstring(r.content)
  File "C:\Users\markp\anaconda3\lib\site-packages\lxml\html\__init__.py", line 875, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "C:\Users\markp\anaconda3\lib\site-packages\lxml\html\__init__.py", line 763, in document_fromstring
    raise etree.ParserError(
lxml.etree.ParserError: Document is empty
Best wishes Mark
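For what it's worth, lxml raises "Document is empty" when fromstring is handed an empty body, so this is the same blocked-response problem as above: the request to the racecard URL is coming back with no HTML. A minimal guard for that case, assuming the requests response r used in get_race_urls; parse_document is a hypothetical helper, not part of racecards.py:

from lxml import html

def parse_document(r):
    # Fail with a clear message when the response is blocked or empty,
    # instead of letting lxml raise a cryptic ParserError
    if r.status_code != 200 or not r.content.strip():
        raise RuntimeError(
            f"Blocked or empty response ({r.status_code}) for {r.url}"
        )
    return html.fromstring(r.content)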