joenano / rpscrape

Scrape horse racing results data and racecards.

Access to www.racingpost.com was denied #87

Closed: qemtek closed this issue 2 years ago

qemtek commented 2 years ago

Hi, maybe you know how I can get around this. I've been running a script that pulls the data from Racing Post by calling the rpscrape.py script. I'm using a scheduler to run the script for each day/country, and I think I made too many calls. Is there a way to get around this, or is my IP permanently banned?

joenano commented 2 years ago

I've no idea. This is the first I have heard of this happening, but it does sound like your IP might be banned. Can you access the site through the browser? If you have been banned, I imagine you will need to use a VPN/proxy.

qemtek commented 2 years ago

A note for anyone else, resetting your router did the trick for me!

joenano commented 2 years ago

It's possible that request limiting is back, as I've just had this 403 issue as well. We may have to go back to synchronous requests.

RobbieJamesBennett commented 2 years ago

Hi,

I have had the same problem. On the 15th December there were two server instances running scheduled race card updates (every 10 minutes), and it seems access has been denied. Any thoughts on how to get around this? It was happily chugging away for 4 months before this happened.

joenano commented 2 years ago

Spamming them every 10 mins for 4 months seems a bit excessive.

RobbieJamesBennett commented 2 years ago

Yeah, but that's also about the minimum frequency you would want for trading on race card information; any less often and you are going to be trying to place bets on scratched runners and jockeys. It could be the polling frequency, the nature of the requests (asyncio), or both that's being flagged. I don't know.

gbettle commented 2 years ago

> Hi,
>
> I have had the same problem. On the 15th December there were two server instances running scheduled race card updates (every 10 minutes), and it seems access has been denied. Any thoughts on how to get around this? It was happily chugging away for 4 months before this happened.

This repo is in no way to blame for that frequency of scraping. Any website's sysadmin worth their weight will blacklist/block that amount of heavy traffic from an IP ASAP.

So look into another repo, betfairlightweight or flumine: two repos that can interface with Betfair's streaming service, an API that streams Betfair markets.

minGRID992020 commented 2 years ago

It's a great scraper! Thank you for sharing. I too have an issue with getting IP blocked. I gather they use Swiftbot and it blocks the IP. I really want to get the UK and IRE historical data, but it looks like I'll have to do 1 month at a time, rotating IP addresses. Is there an optimum number of months to scrape? E.g. 2 months, then rotate to the next IP address? I'm looking at replicating a strategy from about 15 years ago and wanted to do some research using the data from your excellent scraper! Once again, thanks!
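
For anyone batching a historical scrape one month at a time, the date ranges could be generated along these lines. This is only a sketch: `month_chunks` is a hypothetical helper, not part of rpscrape, and it says nothing about how many months are safe to scrape from a single IP address.

```python
import calendar
from datetime import date
from typing import Iterator, Tuple


def month_chunks(start: date, end: date) -> Iterator[Tuple[date, date]]:
    """Yield (first_day, last_day) pairs covering start..end, one per calendar month.

    Hypothetical helper for illustration; not part of rpscrape.
    """
    current = start
    while current <= end:
        # Number of days in the current month (handles leap years).
        days_in_month = calendar.monthrange(current.year, current.month)[1]
        month_end = date(current.year, current.month, days_in_month)
        yield current, min(month_end, end)
        # Move to the first day of the next month.
        if current.month == 12:
            current = date(current.year + 1, 1, 1)
        else:
            current = date(current.year, current.month + 1, 1)
```

Each pair can then be handed to whatever runs the scrape for that range, switching connection between batches if needed.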

joenano commented 2 years ago

I have not tested to see what the limits are, but I may try reverting to synchronous requests. I'd rather it was slow than fast and blocked.
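
As a rough illustration of "slow rather than fast and blocked", synchronous fetching with a pause between requests might look like the sketch below. It is not the code in rpscrape. `fetch_slowly`, `Blocked`, and the `fetch` callable (assumed to return a `(status_code, body)` tuple) are hypothetical names, and the 2 second default delay is a guess, since the site's actual limits have not been tested.

```python
import time
from typing import Callable, Iterable, List, Tuple


class Blocked(Exception):
    """Raised when the server answers 403, so the caller can stop early."""


def fetch_slowly(
    urls: Iterable[str],
    fetch: Callable[[str], Tuple[int, str]],
    delay_seconds: float = 2.0,  # assumed value; real limits are unknown
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch URLs one at a time, pausing between consecutive requests."""
    bodies = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # pause before every request after the first
        status, body = fetch(url)
        if status == 403:
            # Stop immediately instead of continuing to hit a site that has denied access.
            raise Blocked(f"access denied at {url}")
        bodies.append(body)
    return bodies
```

With the `requests` library, `fetch` could be a small wrapper that calls `requests.get(url)` and returns `(response.status_code, response.text)`. Stopping on the first 403 avoids piling more requests onto an IP that is already being refused.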

minGRID992020 commented 2 years ago

Yes, that makes sense. I tried to scrape one month at a time, but I'm not convinced it is picking up every single race. I also got an error yesterday saying it couldn't find the number of runners at a certain race. Perhaps that was the IP being blocked? But for one day's results, it's absolutely awesome. Great work.

T00NJEDI commented 2 years ago

Hi, any luck with slowing down the scrape? The software is great, but I now get kicked off immediately. It's a shame because this is a wonderful programme.

joenano commented 2 years ago

I have reverted to synchronous requests. Hopefully this addresses the temporary IP blocking, but it's not guaranteed; I have not tested it.

T00NJEDI commented 2 years ago

Many thanks. RP bots already recognise my router's IP address. I hook up to my mobile phone for internet. Now working brilliantly. Scraped 4 months tonight. Thank you!!

minGRID992020 commented 2 years ago

Please forgive my lack of tech knowledge, but when it asks me if I want to update and I enter Y, it always says "failed to update". How do I use the synchronous requests if it doesn't update?

joenano commented 2 years ago

Whenever I test the update it works for me, but for some reason it doesn't work for others. A bunch of people just clone again when there is an update and never open an issue about the update not working, so it's hard for me to solve.

You can delete the rpscrape folder and just do git clone again, which will download the latest version.

minGRID992020 commented 2 years ago

Awesome. Will do that and test. Thanks for the very quick response and for your excellent work again! It really is genuinely appreciated.