Closed: kami4ka closed this issue 2 years ago
That's a good idea. I have to add some failsafes nonetheless. However, using residential proxies eats up the credits for free users in about 4 calls, something Theisen must be aware of.
@orangecoding Oh, not quite 4 calls :-) it's 40 calls on the free plan, but yep, you're right, it's a costly solution.
@kami4ka is there any hint in the JSON I'm getting back from ScrapingAnt that we have hit a captcha?
As we can observe - yep.
1) The blocked HTML content currently contains the following title tag: <title>Ich bin kein Roboter - ImmobilienScout24</title>
2) Headline block contains the following text: Ich bin kein Roboter
I guess that either of those can be used for simple detection of the captcha. Here is an example of the detected page: https://gist.github.com/kami4ka/efd1ed05c940c1eb549e172ca1b557fd
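The two markers above can be combined into a simple check. This is a minimal sketch; the function name is hypothetical, and only the "Ich bin kein Roboter" marker comes from the observed block page:

```javascript
// Marker observed in both the <title> tag and the headline of a blocked page.
const CAPTCHA_MARKER = 'Ich bin kein Roboter';

// Returns true when the returned HTML looks like the captcha/block page.
// A plain substring check covers both the title tag and the headline case.
function isCaptchaBlocked(html) {
  return typeof html === 'string' && html.includes(CAPTCHA_MARKER);
}

// Example:
console.log(isCaptchaBlocked('<title>Ich bin kein Roboter - ImmobilienScout24</title>')); // true
console.log(isCaptchaBlocked('<title>Wohnung mieten - ImmobilienScout24</title>'));       // false
```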
I'd suggest an enhancement for the Immoscout scraping. From my observation, the standard ScrapingAnt proxies are frequently detected.
My suggestion is the following:
1) Try using a standard proxy
2) If detected, retry with a residential one
Also, as an alternative, retries with standard proxies can be added before falling back to residential:
1) Try using a standard proxy
2) If detected, retry with a standard proxy N times
3) If still detected, retry with a residential proxy
A residential request costs more per call, but overall it may be cheaper than burning credits on repeated retries with standard proxies.