orangecoding / fredy

:heart: Fredy - [F]ind [R]eal [E]states [D]amn Eas[y] - Fredy will constantly search for new listings on sites like Immoscout or Immowelt and send new results to you, so that you can focus on more important things in life ;)
http://www.orange-coding.net
MIT License

Use residential proxies for Immoscout #32

Closed: kami4ka closed this issue 2 years ago

kami4ka commented 2 years ago

I'd suggest an enhancement for Immoscout scraping. From what I've observed, the standard ScrapingAnt proxies are unreliable when it comes to getting past Immoscout's bot detection.

My suggestion is the following:
1) Try the standard proxy.
2) If detected, retry with a residential proxy.

Alternatively, retries with standard proxies could be added before falling back to residential:
1) Try the standard proxy.
2) If detected, retry with the standard proxy up to N times.
3) If still detected, retry with a residential proxy.
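A minimal TypeScript sketch of the escalation strategy proposed above: a standard attempt, up to N standard retries, then one residential request. The names `Fetcher`, `fetchWithEscalation`, and `isBlocked` are hypothetical and not part of Fredy or the ScrapingAnt API; how the proxy type is actually passed to ScrapingAnt is left to the injected fetcher. With `standardRetries = 0` it reduces to the first variant.

```typescript
// Hypothetical proxy tiers; the real ScrapingAnt request parameters are not shown here.
type ProxyType = "standard" | "residential";

// Hypothetical fetcher: performs one request through the given proxy type
// and resolves to the returned HTML.
type Fetcher = (url: string, proxy: ProxyType) => Promise<string>;

interface FetchResult {
  html: string;
  proxy: ProxyType; // which tier finally got through
  attempts: number; // total requests made, i.e. credits spent on this page
}

async function fetchWithEscalation(
  url: string,
  fetcher: Fetcher,
  isBlocked: (html: string) => boolean,
  standardRetries = 2,
): Promise<FetchResult> {
  let attempts = 0;
  // Steps 1 and 2: one standard attempt plus up to `standardRetries` retries.
  for (let i = 0; i <= standardRetries; i++) {
    attempts++;
    const html = await fetcher(url, "standard");
    if (!isBlocked(html)) return { html, proxy: "standard", attempts };
  }
  // Step 3: still detected, so escalate once to the more expensive residential tier.
  attempts++;
  const html = await fetcher(url, "residential");
  if (isBlocked(html)) {
    throw new Error(`Blocked even with residential proxy after ${attempts} attempts`);
  }
  return { html, proxy: "residential", attempts };
}
```

Injecting the fetcher and the block check keeps the retry policy independent of the HTTP client and makes it easy to test with a fake fetcher.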

A residential request costs more, but it looks like it works out cheaper than repeatedly retrying with standard proxies.
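To make the cost argument concrete, here is a small sketch of the expected credits spent per successfully fetched page. It assumes (hypothetically) that each standard attempt gets past detection independently with probability `pStandard` and that a residential request always does; the credit costs are parameters, not ScrapingAnt's actual prices, and the function names are illustrative.

```typescript
// Escalation strategy: up to (retries + 1) standard attempts, then one residential attempt.
function expectedEscalationCost(
  costStandard: number,
  costResidential: number,
  pStandard: number,
  retries: number,
): number {
  let expected = 0;
  let pStillBlocked = 1; // probability that every earlier attempt was blocked
  for (let i = 0; i <= retries; i++) {
    // This attempt is only made if all earlier ones were blocked.
    expected += pStillBlocked * costStandard;
    pStillBlocked *= 1 - pStandard;
  }
  // The residential request is only needed if all standard attempts were blocked.
  return expected + pStillBlocked * costResidential;
}

// Retrying standard proxies until one passes: the number of attempts is
// geometric with mean 1 / pStandard.
function expectedStandardOnlyCost(costStandard: number, pStandard: number): number {
  if (pStandard <= 0) throw new RangeError("pStandard must be > 0");
  return costStandard / pStandard;
}
```

Since standard-only retrying costs `costStandard / pStandard` on average, it becomes more expensive than a single residential request once `pStandard` drops below `costStandard / costResidential`, which is the situation described above.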

orangecoding commented 2 years ago

That's a good idea. I'll need to add some failsafes nonetheless. However, using residential proxies eats up the free plan's credits in about 4 calls, something users need to be aware of.

kami4ka commented 2 years ago

@orangecoding Oh, not quite 4 calls :-) It's 40 calls on the free plan, but yes, you're right, it's a costly solution.

orangecoding commented 2 years ago

@kami4ka is there any indication in the JSON I get back from ScrapingAnt that we have hit a captcha?

kami4ka commented 2 years ago

From what we can observe, yes. The blocked HTML currently contains:
1) the following title tag: <title>Ich bin kein Roboter - ImmobilienScout24</title>
2) a headline block with the text "Ich bin kein Roboter" ("I am not a robot")

Either of these should work for simple captcha detection. Here is an example of a detected page: https://gist.github.com/kami4ka/efd1ed05c940c1eb549e172ca1b557fd
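A minimal sketch of the detection described above, assuming the page HTML from the ScrapingAnt response is already available as a string. `captchaSignal` and `CAPTCHA_MARKER` are hypothetical names; the function reports which of the two signals matched, which can be handy for logging. Matching raw HTML with a regex is fragile, so this is only meant as a cheap first check.

```typescript
// Marker text reported in this thread for Immoscout's bot-check page.
const CAPTCHA_MARKER = "Ich bin kein Roboter";

type CaptchaSignal = "title" | "body" | null;

function captchaSignal(html: string): CaptchaSignal {
  // Signal 1: the <title> of the blocked page.
  const title = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  if (title !== null && title[1].includes(CAPTCHA_MARKER)) return "title";
  // Signal 2: the same text elsewhere on the page (e.g. the headline block).
  // The <title> element is removed first so this only inspects the rest of the page.
  // A substring check is used because the headline's exact markup is not given here.
  const rest = html.replace(/<title[^>]*>[\s\S]*?<\/title>/i, "");
  return rest.includes(CAPTCHA_MARKER) ? "body" : null;
}
```

A scraper would treat any non-null result as "detected" and trigger a retry.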