orangecoding / fredy

:heart: Fredy - [F]ind [R]eal [E]states [D]amn Eas[y] - Fredy will constantly search for new listings on sites like Immoscout or Immowelt and send new results to you, so that you can focus on more important things in life ;)
http://www.orange-coding.net
MIT License
212 stars, 54 forks

Immobilienscout is blocking ScrapingAnt now #31

Closed: tsteffek closed this issue 2 years ago

tsteffek commented 2 years ago

I've realized that Fredy hasn't been posting any Immobilienscout listings lately. Apparently they stepped up their bot detection in June: when I send a request through the ScrapingAnt dashboard, the response says the request was identified as a robot and blocked.

orangecoding commented 2 years ago

Yeah, I've seen it as well. Tbh, it's a bit of a cat-and-mouse game.

I'm currently testing a switch of the scraping to Puppeteer, using a plugin to solve the captcha. However, it's only a matter of time until they close that loophole as well.

Not sure what to do about this yet.

tsteffek commented 2 years ago

I've now tried their search alarm. They can send you an e-mail every hour or, even better, a browser notification. That's fine for someone like me who works at the computer a lot. I'd still prefer the Telegram notifications, but as you're saying, it would probably be easier to get them to create a Telegram bot than to keep up with their security measures forever.

...on that note, a random idea: is there a way to capture the browser notification and forward it to Telegram?

orangecoding commented 2 years ago

It probably would be possible with a browser extension; however, that would mean keeping the tab open at all times.

kami4ka commented 2 years ago

Hey, guys. I'm Oleg from ScrapingAnt. As we've discovered, Immobilienscout's detection is triggered mostly by the standard (datacenter) proxies, while residential proxies work fine. We're currently working on integrating residential proxies, which should improve our detection avoidance.

orangecoding commented 2 years ago

That's awesome to hear, @kami4ka. If you need any help, please ping me directly. I'll keep this issue open so that you can update all of us once there is news. I'm more than happy to add any necessary changes to Fredy.

orangecoding commented 2 years ago

@kami4ka any news on this?

kami4ka commented 2 years ago

@orangecoding We're finishing the implementation and testing now, so it should be live soon.

kami4ka commented 2 years ago

@orangecoding we're done with the testing and implementation.

What we've improved: our browser has become better at avoiding detection, and residential proxies have been added. Residential proxies are a rather costly solution (we trialed about 10 vendors in this market and picked the most reliable and fastest one), but some sites are not scrapable without them.

Information about residential proxies can be found here: https://docs.scrapingant.com/proxy-settings

Also, I'd suggest trying Immobilienscout without residential proxies first, as the success rate with datacenter proxies has improved and they now use the whole range of IP addresses (not only EU ones). I tried it today with the following URL: https://www.immobilienscout24.de/Suche/radius/wohnung-mieten?centerofsearchaddress=M%C3%BCnchen;;;1276002059;Bayern;&numberofrooms=3.0-4.0&geocoordinates=48.15437;11.54199;10.0&sorting=2
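For anyone reproducing that test, the search URL is an ordinary query string. This small Python sketch splits it on "&" and percent-decodes each value, which shows that it describes a search around München with 3 to 4 rooms. The helper name query_fields is hypothetical and not part of Fredy or ScrapingAnt. Reading the third geocoordinates value (10.0) as a search radius is only an assumption.

```python
from urllib.parse import unquote, urlsplit

# Test URL quoted in the comment above
TEST_URL = (
    "https://www.immobilienscout24.de/Suche/radius/wohnung-mieten"
    "?centerofsearchaddress=M%C3%BCnchen;;;1276002059;Bayern;"
    "&numberofrooms=3.0-4.0&geocoordinates=48.15437;11.54199;10.0&sorting=2"
)


def query_fields(url: str) -> dict:
    """Hypothetical helper: return the query parameters of url as decoded strings.

    Splits only on '&' so that the semicolon-separated values stay intact.
    """
    fields = {}
    for pair in urlsplit(url).query.split("&"):
        key, _, value = pair.partition("=")
        fields[key] = unquote(value)  # decodes %C3%BC as UTF-8, giving "ü"
    return fields


for key, value in query_fields(TEST_URL).items():
    print(f"{key}: {value}")
# centerofsearchaddress: München;;;1276002059;Bayern;
# numberofrooms: 3.0-4.0
# geocoordinates: 48.15437;11.54199;10.0
# sorting: 2
```

The manual split on "&" is deliberate: older Python releases also treated ";" as a separator in parse_qs, which would break up the semicolon-delimited values this URL relies on.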

orangecoding commented 2 years ago

I've checked a few of my running instances, and all of them are running well again with standard datacenter proxies, so afaik there's no need to change anything code-wise. (I assume the proxy_type parameter is optional and only needed when a residential proxy is wanted, @kami4ka.)
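To illustrate the "optional proxy_type" point, here is a minimal Python sketch that only builds the request URL and adds proxy_type solely when residential proxies are wanted, leaving the default (datacenter) proxies in place otherwise. It is not Fredy's actual code: the endpoint https://api.scrapingant.com/v2/general, the url and x-api-key parameter names, and the value "residential" are assumptions to check against https://docs.scrapingant.com/proxy-settings, and build_scrapingant_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed ScrapingAnt v2 endpoint; verify against the official docs
SCRAPINGANT_ENDPOINT = "https://api.scrapingant.com/v2/general"


def build_scrapingant_url(target_url: str, api_key: str, residential: bool = False) -> str:
    """Hypothetical helper: build a ScrapingAnt request URL for target_url.

    proxy_type is added only when residential proxies are requested;
    otherwise it is omitted and the service's default proxies apply.
    """
    params = {"url": target_url, "x-api-key": api_key}  # assumed parameter names
    if residential:
        params["proxy_type"] = "residential"  # assumed value
    # urlencode percent-encodes the target URL, so its own query string survives intact
    return f"{SCRAPINGANT_ENDPOINT}?{urlencode(params)}"


print(build_scrapingant_url("https://www.immobilienscout24.de/Suche/", "YOUR_API_KEY"))
```

Keeping the parameter out of the request by default matches the observation above: existing datacenter-proxy setups need no code change, and residential proxies become an opt-in flag.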

@tsteffek Hope you're still following this. Everything should be working again thanks to the awesome gents at https://scrapingant.com/