Closed denisalevi closed 9 months ago
Unfortunately, it always fails. We're aware of the situation and are currently working on it. While working on this issue we've already changed the technology behind the service and improved the detection rate for various websites, but not for Immoscout yet. We're still making progress and will notify all users who tried to make a request to Immoscout via email.
It's honestly a fight against windmills.
I know there are hundreds of Fredy users out there coz I keep getting emails from ppl asking me to fix the immoscout scraper...
Thanks for the information @kami4ka
And yeah, I can imagine @orangecoding. Fredy is a real game changer, especially in Berlin, where every second counts (it saved my ass a couple of months ago). And I can imagine that Immoscout is constantly changing. But I have to say, considering that, Fredy has been running quite smoothly for the last months, thanks for that! I have it set up for a few friends, who share the scraping ant fee (currently stopped until it is working again). I just saw your sponsoring option. I'll make sure to include you in the shared costs once we are up again :)
If there is anything I can contribute, please let me know. It's just not my expertise at all unfortunately.
The latest update is that we've found a way to bypass the detection. We're going to test and prepare everything for the cluster deployment (some things are still unclear on that part) and will reach out via email to everyone who made Immoscout calls over the last two months.
Awesome.
Can you share with us how many users we are talking about? @kami4ka
Hey! First of all, thanks for the amazing project :) I was experimenting with ways to evade the immoscout restrictions and tested these approaches:
- scrapingAnt: not working (maybe they can fix that somehow?)
- puppeteer: not working
- puppeteer-extra-plugin-stealth: not working
- python with selenium: not working
- python with selenium and undetected_chromedriver: works! But only when I really render the browser; the headless option is detected :/ Also, after repeated calls they were somehow able to block me, but the next day it worked again.

Maybe the info helps, but having to render the browser is a bit of a bummer for easy deployment. And this undetected_chromedriver library is only available in python and does some fancy stuff I do not completely understand.
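
For anyone who wants to try that last approach, here is a minimal sketch, not the exact script used for the tests above. It assumes `undetected_chromedriver` and a local Chrome are installed; the function name `fetch_rendered_html` and the fixed wait are hypothetical choices for illustration.

```python
import time


def fetch_rendered_html(url: str, wait_seconds: float = 5.0) -> str:
    """Load `url` in a visible (non-headless) Chrome window and return its HTML."""
    # Imported lazily so this module can be loaded without the third-party package.
    import undetected_chromedriver as uc

    options = uc.ChromeOptions()
    # Headless mode is deliberately NOT enabled: per the tests above it gets detected.
    driver = uc.Chrome(options=options)
    try:
        driver.get(url)
        # Crude fixed wait (an assumption) so client-side rendering can finish.
        time.sleep(wait_seconds)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

Calling it with a search URL opens a real browser window, which is why this needs a display and is awkward to deploy.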
Hi phil,
Thanks. As I said earlier, this is a cat-and-mouse game.
We might be able to overcome this by using unprotected api endpoints. However this too might be something that only works for a limited amount of time..
Hey @orangecoding,
agreed, it is a very nasty cat-and-mouse game, with the other side probably having a lot more developers than we have here working on this project. But I mean, if we somehow manage to use a Chrome-based browser via a package like undetected_chromedriver with the screen actually rendered, it will be very difficult to detect that without locking "legitimate" users out of immoscout. The only problem is that I still haven't found a way to run that in docker. Unprotected API endpoints will get fixed for sure at some point, and I guess immoscout is probably even monitoring repos like this one here ;)
Hi @phil-bergmann,
can you share your approach with undetected_chromedriver? It would be nice to give it a try. I've also seen your approach with ScrapingBee, but would like to avoid the paid account.
Immoscout is working for me every now and then. @kami4ka Do you have an update for us?
Stumbled upon this project today and was asking myself the same thing. I really hope this gets fixed. @kami4ka I would subscribe right away!
For some reason, @kami4ka is currently unavailable. I hope he's doing ok, as he's from Ukraine... In the meantime, I see that nearly all my tests were successful after a couple of retries.
Can you guys confirm?
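
If retries really do help, a small wrapper like the one below could make them systematic. This is only a sketch under assumptions: `fetch` stands for whatever function performs one scrape attempt and returns `None` on failure, and the names, attempt count, and delays are hypothetical, not taken from Fredy's code.

```python
import random
import time
from typing import Callable, Optional


def fetch_with_retries(
    fetch: Callable[[], Optional[str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call `fetch` until it returns a result or the attempts run out."""
    for attempt in range(max_attempts):
        result = fetch()
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            # Exponential backoff plus random jitter, so repeated calls
            # do not hit the site at a fixed, easily recognisable rhythm.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    return None
```

The `sleep` parameter is injectable only so the wrapper can be exercised without actually waiting.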
Sorry for the delay.
We're a bit stuck moving our PoC for this detection to the production environment, so it's getting delayed. We're doing our best, as it would allow us to cover more protections like this, so it's our top priority.
I'll keep you updated once we've figured it out completely.
hey @kami4ka, just wondering if there is an update available for this? I notice immoscout is not able to be used; it never finds any listings. Thank you!
ScrapingBee (not ScrapingAnt) and Zyte API are able to scrape Immoscout.
A request on ScrapingBee with a "stealth proxy" costs approx. $0.04 while Zyte API costs $0.008
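
To put those per-request prices into perspective, here is a rough back-of-the-envelope calculation. The two prices are the ones quoted above; the five-minute polling interval and the function name `monthly_cost` are assumptions for illustration, and retries are ignored.

```python
# Approximate per-request prices in USD, as quoted above.
PRICE_PER_REQUEST_USD = {
    "ScrapingBee (stealth proxy)": 0.04,
    "Zyte API": 0.008,
}


def monthly_cost(price_per_request: float, interval_minutes: float, days: int = 30) -> float:
    """Cost of polling once every `interval_minutes` for `days` days."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    requests_per_month = days * 24 * 60 / interval_minutes
    return price_per_request * requests_per_month


if __name__ == "__main__":
    # Hypothetical polling interval: one request every 5 minutes.
    for provider, price in PRICE_PER_REQUEST_USD.items():
        print(f"{provider}: ${monthly_cost(price, interval_minutes=5):.2f} per 30 days")
```

At one request every five minutes that is 8,640 requests per 30 days, so roughly $345.60 versus $69.12 for a single search job.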
Yeah I am also considering providing different solutions.. not sure however whether to replace scrapingant or just add scrapingbee
@ilindaniel By the way, I was trying to use ScrapingBee to scrape Immoscout (used it on their website) but hit the bot detection every time. Are you totally sure ScrapingBee found a way around it? I honestly don't want to implement various services just to find out that they don't work either.
Have you checked the "stealth proxy" checkbox?
Nevertheless, I'd suggest having a look at Zyte, since they are 5x cheaper than ScrapingBee.
Hey guys. I'd suggest trying out ScrapeOps: https://scrapeops.io/proxy-aggregator/ They aggregate web scraping providers, and that could be the best approach for cases like this.
The providers may use similar tech, but it still differs (for example, in how a browser is executed in the cluster), so this lets you avoid being tied to one particular provider and use all of them instead.
You can find out more on their landing page.
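
The general idea of not being tied to one provider can be sketched as a simple fallback chain. This is not how ScrapeOps works internally, just an illustration of the principle; the function name and the provider callables are hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple

# A provider is a (name, fetch) pair; `fetch` takes a URL and returns the
# page HTML, or None / raises an exception when the request was blocked.
Provider = Tuple[str, Callable[[str], Optional[str]]]


def fetch_via_first_working(
    url: str, providers: Iterable[Provider]
) -> Optional[Tuple[str, str]]:
    """Try each provider in order; return (provider name, html) of the first success."""
    for name, fetch in providers:
        try:
            html = fetch(url)
        except Exception:
            # Treat any provider error as "blocked" and move on to the next one.
            continue
        if html:
            return name, html
    # Every provider failed or returned nothing.
    return None
```

Ordering the providers from cheapest to most expensive would mean the pricier ones are only used when the cheaper ones are blocked.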
@kami4ka I tried them (as well as a bunch of others) however I always hit the wall.
{"status":"Failed to get successful response from website. Please retry the request."}
To be quite honest with you, I am sick and tired of this cat-and-mouse game and am currently thinking about removing the support for immoscout entirely.
@orangecoding Yeah, I totally understand. We always suggest finding an alternative data source when the cost of extracting data from a specific source becomes a problem, including the cost of building detection avoidance. Unfortunately, it looks like that is the case with Immoscout too.
As of now, immoscout still doesn't work right? Or am I missing something in my setup? Cheers and thank you!
No, and it doesn't seem like @kami4ka has much confidence in fixing this.
I was recently playing around with AI to overcome the captcha, but there is actually a legal issue.
See, scraping is ok-ish as long as you do not harm the website and are not trying to defeat things that have been put in place to block scraping, like captchas.
And tbh, I don't want to mess with them.. ;)
Zyte is still able to scrape ImmoScout.
However I'm quite lazy and use ImmoScout's email notification service at the moment. Might not be as instant as scraping it, but that's the quick fix for now.
It appears that scraping immoscout has stopped working reliably. This is probably not an actual bug, but maybe due to changes at Immoscout? Either way, all my Immoscout providers keep failing, with both datacenter and residential proxies. I don't think I've seen a successful Immoscout scrape in days (but I was also on my own patched fork before; now I've updated to current master and it's the same, all retries seem to be failing).
Can someone check or reproduce this? Or might this be some problem on my side? Maybe @kami4ka can shed some light from the scraping ant side? :)