nickirk / immo

A bot which monitors immoscout24 and wg-gesucht.de for new flat offers and sends requests to offers automatically.
GNU General Public License v3.0
146 stars, 44 forks

Immoscout crawler doesn't work #10

Closed: jann-klaas closed this issue 10 months ago

jann-klaas commented 4 years ago

Hi there,

I can't get the immoscout bot to work. WG-Gesucht works fine.

Here's the error, which depends on whether I use python3 or python2. I assume the crawler is broken, as the href file never gets any data. Any ideas how to fix it?

Janns-MBP:immobot jann$ python immo.py
Traceback (most recent call last):
  File "immo.py", line 8, in <module>
    from json import JSONDecodeError
ImportError: cannot import name JSONDecodeError

Janns-MBP:immobot jann$ python3 immo.py
There was a problem with reading a json formatted object
Traceback (most recent call last):
  File "immo.py", line 17, in <module>
    data = json.load(data_file)
  File "/usr/local/Cellar/python@3.8/3.8.4/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/local/Cellar/python@3.8/3.8.4/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python@3.8/3.8.4/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python@3.8/3.8.4/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Time: 2020-08-17 22:43:03.966169
^CTraceback (most recent call last):
  File "immo.py", line 74, in <module>
    time.sleep(60)

nickirk commented 4 years ago

With python2, one should use

from simplejson import JSONDecodeError
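
A version-independent alternative, sketched here on the assumption that the script only needs to detect a failed decode: in Python 3, `json.JSONDecodeError` is a subclass of `ValueError`, and Python 2's `json` module raises a plain `ValueError`, so catching `ValueError` covers both. The helper name `parse_json_or_none` is hypothetical and not part of immo.py.

```python
import json


def parse_json_or_none(text):
    """Return the decoded JSON value, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:
        # json.JSONDecodeError (Python 3) subclasses ValueError;
        # Python 2's json module raises ValueError directly.
        return None
```

Note that the JSON literal `null` also decodes to `None`, so a caller that needs to tell the two cases apart would have to return a different sentinel.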

Let's focus on python3.

Could you paste the content of the file href.json here?

rodrigodealer commented 4 years ago

Hi @nickirk I got the same error.

The content of href.json is empty:

cat href.json | wc -l
0

I tried checking if the url you provide as an example worked (maybe mine was broken), but it fails the same way.
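
An empty file is exactly what produces `Expecting value: line 1 column 1 (char 0)`, because `json.load` has nothing to parse. Below is a minimal sketch of a more tolerant loader, assuming href.json is meant to hold a JSON list of links (an assumption about the file's layout); the function name `load_seen_links` is hypothetical. It only avoids the crash and does not explain why the crawler wrote nothing.

```python
import json
import os


def load_seen_links(path):
    """Load a JSON value from path; treat a missing or blank file as []."""
    if not os.path.exists(path):
        return []
    with open(path) as data_file:
        text = data_file.read()
    # A zero-byte or whitespace-only file is not valid JSON, so handle it here.
    if not text.strip():
        return []
    return json.loads(text)
```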

nickirk commented 4 years ago

It looks like immobilienscout24.de has put a restriction on spiders. When I use scrapy to fetch the content, I get a 405 error, meaning "method not allowed". I am looking for a way to evade this using scrapy. If you have found a way, please comment here.

rodrigodealer commented 4 years ago

Maybe change the user agent when doing the request?
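
For reference, a small sketch of what sending a custom user agent and checking the returned status code could look like. It uses the standard library's `urllib` instead of scrapy to stay dependency-free (in scrapy the corresponding knob is the `USER_AGENT` setting). The user-agent string and the function names are hypothetical, and nothing here is known to get past the site's restriction.

```python
import urllib.error
import urllib.request

# Hypothetical value for illustration; any browser-like string could be used.
BROWSER_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Build a GET request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url):
    """Return the HTTP status code for url, including error codes like 405."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is on the exception.
        return err.code
```

Calling `fetch_status` with a search URL would show whether the server still answers 405 with the changed header.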

nickirk commented 4 years ago

I tried simply replacing the user agent with a value I found online, and it didn't work.

krassle commented 3 years ago

@nickirk Have you found a working solution for this issue yet? Thanks.

francisjo commented 3 years ago

@krassle @nickirk , Did you find a solution? Thanks.

nickirk commented 3 years ago

Sorry guys, I have been busy with my thesis and have had no time to look into this issue. I encourage you to follow the discussion here and try something yourselves. I personally recommend using the script on wg-gesucht.de (there is also a minor issue with applying the filters on wg-gesucht.de, but other than that, the script should work).

Alnik89 commented 3 years ago

Has anyone found a solution for this issue? If not, I guess this entire bot is useless and a waste of time for someone who is not a programmer.

Alnik89 commented 3 years ago

Tried to fix this issue with proxy rotation. Still not working.

jjanczur commented 3 years ago

Hi, I have the same issue. Instead of submitting an application, I extended your scripts and added functionality to send myself a message on Telegram so I could manually check if the apartment is ok.

With your scripts, you just need to change submit.py to the following, so that it sends a Telegram message instead of submitting:

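import requests


def submit_app(bot_message):
    # Fill in your own bot token and chat ID.
    bot_token = '<bot token>'
    bot_chatID = '<chat ID>'
    # bot_message is appended to the site's base URL, so it should be the offer's path.
    link = 'https://www.immobilienscout24.de' + \
        bot_message + '#/basicContact/email'

    # Pass the parameters separately so requests URL-encodes them
    # (the '#' in the link is sent as '%23').
    send_url = 'https://api.telegram.org/bot' + bot_token + '/sendMessage'
    params = {'chat_id': bot_chatID, 'parse_mode': 'Markdown', 'text': link}
    response = requests.get(send_url, params=params, timeout=10)
    return response.json()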

Unfortunately the crawler doesn't work anymore :/

jjanczur commented 3 years ago

Immoscout uses some kind of bot protection and redirects to ReCaptcha :/ I guess that's the end of automatic apartment finding :p

xabirizar9 commented 2 years ago

Is this still not working? I thought about giving it a try, but reading these comments it doesn't look too promising.

jjanczur commented 2 years ago

Nope, unfortunately there is now no way around it. They heavily protect themselves against web scraping.

xabirizar9 commented 2 years ago

Well, that's unfortunate. Thanks for the quick reply though.

enthusiasmus commented 1 year ago

They are using a certain service for reCAPTCHA against bots. All the puzzles used can be solved programmatically with a certain probability, given a lot of effort. The question, imo, is whether the anti-captcha logic can be good enough and whether there is somebody who wants to invest that time.