tldev-de / immopushr

Telegram Bot that finds offers on immobilienscout24.de, immowelt.de and ebay-kleinanzeigen.de.
GNU General Public License v3.0

Telegram Bot that finds flat offers on immobilienscout24.de, immowelt.de and ebay-kleinanzeigen.de.


Features

ImmoPushr is a web scraper for immobilienscout24.de, immowelt.de and ebay-kleinanzeigen.de. In Germany, these three sites are widely used to list flats and houses for sale or rent. ImmoPushr helps you find a flat or house on these platforms by periodically scraping your configured search pages. As soon as a new offer appears, ImmoPushr notifies you via Telegram.

ImmoPushr saves you time and gives you a small time advantage :)

ImmoPushr is...

How does it work?

ImmoPushr is implemented as a small PHP script that is triggered by crond inside a Docker container. It uses Selenium to visit the configured websites and extracts the relevant information (title, price, size, number of rooms, location and link). If it detects a captcha, it uses 2captcha to solve it automatically. You can specify how often ImmoPushr checks for new offers on the configured sites. By default, it runs once per hour at minute 50, from 07:50 to 22:50.
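The "notify only on new offers" step can be sketched as follows. This is an illustration in Python, not ImmoPushr's PHP code: the Offer class, find_new_offers, format_message, and the choice of the offer link as the deduplication key are all assumptions made for this example, since the README does not describe how seen offers are tracked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    # Fields mirror the information the README says is extracted per listing.
    title: str
    price: str
    size: str
    rooms: str
    location: str
    link: str


def find_new_offers(scraped, seen_links):
    """Return offers whose link was not seen before and record them as seen."""
    new_offers = []
    for offer in scraped:
        # Assumption: the link uniquely identifies an offer.
        if offer.link not in seen_links:
            seen_links.add(offer.link)
            new_offers.append(offer)
    return new_offers


def format_message(offer):
    """Build a plain-text notification for one offer."""
    return (
        f"{offer.title}\n"
        f"{offer.price} | {offer.size} | {offer.rooms} rooms\n"
        f"{offer.location}\n"
        f"{offer.link}"
    )
```

Calling find_new_offers twice with the same scrape result and the same seen_links set returns the offers the first time and an empty list the second time, which is the behaviour that keeps a periodic run from sending duplicate notifications.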

How to use?

The simplest way to use ImmoPushr is to run the prebuilt Docker image from Docker Hub (tldevde/immopushr) with docker-compose.

You can also run it without Docker, but this is currently not documented.

You can use the docker-compose.yml file from the repository. Just copy the file and change the environment variables of the immopushr container to fit your needs. The configuration options are described below. Then you can start it using docker-compose up -d. Run docker-compose logs to see the logs if you experience any problems.

Configuration Options

You can specify configuration options either via a .env file in the application's directory or via environment variables (preferred, especially if you use Docker). Here is an overview of all options:

| Environment Variable | Mandatory | Default Value | Example Value |
| --- | --- | --- | --- |
| CAPTCHA_TOKEN | no | - | 098f6bcd4621d373cade4e832627b4f6 |
| TELEGRAM_BOT_TOKEN | yes | - | 12123121:AaBbCcDdEeFf00112233445566778899Ggh |
| TELEGRAM_CHATS | yes | - | 123123\|1231234 |
| CRON_PATTERN | no | 50 7-22 * * * | 20,50 7-22 * * * |
| SELENIUM_URL | no | http://selenium:4444 | http://selenium:4444 |
| URLS | yes | - | https://www.immobilienscout24.de/Suche/radius/wohnung-mieten?centerofsearchaddress=Berlin;;;;;&numberofrooms=1.5-&price=-1000.0&pricetype=rentpermonth&geocoordinates=52.51051;13.43068;4.0&sorting=2 |

CAPTCHA_TOKEN

Some sites use captchas (e.g. Google reCAPTCHA) to block evil web scrapers. Since this bot is not evil, you can automate solving these captchas using 2captcha. Just register on their website and add funds. 2captcha costs about $3 per 1000 captchas. You can find your API token in their customer panel.

If you don't want to solve captchas automatically, leave the token unset; ImmoPushr will then skip a search site whenever it detects a captcha. As of February 2020, immobilienscout24.de cannot be crawled without a valid 2captcha token.

TELEGRAM_BOT_TOKEN / TELEGRAM_CHATS

ImmoPushr needs a registered Telegram bot to send messages to you. You can register a new bot with the BotFather. Just start a new chat with BotFather and send the command /newbot as a message. The BotFather will ask for a name for the new bot and then send you a bot token (e.g. 12123121:AaBbCcDdEeFf00112233445566778899Ggh) to access the HTTP API. Provide this token in the environment variable TELEGRAM_BOT_TOKEN.

To get your chat id, you need to send a message to the newly registered bot. After that you can use the following bash command to get the chat id:

$ curl https://api.telegram.org/bot[TELEGRAM_BOT_TOKEN]/getUpdates

The result will look like this:

{"ok":true,"result":[{"update_id":123123123,
"message":{"message_id":123,"from":{"id":XXXXXXXX,"is_bot":false,"first_name":"YOUR_NAME","language_code":"en"},"chat":{"id":XXXXXXXX,"first_name":"YOUR_NAME","type":"private"},"date":1231231231,"text":"XYZ"}}]}

The relevant chat id in this example is XXXXXXXX. This chat id needs to be provided in the environment variable TELEGRAM_CHATS. You can enter multiple chat ids separated by |.
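If the response contains several updates, picking the ids out by hand gets tedious. A minimal Python sketch that reads a getUpdates response body and builds a TELEGRAM_CHATS value is shown here. The helper names extract_chat_ids and to_telegram_chats are made up for this example, and only updates of the "message" kind shown above are handled.

```python
import json


def extract_chat_ids(get_updates_body):
    """Return the unique chat ids found in a getUpdates response, in order."""
    payload = json.loads(get_updates_body)
    if not payload.get("ok"):
        raise ValueError("Telegram API reported an error")
    chat_ids = []
    for update in payload.get("result", []):
        # Only plain "message" updates are considered in this sketch.
        chat = update.get("message", {}).get("chat", {})
        chat_id = chat.get("id")
        if chat_id is not None and chat_id not in chat_ids:
            chat_ids.append(chat_id)
    return chat_ids


def to_telegram_chats(chat_ids):
    """Join chat ids with | as expected by TELEGRAM_CHATS."""
    return "|".join(str(chat_id) for chat_id in chat_ids)
```

One way to use it is to save the curl output to a file, pass the file's contents to extract_chat_ids, and copy the result of to_telegram_chats into your configuration.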

CRON_PATTERN

ImmoPushr runs periodically, triggered by crond. You can change the default schedule by providing your own cron expression. Please keep the frequency reasonable and do not run the crawler too often! Frequent runs put a high load on the platforms and will get your IP address banned. If you are not experienced with cron expressions, have a look at Crontab Guru for further information.
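To check how often a pattern fires before deploying it, you can expand its minute and hour fields. The following Python sketch is a deliberately simplified helper, not a full cron parser: expand_field and daily_run_times are names invented here, only "*", single numbers, comma lists and ranges are supported (step values such as */5 are not), and the day, month and weekday fields are ignored.

```python
def expand_field(field, low, high):
    """Expand one cron field made of '*', numbers, lists and ranges."""
    values = set()
    for part in field.split(","):
        if part == "*":
            values.update(range(low, high + 1))
        elif "-" in part:
            start, end = part.split("-")
            values.update(range(int(start), int(end) + 1))
        else:
            values.add(int(part))
    return sorted(values)


def daily_run_times(pattern):
    """List HH:MM run times for one day, using only the minute and hour fields."""
    minute_field, hour_field = pattern.split()[:2]
    minutes = expand_field(minute_field, 0, 59)
    hours = expand_field(hour_field, 0, 23)
    return [f"{hour:02d}:{minute:02d}" for hour in hours for minute in minutes]
```

For the default pattern 50 7-22 * * * this yields 16 runs per day, from 07:50 to 22:50. The example pattern 20,50 7-22 * * * doubles that to 32 runs per day.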

SELENIUM_URL

Selenium is a framework for automated browser tests. It can also be used to automate tasks and crawl websites. To do so, it remote-controls a modern browser; in our case, this is Chrome.

In this environment variable, you need to provide the URL of a Selenium Grid or standalone instance. The simplest way to run a standalone instance is the Docker container, which is also used in our docker-compose file.

URLS

Simply add your search URLs, separated by |. A search URL may look like this: https://www.immobilienscout24.de/Suche/radius/wohnung-mieten?centerofsearchaddress=Berlin;;;;;&numberofrooms=1.5-&price=-1000.0&pricetype=rentpermonth&geocoordinates=52.51051;13.43068;4.0&sorting=2

Since only the first page of each search is crawled, you should sort the results by creation date (newest first).
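A quick way to sanity-check a URLS value before starting the container is to split it and look at each host. The Python sketch below is only an illustration: split_urls, is_supported and the subdomain matching rule are assumptions for this example, and the README does not document how ImmoPushr itself matches URLs to sites.

```python
from urllib.parse import urlparse

# Hosts named in this README; ImmoPushr's own matching rule is not documented.
SUPPORTED_HOSTS = ("immobilienscout24.de", "immowelt.de", "ebay-kleinanzeigen.de")


def split_urls(urls_value):
    """Split a URLS value on | and drop empty entries."""
    return [url.strip() for url in urls_value.split("|") if url.strip()]


def is_supported(url):
    """Check whether the URL's host is one of the supported sites or a subdomain."""
    host = urlparse(url).hostname or ""
    return any(host == site or host.endswith("." + site) for site in SUPPORTED_HOSTS)
```

Note that the semicolons inside the immobilienscout24.de example URL are part of the URL itself; only | separates entries.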

Development

You can run the script locally with php-cli: php cron.php. Before starting the script, you need to provide the environment variables in a .env file. You can start a local Selenium instance using the following command:

docker run -d --name selenium -e SCREEN_WIDTH=1920 -e SCREEN_HEIGHT=1080 -p 127.0.0.1:4444:4444 -p 127.0.0.1:5900:5900 -v /dev/shm:/dev/shm selenium/standalone-chrome:4

Props

Props to Flathunter, which I originally wanted to use. Since their crawler was not working for me, I decided to build my own alternative using Selenium. Nevertheless, it's a great project, and if you are a Python developer, you should consider supporting it.

Contributing

I'm really happy to accept contributions from the community; that’s the main reason why I open-sourced it! There are many ways to contribute, even if you’re not a technical person. You can report issues directly on GitHub. Please document the steps to reproduce your problem in as much detail as possible (even better with logs and your configuration).

License

GPL-v3 @ Tobias Dillig