neopostmodern / wohnungsbot

The Wohnungsbot: the application for the second act ("The Promise of the Bot") of the three-act automation drama "Von einem der auszog eine Wohnung in Berlin zu finden" ("Of one who set out to find an apartment in Berlin") by Clemens Schöll
https://wohnungsbot.de
GNU Affero General Public License v3.0

Not sorted by "newest first" #83

Closed: glonjk closed this issue 2 years ago

glonjk commented 2 years ago

Isn't it a problem that the bot does not sort by "newest first"? It's also too bad that the window doesn't come into focus by itself, or at least play a sound, when the bot stops working. It still stops in the middle of searching a listing or loading the web page (for example, right now "Wohnung 131770273 suchen..." ("Searching apartment 131770273...") has kept appearing despite 5 reboots and changes to the filter parameters). I'm now restarting the program 👍

Well, all in all it is the only working option I have found to monitor the website right now, so I'm still happy that it exists :)

neopostmodern commented 2 years ago

It should sort by newest first (see #54, released in 1.4.0). Feel free to re-open if you still experience this issue (along with screenshots or something that show the issue).

Unfortunately, I don't understand your other remarks. Could you please open a new issue for each of the bugs (or feature requests) with a detailed and documented problem description?

glonjk commented 2 years ago

Yep.

neopostmodern commented 2 years ago

So it turns out this is really non-trivial. The GET search parameter &sorting=2 is automatically discarded when it is entered as part of the URL. When the sorting is changed through the UI, a React click handler is called, which ... is quite complex and sends a POST request in the background to update the results.

drblaui commented 2 years ago

I have (somewhat of) an idea how to work around this (I haven't tried to implement it in the bot, but I tried it in the browser):

&sorting=2 doesn't really get discarded; rather, the whole initial GET request returns a 301 (Moved Permanently) that "sanitizes" the URL. This does remove the sorting parameter, but it also changes other things, for example the geocodes. The URL that the 301 redirects to is the "final" URL, and it can be manually extended with the sorting=2 parameter.
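A minimal sketch of the "extend the final URL" step, assuming the redirect target is already known and that `sorting=2` means newest first (as used throughout this thread). `withNewestFirst` is a hypothetical helper name, not part of the bot. It drops any existing `sorting` value before appending the new one, so the parameter never appears twice.

```javascript
// Hypothetical helper: append sorting=2 to a redirect-target search URL.
// Assumes sorting=2 selects "newest first", as described in this thread.
function withNewestFirst(finalUrl) {
  const url = new URL(finalUrl);
  // Work on the raw query string so values such as
  // "1276003001046,1276003001054" keep their literal comma.
  const kept = url.search
    .slice(1) // drop the leading "?"
    .split("&")
    .filter((pair) => pair !== "" && pair.split("=")[0] !== "sorting");
  kept.push("sorting=2");
  url.search = kept.join("&"); // the setter leaves commas unencoded
  return url.toString();
}
```

Editing the raw query instead of calling `url.searchParams.set` is deliberate: changing `searchParams` re-serializes the whole query and would turn the comma in `geocodes` into `%2C`, and it is not known whether the site treats that identically.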

So basically, it would be possible either to let the browser finish loading and then have Electron add the sorting parameter, or, before even loading a browser, to send the GET request and read the URL the 301 points to (which will probably run into cookie screens once in a while).
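The second option could look roughly like the sketch below. It assumes Node 18+ with the global `fetch`; `resolveRedirectTarget` is a hypothetical name. With `redirect: "manual"` Node returns the 3xx response itself, so the `Location` header can be read (browsers instead return an opaque response with status 0, so this is Node-only). As later comments in this thread point out, requests sent from outside a browser session may get the captcha page instead of the 301, in which case this returns `null`.

```javascript
// Hypothetical sketch: find out where a search URL redirects to, without
// following the redirect. fetchImpl defaults to the global fetch (Node 18+);
// it is a parameter only so the logic can be exercised without network access.
async function resolveRedirectTarget(searchUrl, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(searchUrl, { redirect: "manual" });
  const location = response.headers.get("location");
  // A 301 was observed in this thread; accept any 3xx that carries a Location.
  if (response.status >= 300 && response.status < 400 && location) {
    // Location may be relative, so resolve it against the request URL.
    return new URL(location, searchUrl).toString();
  }
  // No redirect (for example the captcha page answered with 405).
  return null;
}
```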

I tried both methods in my browser (making the GET request with XMLHttpRequest), and both worked.

Maybe if you can find out how the 301 geocodes are generated (they do seem to follow a specific scheme), you can skip sending a request or changing the URL at runtime altogether, since I suspect the newer geocode format is what forces the 301.

neopostmodern commented 2 years ago

Nice find! Concerning the geocodes, I have a funny feeling they are the numbers in this document minus a constant offset, but we'd have to check.

Could you please post detailed instructions (ideally a cURL command, I guess) to reproduce the 301? I can't manage to reproduce it myself in order to look into it.

drblaui commented 2 years ago

As far as I can tell from testing, a plain cURL request will never work, because it always lands on the captcha page first (which returns a 405 status code and doesn't change the URL). So the only method I can think of right now is to use the JavaScript console in your browser once ImmoScout already thinks you are human. Then you can use an XMLHttpRequest to get what you want. For example, sorting flats by newest in Mitte and Prenzlauer Berg with at least 1 m², at least one room and a monthly rent of at most 1000 could look like this:

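```javascript
// Run in the browser console on an immobilienscout24.de page after passing the captcha.
// The third argument `false` makes the request synchronous.
const xhr = new XMLHttpRequest();
xhr.open("GET", "https://www.immobilienscout24.de/Suche/de/berlin/berlin/wohnung-mieten?numberofrooms=1.0-&livingspace=1.0-&pricetype=rentpermonth&price=-1000.0&geocodes=1276003001046,1276003001054", false);
xhr.send();
// responseURL is the final URL after the 301 redirect has been followed
const newUrl = xhr.responseURL + "&sorting=2";
```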

newUrl then holds the correct URL.
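The query in the example above follows a visible pattern: ranges are written as `min-` or `-max`, and several geocodes are joined with a comma. The sketch below assembles such a URL from its parts. `buildSearchUrl` and its option names are hypothetical, and the range syntax is inferred from that single example URL, not from any ImmoScout documentation.

```javascript
// Hypothetical helper: build a search URL in the same shape as the example.
// The "min-" / "-max" range syntax is inferred from one URL only.
function buildSearchUrl({ minRooms, minLivingSpace, maxRent, geocodes }) {
  const base = "https://www.immobilienscout24.de/Suche/de/berlin/berlin/wohnung-mieten";
  const params = [
    `numberofrooms=${minRooms.toFixed(1)}-`, // at least minRooms rooms
    `livingspace=${minLivingSpace.toFixed(1)}-`, // at least minLivingSpace m²
    "pricetype=rentpermonth",
    `price=-${maxRent.toFixed(1)}`, // rent up to maxRent per month
    `geocodes=${geocodes.join(",")}`, // literal comma, as in the example
  ];
  return `${base}?${params.join("&")}`;
}
```

Called with `{ minRooms: 1, minLivingSpace: 1, maxRent: 1000, geocodes: ["1276003001046", "1276003001054"] }`, it reproduces the URL used in the console snippet.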

Another problem is that this only works on the ImmoScout website. Try it from any other website and it will return the captcha page. I believe it has something to do with the cookies set on the page, since every request to the site sends them along.

Theoretically, if the bot sent all of these cookies (or maybe only one of them is needed) with the request, it could work. But then one could also argue that it's easier to let the bot add &sorting=2 to the URL itself after the redirect has happened in the browser.
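For the last option (adding &sorting=2 after the redirect has happened in the browser), a sketch of the decision logic is below, assuming the bot's Electron window lands on a URL under `/Suche/` and that `sorting=2` sticks once added, as reported above. `sortedSearchUrl` is a hypothetical name; the `did-navigate` event and `loadURL` mentioned in the comment are real Electron `webContents` APIs, but how the bot would wire them up is an assumption.

```javascript
// Hypothetical sketch: given the URL the browser ended up on after the 301,
// return the URL to reload with sorting=2, or null if nothing should change.
//
// Possible wiring in the Electron main process (assumed, not the bot's code):
//   win.webContents.on("did-navigate", (event, url) => {
//     const target = sortedSearchUrl(url);
//     if (target) win.webContents.loadURL(target);
//   });
// If the site ever stripped sorting=2 again, this would reload repeatedly,
// so a real implementation would want a retry limit.
function sortedSearchUrl(currentUrl) {
  const url = new URL(currentUrl);
  // Only touch ImmoScout24 search result pages.
  if (url.hostname !== "www.immobilienscout24.de" || !url.pathname.startsWith("/Suche/")) {
    return null;
  }
  // Already sorted newest first: nothing to do (also prevents a reload loop).
  if (url.searchParams.get("sorting") === "2") return null;
  // Rewrite the raw query so commas in geocodes are not percent-encoded.
  const kept = url.search
    .slice(1)
    .split("&")
    .filter((pair) => pair !== "" && pair.split("=")[0] !== "sorting");
  kept.push("sorting=2");
  url.search = kept.join("&");
  return url.toString();
}
```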