whchien / funda-scraper

FundaScraper scrapes data from Funda, the Dutch housing website. You can find listings from the house-buying or rental market, as well as historical data. 🏡
GNU General Public License v3.0
110 stars 51 forks

Issue with NA values in raw_df #43

Open MarkMenagie opened 4 months ago

MarkMenagie commented 4 months ago

Since yesterday evening, scraper.run() returns an empty dataframe, even though the code finds and fetches new links. When debugging, I notice that self.raw_df is being filled with NA values within the scrape_pages function. It worked before, so did something change on the website, or is this just me?

EdwinWenink commented 4 months ago

I was also trying out this library and had a quick look. I still get results if I use the find_past flag, but not otherwise. It seems to me that Funda changed their CSS, because on new pages I cannot find the CSS selectors specified in config.yaml. I also noticed that the ?old_ldp query parameter no longer works on new Funda entries (I'm not sure what it means, though; is it documented anywhere?). Older entries that have since been sold redirect to koop/verkocht/ pages that still have the old CSS selectors, which would explain the inconsistent results.
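One way to confirm the selector mismatch is to check which configured selectors match nothing on a fetched page. Below is a minimal sketch using only the standard library. It assumes simple class selectors of the form `.class-name`; the selector values and the HTML snippet are hypothetical and are not taken from config.yaml or from Funda.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def missing_class_selectors(html, selectors):
    """Return the field names whose class selector matches nothing in html.

    Only simple selectors of the form ".class-name" are supported here.
    """
    collector = ClassCollector()
    collector.feed(html)
    return sorted(
        field
        for field, selector in selectors.items()
        if selector.lstrip(".") not in collector.classes
    )


# Hypothetical selectors and page snippet, for illustration only.
selectors = {"price": ".object-header__price", "address": ".object-header__title"}
new_page = '<div class="price-block"><span class="object-header__price">x</span></div>'
print(missing_class_selectors(new_page, selectors))
```

Running this against one old-style page and one new-style page should show quickly whether the selectors are the problem, or whether the fetch itself is failing.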

EdwinWenink commented 4 months ago

I see a workaround/temporary fix was recently implemented in #41 by @mpgreg, which added the ?old_ldp query param, but it seems that luck has already run out.

EdwinWenink commented 4 months ago

And the link https://www.funda.nl/huur/amsterdam/appartement-43547656-jan-van-zutphenstraat-75/?old_ldp=true redirects to https://www.funda.nl/detail/huur/amsterdam/appartement-jan-van-zutphenstraat-75/43547656/, which doesn't have the old CSS selectors.
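To make the redirect concrete, here is a rough sketch of how the old-style path appears to map onto the new /detail/ path. The pattern is inferred from this single example, so treat the regular expression and the `to_detail_url` helper as assumptions rather than anything Funda documents.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Old-style path, inferred from one example: /<market>/<city>/<type>-<id>-<slug>/
OLD_PATH = re.compile(
    r"^/(?P<market>koop|huur)/(?P<city>[^/]+)/"
    r"(?P<kind>[a-z]+)-(?P<id>\d+)-(?P<slug>[^/]+)/?$"
)


def to_detail_url(old_url):
    """Rewrite an old-style listing URL to the /detail/ form, or return None."""
    parts = urlsplit(old_url)
    match = OLD_PATH.match(parts.path)
    if match is None:
        return None
    new_path = "/detail/{market}/{city}/{kind}-{slug}/{id}/".format(
        **match.groupdict()
    )
    # Drop the query string, so ?old_ldp=true is not carried over.
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))


old = "https://www.funda.nl/huur/amsterdam/appartement-43547656-jan-van-zutphenstraat-75/?old_ldp=true"
print(to_detail_url(old))
```

URLs that do not fit the assumed pattern return None, so they can be logged and inspected instead of being rewritten silently.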

RaymondJ1983 commented 3 months ago

When will this be solved? I'm missing a lot of data now.

hilmi-egemen-ciritoglu commented 2 months ago

I made a patch here, feel free to pull: https://github.com/whchien/funda-scraper/pull/50

xqzy commented 2 months ago

Many thanks for creating this patch! Unfortunately, it doesn't seem to work for me. I'm not sure whether I applied it correctly: I pulled the code from the repo and changed only the config.yaml file. One difference is that the scraper now recognises the selling price, but the other fields are still not captured. Any ideas what could cause that? Perhaps the Funda CSS has changed again?
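To narrow down which fields are affected, a small coverage report over the scraped records can help. This is only a sketch: it works on plain dictionaries rather than on the library's own dataframe, and the field names and values below are hypothetical placeholders.

```python
def field_coverage(records):
    """Return, per field, the fraction of records with a non-empty value."""
    if not records:
        return {}
    fields = sorted({key for record in records for key in record})
    coverage = {}
    for field in fields:
        # Count a value as filled only if it is neither None nor an empty string.
        filled = sum(1 for record in records if record.get(field) not in (None, ""))
        coverage[field] = filled / len(records)
    return coverage


# Hypothetical scraped records, for illustration only.
records = [
    {"price": "p1", "address": None, "living_area": ""},
    {"price": "p2", "address": None, "living_area": ""},
]
for field, share in field_coverage(records).items():
    print(f"{field}: {share:.0%} filled")
```

Fields that report 0% are the ones whose selectors most likely need updating in config.yaml.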