whchien / funda-scraper

FundaScraper scrapes data from Funda, the Dutch housing website. You can find listings from the house-buying or rental market, as well as historical data. 🏡
GNU General Public License v3.0
113 stars 53 forks

Fix broken css selectors for issue#43: with NA values in raw_df #50

Open hilmi-egemen-ciritoglu opened 3 months ago

RaymondJ1983 commented 3 months ago

Thanks for the fix!

It works with `find_past=False`, but not with `find_past=True`. Can you fix that too?

hilmi-egemen-ciritoglu commented 3 months ago

I could be wrong, but I'm not sure what find_past does. I see two different values, self.selectors.date_list and self.selectors.listed_since:

    # Get the value according to respective CSS selectors
    if self.to_buy:
        if self.find_past:
            list_since_selector = self.selectors.date_list
        else:
            list_since_selector = self.selectors.listed_since
    else:
        if self.find_past:
            list_since_selector = ".fd-align-items-center:nth-child(9) span"
        else:
            list_since_selector = ".fd-align-items-center:nth-child(7) span"

The README only describes it as: "find historical data; the default is `False`."

I'm not sure what "historical data" means here, so I set the same value as the selector. I'm not sure whether that's a good idea.
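For anyone trying to follow the branching, the four cases in the snippet above can be sketched as a single lookup keyed on `(to_buy, find_past)`. This is a minimal illustration, not the project's code: the `Selectors` dataclass and its placeholder strings are hypothetical stand-ins for `self.selectors`, and only the two rental `nth-child` selectors are copied from scrape.py.

```python
from dataclasses import dataclass


@dataclass
class Selectors:
    # Hypothetical placeholders; the real CSS strings live in the project config.
    date_list: str = "DATE_LIST_SELECTOR"
    listed_since: str = "LISTED_SINCE_SELECTOR"


# Rental selectors copied verbatim from the scrape.py snippet.
RENT_PAST = ".fd-align-items-center:nth-child(9) span"
RENT_CURRENT = ".fd-align-items-center:nth-child(7) span"


def pick_list_since_selector(to_buy: bool, find_past: bool, selectors: Selectors) -> str:
    """Return the same selector as the if/else chain, via a lookup table."""
    table = {
        (True, True): selectors.date_list,      # buying, historical (sold) listings
        (True, False): selectors.listed_since,  # buying, current listings
        (False, True): RENT_PAST,               # renting, historical listings
        (False, False): RENT_CURRENT,           # renting, current listings
    }
    return table[(to_buy, find_past)]


print(pick_list_since_selector(to_buy=True, find_past=True, selectors=Selectors()))
# DATE_LIST_SELECTOR
```

Laid out this way, it is easier to see that on the buy side `find_past=True` only changes which selector is used for the listing date (`date_list` instead of `listed_since`). If that selector no longer matches the markup of sold listings, the field would presumably come back as NA.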

RaymondJ1983 commented 3 months ago

Thanks for trying, but it didn't work.

Here is what it does: if you select historical data, it scrapes all the listings that have the status "verkocht", which means "sold". It adds `&availability=%5B"unavailable` to the search link. As a result, you get all the sold properties instead of the available and negotiable ones.
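A minimal sketch of the URL change described above, assuming the filter value is the percent-encoded list `["unavailable"]`. The comment only shows a truncated fragment (`&availability=%5B"unavailable`), so the full encoded value, the helper name `add_sold_filter`, and the example search URL are all hypothetical.

```python
from urllib.parse import quote


def add_sold_filter(search_url: str, find_past: bool) -> str:
    """Hypothetical helper: append an availability filter for sold listings."""
    if not find_past:
        return search_url  # current (available/negotiable) listings: URL unchanged
    # Percent-encode ["unavailable"] so brackets and quotes are URL-safe:
    # "[" -> %5B, '"' -> %22, "]" -> %5D
    value = quote('["unavailable"]', safe="")
    separator = "&" if "?" in search_url else "?"
    return f"{search_url}{separator}availability={value}"


print(add_sold_filter("https://www.funda.nl/zoeken/koop?selected_area=amsterdam", find_past=True))
# https://www.funda.nl/zoeken/koop?selected_area=amsterdam&availability=%5B%22unavailable%22%5D
```

The `%5B` at the start of the fragment in the comment is simply an encoded `[`, which is why the value looks garbled when read directly from the link.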

Also, in scrape.py, lines 302-312 contain something like:

    # Get the value according to respective CSS selectors

    if self.to_buy:
        if self.find_past:
            list_since_selector = self.selectors.date_list
        else:
            list_since_selector = self.selectors.listed_since
    else:
        if self.find_past:
            list_since_selector = ".fd-align-items-center:nth-child(9) span"
        else:
            list_since_selector = ".fd-align-items-center:nth-child(7) span"

Maybe this needs to be changed as well, but I have no idea what to change it to.
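Since the right replacement selectors are not known yet, one hedged option is to try several candidate selectors in order and fall back to an NA placeholder only when none of them match. The sketch below is hypothetical and not part of funda-scraper: `first_matching_text` takes any `select` function that maps a CSS selector to text (for example, a small wrapper around BeautifulSoup's `select_one` plus `get_text`), and the `"na"` default is an assumption meant to mimic the NA values in raw_df.

```python
from typing import Callable, Iterable, Optional

# Any function that takes a CSS selector and returns the matched element's
# text, or None when nothing matches.
SelectFn = Callable[[str], Optional[str]]


def first_matching_text(select: SelectFn, candidates: Iterable[str], default: str = "na") -> str:
    """Try each candidate selector in order; return the first non-empty text."""
    for css in candidates:
        text = select(css)
        if text is not None and text.strip():
            return text.strip()
    return default  # nothing matched: keep the NA placeholder


# Usage with a fake page: a dict standing in for parsed HTML (hypothetical data).
fake_page = {".fd-align-items-center:nth-child(9) span": " example text "}
print(first_matching_text(fake_page.get, [".new-layout span", ".fd-align-items-center:nth-child(9) span"]))
# example text
```

Keeping the old selector as the last candidate means the scraper would keep working on pages that still use the old layout while new selectors are tested.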

robskes commented 3 months ago

No data is returned when `find_past=True`. Does anyone have a solution?