whchien / funda-scraper

FundaScraper scrapes data from Funda, the Dutch housing website. You can find listings from the house-buying or rental market, as well as historical data. 🏡
GNU General Public License v3.0

Fetched dataframe without any records #47

Open geertoff opened 2 months ago

geertoff commented 2 months ago

Hi,

I've tried running the scraper on Windows using the script from the documentation, but I get a dataframe with no records. I cloned the repository, created a conda environment from requirements.txt, and updated urllib3 after I got a ModuleNotFoundError. It looks like it does extract information from the individual links but does not save it to the dataframe. See the console output below.

*** Phase 1: Fetch all the available links from all pages *** 
*** Main URL: https://www.funda.nl/en/zoeken/huur?selected_area=%5B%22amsterdam%22%5D&price=%22500-2000%22 ***
100%|██████████| 3/3 [00:02<00:00,  1.45it/s]
*** Got all the urls. 45 houses found from 1 to 3 ***
*** Phase 2: Start scraping from individual links ***
100%|██████████| 45/45 [00:17<00:00,  2.51it/s]
*** All scraping done: 45 results ***
*** Cleaning data ***
*** File saved: test2.csv. ***
*** Done! ***

Script that is run:

from funda_scraper import FundaScraper

if __name__ == '__main__' : 
    scraper = FundaScraper(
        area="amsterdam", 
        want_to="rent", 
        find_past=False, 
        page_start=1, 
        n_pages=3, 
        min_price=500, 
        max_price=2000
    )
    df = scraper.run(raw_data=False, save=True, filepath="test2.csv")
    df.head()

Is there something I forgot?

zencodess commented 1 month ago

Hey! Try setting the raw_data argument to True: df = scraper.run(raw_data=True, save=True, filepath="test2.csv"). That returned a non-empty dataframe and worked for me.

AndrewFBel commented 1 month ago

Hi @zencodess, unfortunately this doesn't work. Every column is NA except date_list, city, and log_id.
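One way to confirm which columns came back empty is to inspect the saved CSV directly. The following is a minimal sketch using only the standard library. The helper name empty_columns and the set of strings treated as missing are assumptions for illustration, not part of funda-scraper.

```python
import csv

# Strings treated as "missing". This set is an assumption for illustration,
# not necessarily what funda-scraper writes to disk.
NA_MARKERS = {"", "na", "nan", "none", "null"}


def empty_columns(path):
    """Return the names of CSV columns in which every value is a missing marker."""
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        fieldnames = reader.fieldnames or []
        # Start by assuming every column is empty, then rule columns out.
        still_empty = set(fieldnames)
        for row in reader:
            for name in list(still_empty):
                value = (row.get(name) or "").strip().lower()
                if value not in NA_MARKERS:
                    still_empty.discard(name)
        # Report the result in the original column order.
        return [name for name in fieldnames if name in still_empty]
```

Calling empty_columns("test2.csv") on the file saved by the script above should list the columns that hold no data at all. Note that a file with a header but no rows reports every column as empty.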

m-testers commented 1 month ago

I've got the same problem. Is there a solution yet?

When I use the following script for sold houses it works perfectly, but for houses that are currently for sale it doesn't work and I get an empty dataframe.

from funda_scraper import FundaScraper

if __name__ == "__main__":
    scraper = FundaScraper(
        area="gemeente (naam gemeente)",
        want_to="buy",
        find_past=True,
        page_start=1,
        n_pages=10,
    )
    df = scraper.run(raw_data=False, save=True, filepath="test.csv")
    print(df.head())

mennohie commented 1 month ago

It seems the CSS selectors have been changed or obfuscated on the Funda website...

AndrewFBel commented 1 month ago

They have started protecting against scraping via requests. I'm not sure what the solution could be other than something like Selenium, but that would take much longer.

mennohie commented 1 month ago

> They have started protecting against scraping via requests. I'm not sure what the solution could be other than something like Selenium, but that would take much longer.

Issue https://github.com/whchien/funda-scraper/issues/40 was solved in the past, but those pre-beta pages are not available anymore. The CSS has simply been changed.

I don't think Selenium would be necessary; we would just need more complicated HTML/CSS selectors.
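As a rough illustration of selecting by document structure and label text rather than by class name, here is a minimal sketch using only the standard library's html.parser. The sample markup, the dt/dd layout, the labels, and the LabelValueParser class are all hypothetical. Funda's real pages are not reproduced here and may be structured differently.

```python
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Collect <dt>label</dt><dd>value</dd> pairs, ignoring class attributes."""

    def __init__(self):
        super().__init__()
        self.pairs = {}
        self._current_tag = None  # "dt" or "dd" while inside one, else None
        self._buffer = []         # text fragments of the current dt/dd
        self._last_label = None   # most recent dt text awaiting its dd

    def handle_starttag(self, tag, attrs):
        # Class names are deliberately not inspected, so obfuscating
        # them does not affect the match.
        if tag in ("dt", "dd"):
            self._current_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current_tag:
            return  # e.g. a nested </span> inside a <dd>
        # Collapse runs of whitespace into single spaces.
        text = " ".join("".join(self._buffer).split())
        if tag == "dt":
            self._last_label = text
        elif self._last_label is not None:
            self.pairs[self._last_label] = text
            self._last_label = None
        self._current_tag = None


# Hypothetical markup with obfuscated class names (not real Funda HTML).
SAMPLE_HTML = """
<dl class="x1a9f">
  <dt class="q7">Rental price</dt><dd class="q8"><span>EUR 1,750</span> per month</dd>
  <dt class="q7">Living area</dt><dd class="q8">62 m2</dd>
</dl>
"""

parser = LabelValueParser()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.pairs)
```

Because the lookup is keyed on the visible label ("Rental price", "Living area") instead of a class such as q7, it keeps working when class names are regenerated, though it would still break if the labels or the dt/dd layout change.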