whchien / funda-scraper

FundaScraper scrapes data from Funda, the Dutch housing website. You can find listings from the house-buying or rental market, as well as historical data. 🏡
GNU General Public License v3.0

No data scraped (0 houses found in 0 pages) #11

Closed PieterK123 closed 1 year ago

PieterK123 commented 1 year ago

Hi @whchien ,

Many thanks for your efforts and for updating the script. I no longer receive any error when executing the script (see below):

```
2023-08-06 15:10:35,394 - INFO - *** Phase 1: Fetch all the available links from all pages *** (scrape.py:107)
  0%|          | 0/276 [00:00<?, ?it/s]
2023-08-06 15:10:35,486 - INFO - *** Got all the urls. 0 houses found in 0 pages. *** (scrape.py:118)
2023-08-06 15:10:35,486 - INFO - *** Phase 2: Start scraping results from individual links *** (scrape.py:183)
0it [00:00, ?it/s]
2023-08-06 15:10:35,493 - INFO - *** All scraping done: 0 results *** (scrape.py:195)
2023-08-06 15:10:35,493 - INFO - *** Cleaning data *** (scrape.py:229)
Empty DataFrame
Columns: [house_id, city, house_type, building_type, price, price_m2, room, bedroom, bathroom, living_area, energy_label, has_balcony, has_garden, zip, address, year_built, house_age, date_list, ym_list, year_list, descrip]
Index: []
```

My python script looks as follows:

```python
from funda_scraper import FundaScraper

scraper = FundaScraper(area="amsterdam", want_to="buy", find_past=False, n_pages=3)
df = scraper.run(raw_data=False)
print(df.head())
```

No clue what's wrong (it does seem to reach the URLs?).
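For anyone hitting the same empty result: the scraper builds its link list from the page's `application/ld+json` block, so a first debugging step is to check whether the HTML you get back actually contains one. Below is a minimal stdlib-only sketch of that check; the HTML sample is a stand-in, not real Funda markup, and `JsonLdFinder` is just an illustrative helper, not part of the library.

```python
# Sketch: does a page contain the <script type="application/ld+json"> block
# that funda-scraper parses for listing links? Stdlib only.
import json
from html.parser import HTMLParser


class JsonLdFinder(HTMLParser):
    """Collect the text of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buf))


# Stand-in HTML; a real search-result page would embed the listing URLs here.
sample_html = """
<html><head>
<script type="application/ld+json">
{"itemListElement": [{"url": "https://www.funda.nl/koop/amsterdam/huis-1/"}]}
</script>
</head><body></body></html>
"""

finder = JsonLdFinder()
finder.feed(sample_html)

if not finder.blocks:
    # This is the case where the scraper ends up reporting 0 houses.
    print("No JSON-LD block found")
else:
    data = json.loads(finder.blocks[0])
    urls = [item["url"] for item in data.get("itemListElement", [])]
    print(f"Found {len(urls)} listing URL(s)")
```

If this prints "No JSON-LD block found" on a page you fetched yourself, the request is likely being served an anti-bot or consent page instead of search results.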

whchien commented 1 year ago

Hi @PieterK123

I just released the package with some new updates (funda_scraper==1.0.7). These problems should be solved. Please let me know if these issues persist.

dadadima commented 1 year ago

Thanks for your efforts @whchien.

However, I think the issue still persists in my case. See below a screenshot using the new 1.0.7:

[screenshot]

What am I doing wrong?

whchien commented 1 year ago

Hi @dadadima94

What is the Python version and the version of OS (Windows/Mac) you are working with? Can you try Google Colab and see whether it still returns no results?
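For completeness, the environment details asked for here can be grabbed with two stdlib calls, which is handy to paste into a bug report:

```python
# Print the Python and OS versions to include in a bug report.
import platform
import sys

print("Python:", sys.version.split()[0])                 # e.g. 3.8.10
print("OS    :", platform.system(), platform.release())  # e.g. Linux 5.4.0-...
```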

PieterK123 commented 1 year ago

Hi @whchien,

Sorry for jumping in (and being a beginner with this), but I tried Google Colab and the script runs perfectly:

[screenshot]

When I try to run your exact 'Quickstart' example script with version 1.0.7, the console prints the following error:

```
2023-08-07 12:03:59,377 - INFO - *** Phase 1: Fetch all the available links from all pages ***  (scrape.py:113)
  0%|          | 0/2 [00:00<?, ?it/s]
  0%|          | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "fscraper.py", line 4, in <module>
    df = scraper.run(raw_data=False)
  File "/home/PKGB/.local/lib/python3.8/site-packages/funda_scraper/scrape.py", line 258, in run
    self.fetch_all_links()
  File "/home/[user directory]/.local/lib/python3.8/site-packages/funda_scraper/scrape.py", line 122, in fetch_all_links
    item_list = self._get_links_from_one_parent(f"{main_url}&search_result={i}")
  File "/home/[user directory]/.local/lib/python3.8/site-packages/funda_scraper/scrape.py", line 86, in _get_links_from_one_parent
    script_tag = soup.find_all("script", {"type": "application/ld+json"})[0]
IndexError: list index out of range
```

Not sure what I'm doing wrong. OS version: Ubuntu 20.04. Python version: 3.8.10.

*edit 08-08-23: Also tried with Python 3.10.12 (no difference).
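The `IndexError` in the traceback is the classic `find_all(...)[0]` crash: the page contains no `application/ld+json` tag at all, so the list is empty. Below is a sketch of a guard that surfaces that condition instead of crashing; it is illustrative only, not the library's actual fix, and `get_jsonld_or_none` is a made-up helper name.

```python
# Illustrative guard for the crash site shown in the traceback:
# soup.find_all("script", {"type": "application/ld+json"})[0] raises
# IndexError when the result list is empty; checking first avoids that.
from bs4 import BeautifulSoup  # the same third-party parser the scraper uses


def get_jsonld_or_none(html: str):
    """Return the first application/ld+json payload, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    tags = soup.find_all("script", {"type": "application/ld+json"})
    if not tags:
        # Empty result page or blocked request: report it, don't crash.
        return None
    return tags[0].string


print(get_jsonld_or_none("<html><body>blocked</body></html>"))  # None
ok = '<script type="application/ld+json">{"x": 1}</script>'
print(get_jsonld_or_none(ok))  # {"x": 1}
```

Returning `None` lets the caller log something like "page returned no listing data (possibly blocked)" rather than dying mid-run.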

whchien commented 1 year ago

Hi @PieterK123

This bug is fixed in the latest release (v1.0.10). Please let me know if you encounter any other problems.