PieterK123 closed this issue 1 year ago.
Hi @PieterK123
I just released a new version (funda_scraper==1.0.7) that should solve these problems. Please let me know if the issues persist.
Thanks for your efforts @whchien.
However, I think the issue still persists in my case. See the screenshot below, taken with the new 1.0.7.
What am I doing wrong?
Hi @dadadima94
Which Python version and which OS (Windows/Mac) are you working with? Can you try Google Colab and see whether it still returns no results?
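If it helps, a quick way to collect those environment details is a snippet like the one below. This is just a convenience sketch; it assumes Python 3.8+ for importlib.metadata and that the installed distribution is named funda_scraper.

```python
import platform
import sys
from importlib.metadata import version  # stdlib since Python 3.8

# Print the environment details that are useful in bug reports.
print("Python:", sys.version)
print("OS:", platform.platform())
print("funda_scraper:", version("funda_scraper"))  # assumes this distribution name
```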
Hi @whchien,
Sorry for jumping in (I'm a beginner with this). I tried Google Colab and the script runs perfectly there. However, when I run your exact 'Quickstart' example script locally with version 1.0.7, the console prints the following error:
```
2023-08-07 12:03:59,377 - INFO - *** Phase 1: Fetch all the available links from all pages *** (scrape.py:113)
  0%|          | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "fscraper.py", line 4, in <module>
    df = scraper.run(raw_data=False)
  File "/home/[user directory]/.local/lib/python3.8/site-packages/funda_scraper/scrape.py", line 258, in run
    self.fetch_all_links()
  File "/home/[user directory]/.local/lib/python3.8/site-packages/funda_scraper/scrape.py", line 122, in fetch_all_links
    item_list = self._get_links_from_one_parent(f"{main_url}&search_result={i}")
  File "/home/[user directory]/.local/lib/python3.8/site-packages/funda_scraper/scrape.py", line 86, in _get_links_from_one_parent
    script_tag = soup.find_all("script", {"type": "application/ld+json"})[0]
IndexError: list index out of range
```
Not sure what I'm doing wrong. OS: Ubuntu 20.04, Python: 3.8.10.
Edit 08-08-23: also tried with Python 3.10.12, but no difference.
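For context on the traceback above: line 86 of scrape.py takes element `[0]` of the list returned by `soup.find_all("script", {"type": "application/ld+json"})`, so any page without such a tag raises `IndexError`. A minimal defensive sketch of that lookup (not the library's actual code, just an illustration using BeautifulSoup):

```python
from bs4 import BeautifulSoup

def first_ld_json_tag(html: str):
    """Return the first <script type="application/ld+json"> tag, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tags = soup.find_all("script", {"type": "application/ld+json"})
    if not tags:
        # No ld+json tag usually means the site served a block/consent page
        # instead of search results; indexing [0] here is what raises the
        # IndexError shown in the traceback.
        return None
    return tags[0]
```

Returning None (or retrying) instead of indexing blindly would surface the real problem: the fetched page contains no listing data.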
Hi @PieterK123
This bug is fixed in the latest release (v1.0.10). Please let me know if you encounter any other problems.
Hi @whchien,
Many thanks for your efforts and for updating the script. I no longer receive any error when executing the script, but the result is empty (see below):
```
2023-08-06 15:10:35,394 - INFO - *** Phase 1: Fetch all the available links from all pages *** (scrape.py:107)
  0%|          | 0/276 [00:00<?, ?it/s]
2023-08-06 15:10:35,486 - INFO - *** Got all the urls. 0 houses found in 0 pages. *** (scrape.py:118)
2023-08-06 15:10:35,486 - INFO - *** Phase 2: Start scraping results from individual links *** (scrape.py:183)
0it [00:00, ?it/s]
2023-08-06 15:10:35,493 - INFO - *** All scraping done: 0 results *** (scrape.py:195)
2023-08-06 15:10:35,493 - INFO - *** Cleaning data *** (scrape.py:229)
Empty DataFrame
Columns: [house_id, city, house_type, building_type, price, price_m2, room, bedroom, bathroom, living_area, energy_label, has_balcony, has_garden, zip, address, year_built, house_age, date_list, ym_list, year_list, descrip]
Index: []
```
My Python script looks as follows:

```python
from funda_scraper import FundaScraper

scraper = FundaScraper(area="amsterdam", want_to="buy", find_past=False, n_pages=3)
df = scraper.run(raw_data=False)
print(df.head())
```
No clue what's wrong (it does seem to reach the URLs?).
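One way to narrow this down is to fetch a search page directly and check whether Funda returns any ld+json listing data at all; "0 houses found in 0 pages" suggests the scraper is receiving something other than a normal results page. The sketch below is only a diagnostic: the URL and headers are illustrative assumptions, not what funda_scraper actually sends.

```python
import requests
from bs4 import BeautifulSoup

# Illustrative search URL and headers; the exact URL funda_scraper builds
# may differ, and Funda's bot detection may still block this request.
url = "https://www.funda.nl/koop/amsterdam/"
headers = {"User-Agent": "Mozilla/5.0"}

resp = requests.get(url, headers=headers, timeout=30)
print("HTTP status:", resp.status_code)

soup = BeautifulSoup(resp.text, "html.parser")
tags = soup.find_all("script", {"type": "application/ld+json"})
# Zero tags here would be consistent with the empty DataFrame above.
print("ld+json tags found:", len(tags))
```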